mindspore.dataset.audio.DetectPitchFrequency

class mindspore.dataset.audio.DetectPitchFrequency(sample_rate, frame_time=0.01, win_length=30, freq_low=85, freq_high=3400)[source]

Detect pitch frequency.

It is implemented using the normalized cross-correlation function (NCCF) and median smoothing.

Parameters
  • sample_rate (int) – Sampling rate of the waveform, e.g. 44100 (Hz), the value cannot be zero.

  • frame_time (float, optional) – Duration of a frame, the value must be greater than zero. Default: 0.01.

  • win_length (int, optional) – The window length for median smoothing (in number of frames), the value must be greater than zero. Default: 30.

  • freq_low (int, optional) – Lowest frequency that can be detected (Hz), the value must be greater than zero. Default: 85.

  • freq_high (int, optional) – Highest frequency that can be detected (Hz), the value must be greater than zero. Default: 3400.

Raises
  • TypeError – If sample_rate is not of type int.

  • ValueError – If sample_rate is 0.

  • TypeError – If frame_time is not of type float.

  • ValueError – If frame_time is not greater than 0.

  • TypeError – If win_length is not of type int.

  • ValueError – If win_length is not greater than 0.

  • TypeError – If freq_low is not of type int.

  • ValueError – If freq_low is not greater than 0.

  • TypeError – If freq_high is not of type int.

  • ValueError – If freq_high is not greater than 0.

Supported Platforms:

CPU

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.audio as audio
>>>
>>> # Use the transform in dataset pipeline mode
>>> waveform = np.random.random([5, 16])  # 5 samples
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data=waveform, column_names=["audio"])
>>> transforms = [audio.DetectPitchFrequency(30, 0.1, 3, 5, 25)]
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=transforms, input_columns=["audio"])
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["audio"].shape, item["audio"].dtype)
...     break
(5,) float32
>>>
>>> # Use the transform in eager mode
>>> waveform = np.random.random([16])  # 1 sample
>>> output = audio.DetectPitchFrequency(30, 0.1, 3, 5, 25)(waveform)
>>> print(output.shape, output.dtype)
(5,) float32