mindspore.dataset.audio.SlidingWindowCmn

class mindspore.dataset.audio.SlidingWindowCmn(cmn_window=600, min_cmn_window=100, center=False, norm_vars=False)

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

Parameters

cmn_window (int, optional) – Window size, in frames, for the running-average CMN computation. Default: 600.
min_cmn_window (int, optional) – Minimum CMN window, in frames, used at the start of decoding (adds latency only at the start). Applies only when center is False; ignored when center is True. Default: 100.
center (bool, optional) – If True, use a window centered on the current frame. If False, the window extends to the left of the current frame. Default: False.
norm_vars (bool, optional) – If True, normalize variance to one. Default: False.
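
The way the four parameters interact is easier to see in code. The following is a minimal NumPy sketch, not MindSpore's implementation: it assumes Kaldi-style windowing rules and a 2-D input of shape (num_frames, num_features), neither of which is stated on this page. The function name sliding_window_cmn_sketch and the 1e-10 variance floor are choices made for this illustration only.

```python
import numpy as np


def sliding_window_cmn_sketch(feats, cmn_window=600, min_cmn_window=100,
                              center=False, norm_vars=False):
    """Illustrative sliding-window CMN over a (num_frames, num_features) array.

    Hypothetical helper; assumes Kaldi-style window placement.
    """
    feats = np.asarray(feats, dtype=np.float64)
    num_frames = feats.shape[0]
    out = np.empty_like(feats)
    for t in range(num_frames):
        if center:
            # Window centered on frame t.
            start = t - cmn_window // 2
            end = start + cmn_window
        else:
            # Window to the left of frame t, ending at frame t (inclusive).
            start = t - cmn_window
            end = t + 1
        if start < 0:
            # Shift the window right so it begins at the first frame.
            end -= start
            start = 0
        if not center:
            # Left-only mode: look ahead only as far as min_cmn_window frames.
            end = max(t + 1, min_cmn_window)
        if end > num_frames:
            # Shift the window left so it ends at the last frame.
            start = max(0, start - (end - num_frames))
            end = num_frames
        window = feats[start:end]
        out[t] = feats[t] - window.mean(axis=0)
        if norm_vars:
            if window.shape[0] == 1:
                # A single-frame window has no variance to normalize by.
                out[t] = 0.0
            else:
                # Floor the variance to avoid division by zero (sketch-only choice).
                var = np.maximum(window.var(axis=0), 1e-10)
                out[t] /= np.sqrt(var)
    return out


rng = np.random.default_rng(0)
x = rng.random((16, 3))  # 16 frames, 3 features
out = sliding_window_cmn_sketch(x)
print(out.shape)  # (16, 3)
```

Under these assumed rules, an utterance shorter than min_cmn_window (such as the 16-frame inputs in the examples below) is normalized with a window that covers the whole utterance at every frame, so the default settings reduce to subtracting the per-utterance mean.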

Raises

TypeError – If cmn_window is not of type int.
ValueError – If cmn_window is a negative number.
TypeError – If min_cmn_window is not of type int.
ValueError – If min_cmn_window is a negative number.
TypeError – If center is not of type bool.
TypeError – If norm_vars is not of type bool.

Supported Platforms: CPU

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.audio as audio
>>>
>>> # Use the transform in dataset pipeline mode
>>> waveform = np.random.random([5, 16, 3])  # 5 samples
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data=waveform, column_names=["audio"])
>>> transforms = [audio.SlidingWindowCmn()]
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=transforms, input_columns=["audio"])
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["audio"].shape, item["audio"].dtype)
...     break
(16, 3) float64
>>>
>>> # Use the transform in eager mode
>>> waveform = np.random.random([16, 3])  # 1 sample
>>> output = audio.SlidingWindowCmn()(waveform)
>>> print(output.shape, output.dtype)
(16, 3) float64

Tutorial Examples:

Illustration of audio transforms