mindspore.dataset.audio

This module supports audio augmentation. It has two parts: audio transforms and utils. The audio transforms are a high-performance processing module that provides common audio operations. The utils provide general-purpose helpers for audio processing.

The API examples use the following common imports:

import mindspore.dataset as ds
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import utils

An equivalent alternative way to import the audio module is:

import mindspore.dataset.audio.transforms as audio

Descriptions of common data processing terms are as follows:

TensorOperation, the base class of all data processing operations implemented in C++.
AudioTensorOperation, the base class of all audio processing operations. It is a derived class of TensorOperation.

Data transform operators can be executed in a data processing pipeline or in eager mode:

Pipeline mode is generally used to process datasets. For examples, please refer to introduction to data processing pipeline.

Eager mode is generally used for scattered samples. Examples of audio preprocessing are as follows:

import numpy as np
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import ResampleMethod

# audio sample
waveform = np.random.random([1, 30])

# transform
resample_op = audio.Resample(orig_freq=48000, new_freq=16000,
                             resample_method=ResampleMethod.SINC_INTERPOLATION,
                             lowpass_filter_width=6, rolloff=0.99, beta=None)
waveform_resampled = resample_op(waveform)
print("waveform resampled: {}".format(waveform_resampled), flush=True)
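For the eager-mode example above, it can help to know how many samples to expect back. The sketch below is plain arithmetic, not a MindSpore call. It assumes the output length is the input length scaled by new_freq / orig_freq and rounded up, which is how comparable sinc resamplers behave; the text above does not confirm this for Resample. The function name expected_resampled_length is hypothetical.

```python
import math


def expected_resampled_length(num_samples, orig_freq, new_freq):
    """Estimate the resampled length as ceil(num_samples * new_freq / orig_freq).

    Assumption: the resampler rounds the scaled length up.
    """
    # Reduce the frequency ratio first so the arithmetic stays in small integers.
    divisor = math.gcd(orig_freq, new_freq)
    orig = orig_freq // divisor
    new = new_freq // divisor
    # Integer ceiling division, written without floating point.
    return -(-num_samples * new // orig)


# The waveform above has shape [1, 30] and goes from 48000 Hz to 16000 Hz.
print(expected_resampled_length(30, 48000, 16000))  # 10
```

Under that assumption, the resampled waveform in the example would have shape (1, 10).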

Transforms

`mindspore.dataset.audio.AllpassBiquad`	Design two-pole all-pass filter with central frequency and bandwidth for audio waveform.
`mindspore.dataset.audio.AmplitudeToDB`	Turn the input audio waveform from the amplitude/power scale to decibel scale.
`mindspore.dataset.audio.Angle`	Calculate the angle of complex number sequence.
`mindspore.dataset.audio.BandBiquad`	Design two-pole band-pass filter for audio waveform.
`mindspore.dataset.audio.BandpassBiquad`	Design two-pole Butterworth band-pass filter for audio waveform.
`mindspore.dataset.audio.BandrejectBiquad`	Design two-pole Butterworth band-reject filter for audio waveform.
`mindspore.dataset.audio.BassBiquad`	Design a bass tone-control effect, also known as two-pole low-shelf filter for audio waveform.
`mindspore.dataset.audio.Biquad`	Perform a biquad filter of input audio.
`mindspore.dataset.audio.ComplexNorm`	Compute the norm of complex number sequence.
`mindspore.dataset.audio.ComputeDeltas`	Compute delta coefficients of a spectrogram.
`mindspore.dataset.audio.Contrast`	Apply contrast effect for audio waveform.
`mindspore.dataset.audio.DBToAmplitude`	Turn a waveform from the decibel scale to the power/amplitude scale.
`mindspore.dataset.audio.DCShift`	Apply a DC shift to the audio.
`mindspore.dataset.audio.DeemphBiquad`	Design two-pole de-emphasis filter for audio waveform with shape (..., time).
`mindspore.dataset.audio.DetectPitchFrequency`	Detect pitch frequency.
`mindspore.dataset.audio.Dither`	Dither increases the perceived dynamic range of audio stored at a particular bit-depth by eliminating nonlinear truncation distortion.
`mindspore.dataset.audio.EqualizerBiquad`	Design biquad equalizer filter and perform filtering.
`mindspore.dataset.audio.Fade`	Add a fade in and/or fade out to a waveform.
`mindspore.dataset.audio.Flanger`	Apply a flanger effect to the audio.
`mindspore.dataset.audio.FrequencyMasking`	Apply masking to a spectrogram in the frequency domain.
`mindspore.dataset.audio.Gain`	Apply amplification or attenuation to the whole waveform.
`mindspore.dataset.audio.GriffinLim`	Approximate magnitude spectrogram inversion using the GriffinLim algorithm.
`mindspore.dataset.audio.HighpassBiquad`	Design biquad highpass filter and perform filtering.
`mindspore.dataset.audio.InverseMelScale`	Solve for a normal STFT from a mel frequency STFT, using a conversion matrix.
`mindspore.dataset.audio.LFilter`	Design two-pole filter for audio waveform with shape (..., time).
`mindspore.dataset.audio.LowpassBiquad`	Design two-pole low-pass filter for audio waveform.
`mindspore.dataset.audio.Magphase`	Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase.
`mindspore.dataset.audio.MaskAlongAxis`	Apply a mask along axis.
`mindspore.dataset.audio.MaskAlongAxisIID`	Apply a mask along axis.
`mindspore.dataset.audio.MelScale`	Convert normal STFT to STFT at the Mel scale.
`mindspore.dataset.audio.MuLawDecoding`	Decode mu-law encoded signal.
`mindspore.dataset.audio.MuLawEncoding`	Encode signal based on mu-law companding.
`mindspore.dataset.audio.Overdrive`	Apply overdrive on input audio.
`mindspore.dataset.audio.Phaser`	Apply a phasing effect to the audio.
`mindspore.dataset.audio.PhaseVocoder`	Given an STFT tensor, speed up in time without modifying pitch by a factor of rate.
`mindspore.dataset.audio.Resample`	Resample a signal from one frequency to another.
`mindspore.dataset.audio.RiaaBiquad`	Apply RIAA vinyl playback equalization.
`mindspore.dataset.audio.SlidingWindowCmn`	Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
`mindspore.dataset.audio.SpectralCentroid`	Create a spectral centroid from an audio signal.
`mindspore.dataset.audio.Spectrogram`	Create a spectrogram from an audio signal.
`mindspore.dataset.audio.TimeMasking`	Apply masking to a spectrogram in the time domain.
`mindspore.dataset.audio.TimeStretch`	Stretch Short Time Fourier Transform (STFT) in time without modifying pitch for a given rate.
`mindspore.dataset.audio.TrebleBiquad`	Design a treble tone-control effect.
`mindspore.dataset.audio.Vad`	Attempt to trim silent background sounds from the end of the voice recording.
`mindspore.dataset.audio.Vol`	Apply amplification or attenuation to the whole waveform.
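Many of the transforms listed above are biquad (two-pole) filters. As a rough illustration of what such a filter computes, the sketch below applies the standard second-order difference equation to a list of samples in plain Python. It is not MindSpore's implementation: the function name biquad is hypothetical, the coefficient names follow the usual b0, b1, b2, a0, a1, a2 notation, and no clamping or batching is done.

```python
def biquad(samples, b0, b1, b2, a0, a1, a2):
    """Apply a second-order IIR filter to a sequence of samples.

    Difference equation (standard biquad form):
        y[n] = (b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]) / a0
    """
    x1 = x2 = 0.0  # previous two inputs
    y1 = y2 = 0.0  # previous two outputs
    filtered = []
    for x0 in samples:
        y0 = (b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        filtered.append(y0)
        # Shift the delay lines by one sample.
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return filtered


# With b1 = 1 and a0 = 1 (all other coefficients 0) the filter is a one-sample delay.
print(biquad([1.0, 2.0, 3.0], 0.0, 1.0, 0.0, 1.0, 0.0, 0.0))  # [0.0, 1.0, 2.0]
```

The named filters (low-pass, high-pass, band-pass and so on) differ only in how the six coefficients are derived from parameters such as the sample rate, central frequency and bandwidth.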

Utilities

`mindspore.dataset.audio.BorderType`	Padding Mode, BorderType Type.
`mindspore.dataset.audio.DensityFunction`	Density Functions.
`mindspore.dataset.audio.FadeShape`	Fade Shapes.
`mindspore.dataset.audio.GainType`	Gain Types.
`mindspore.dataset.audio.Interpolation`	Interpolation Type.
`mindspore.dataset.audio.MelType`	Mel Types.
`mindspore.dataset.audio.Modulation`	Modulation Type.
`mindspore.dataset.audio.NormMode`	Norm Types.
`mindspore.dataset.audio.NormType`	Norm Types.
`mindspore.dataset.audio.ResampleMethod`	Resample method.
`mindspore.dataset.audio.ScaleType`	Scale Types.
`mindspore.dataset.audio.WindowType`	Window Function types.
`mindspore.dataset.audio.create_dct`	Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized depending on norm.
`mindspore.dataset.audio.melscale_fbanks`	Create a frequency transformation matrix with shape (n_freqs, n_mels).
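To make the (n_mels, n_mfcc) shape of the DCT matrix concrete, here is a minimal plain-Python sketch of a DCT-II matrix. It is an illustration, not the create_dct implementation: the function name dct_matrix and the boolean ortho flag are hypothetical stand-ins, and the factor of 2 used in the unnormalized case is an assumed convention.

```python
import math


def dct_matrix(n_mfcc, n_mels, ortho=True):
    """Build a DCT-II matrix as a nested list with shape (n_mels, n_mfcc).

    Entry [n][k] is cos(pi / n_mels * (n + 0.5) * k), scaled according to `ortho`.
    """
    matrix = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            if ortho:
                # Orthonormal scaling: the columns get unit length.
                value *= math.sqrt(2.0 / n_mels)
                if k == 0:
                    value /= math.sqrt(2.0)
            else:
                # Assumed convention for the unnormalized case.
                value *= 2.0
            row.append(value)
        matrix.append(row)
    return matrix


dct = dct_matrix(n_mfcc=4, n_mels=8)
print(len(dct), len(dct[0]))  # 8 4
```

Multiplying a (..., n_mels) mel spectrogram by such a matrix gives (..., n_mfcc) coefficients, which is why the matrix is laid out with n_mels rows.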