mindspore.dataset.audio
This module supports audio augmentation. It has two parts: audio transforms and utils. Audio transforms is a high-performance processing module that provides common audio operations; utils provides general-purpose methods for audio processing.
The API examples commonly import the following modules:
import mindspore.dataset as ds
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import utils
An alternative, equivalent way to import the audio module is:
import mindspore.dataset.audio.transforms as audio
Descriptions of common data processing terms are as follows:
TensorOperation, the base class of all data processing operations implemented in C++.
AudioTensorOperation, the base class of all audio processing operations. It is a derived class of TensorOperation.
Transforms
Design two-pole all-pass filter with central frequency and bandwidth for audio waveform.
Turn the input audio waveform from the amplitude/power scale to decibel scale.
Calculate the angle of complex number sequence.
Design two-pole band-pass filter for audio waveform.
Design two-pole Butterworth band-pass filter for audio waveform.
Design two-pole Butterworth band-reject filter for audio waveform.
Design a bass tone-control effect, also known as two-pole low-shelf filter, for audio waveform.
Perform a biquad filter of input audio.
Compute the norm of complex number sequence.
Compute delta coefficients of a spectrogram.
Apply contrast effect for audio waveform.
Turn a waveform from the decibel scale to the power/amplitude scale.
Apply a DC shift to the audio.
Design two-pole de-emphasis filter for audio waveform of dimension (..., time).
Detect pitch frequency.
Dither increases the perceived dynamic range of audio stored at a particular bit-depth by eliminating nonlinear truncation distortion.
Design biquad equalizer filter and perform filtering.
Add a fade in and/or fade out to a waveform.
Apply a flanger effect to the audio.
Apply masking to a spectrogram in the frequency domain.
Apply amplification or attenuation to the whole waveform.
Approximate magnitude spectrogram inversion using the Griffin-Lim algorithm.
Design biquad highpass filter and perform filtering.
Solve for a normal STFT from a mel-frequency STFT, using a conversion matrix.
Design two-pole filter for audio waveform of dimension (..., time).
Design two-pole low-pass filter for audio waveform.
Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase.
Apply a mask along axis.
Apply a mask along axis.
Convert normal STFT to STFT at the Mel scale.
Decode mu-law encoded signal.
Encode signal based on mu-law companding.
Apply overdrive on input audio.
Apply a phasing effect to the audio.
Given a STFT tensor, speed up in time without modifying pitch by a factor of rate.
Resample a signal from one frequency to another.
Apply RIAA vinyl playback equalization.
Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
Create a spectral centroid from an audio signal.
Create a spectrogram from an audio signal.
Apply masking to a spectrogram in the time domain.
Stretch Short Time Fourier Transform (STFT) in time without modifying pitch for a given rate.
Design a treble tone-control effect.
Attempt to trim silent background sounds from the end of the voice recording.
Apply amplification or attenuation to the whole waveform.
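To make the mu-law encoding and decoding entries above more concrete, the following is a minimal pure-Python sketch of mu-law companding on single samples. It is not the MindSpore implementation: the function names `mu_law_encode` and `mu_law_decode`, the `quantization_channels` parameter, and the rounding convention are assumptions chosen for illustration, following the commonly used formulation for samples in [-1.0, 1.0].

```python
import math


def mu_law_encode(x, quantization_channels=256):
    """Hypothetical helper: map a sample in [-1.0, 1.0] to an integer code."""
    mu = quantization_channels - 1
    x = max(-1.0, min(1.0, x))  # clamp out-of-range samples
    # Logarithmic compression: quiet samples get finer resolution than loud ones.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer code.
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(code, quantization_channels=256):
    """Hypothetical helper: map an integer code back to a sample in [-1.0, 1.0]."""
    mu = quantization_channels - 1
    y = code / mu * 2.0 - 1.0  # back to [-1, 1]
    # Inverse of the logarithmic compression: ((1 + mu) ** |y| - 1) / mu.
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


codes = [mu_law_encode(s) for s in (-1.0, 0.0, 1.0)]  # [0, 128, 255]
restored = [mu_law_decode(c) for c in codes]  # close to the original samples
```

Decoding is lossy: the round trip recovers each sample only up to the quantization step, which is smallest near zero and largest near full scale.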
Utilities
Padding mode, BorderType type.
Density functions.
Fade shapes.
Gain types.
Interpolation types.
Mel types.
Modulation types.
Norm types.
Norm types.
Resample methods.
Scale types.
Window function types.
Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized depending on norm.
Create a frequency transformation matrix with shape (n_freqs, n_mels).
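As a rough illustration of the DCT transformation matrix described above, the sketch below builds a DCT-II matrix of shape (n_mels, n_mfcc) in pure Python. The function name `dct_matrix_sketch` and its string-valued `norm` argument are hypothetical, and the scaling convention (a factor of 2 without normalization, orthonormal columns with "ortho") is an assumption based on the common DCT-II definition rather than a statement about how MindSpore computes it.

```python
import math


def dct_matrix_sketch(n_mfcc, n_mels, norm=None):
    """Hypothetical helper: DCT-II matrix as a list of n_mels rows of n_mfcc values."""
    rows = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            # DCT-II basis function k evaluated at mel bin n.
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            if norm == "ortho":
                # Assumed orthonormal scaling: columns have unit length.
                if k == 0:
                    value *= 1.0 / math.sqrt(2.0)
                value *= math.sqrt(2.0 / n_mels)
            else:
                # Assumed unnormalized scaling.
                value *= 2.0
            row.append(value)
        rows.append(row)
    return rows


matrix = dct_matrix_sketch(n_mfcc=4, n_mels=8, norm="ortho")
shape = (len(matrix), len(matrix[0]))  # (8, 4), that is (n_mels, n_mfcc)
```

Multiplying a (..., n_mels) log-mel spectrogram frame by such a matrix yields n_mfcc cepstral coefficients, which is why the matrix has n_mels rows and n_mfcc columns.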