mindspore.dataset.audio

This module supports audio augmentation. It has two parts: audio transforms and utils. The audio transforms part is a high-performance processing module that provides common audio operations; utils provides general-purpose methods for audio processing.

The following modules are commonly imported in the corresponding API examples:

import mindspore.dataset as ds
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import utils

Equivalently, the audio module can be imported as follows:

import mindspore.dataset.audio.transforms as audio

Descriptions of common data processing terms are as follows:

  • TensorOperation, the base class of all data processing operations implemented in C++.

  • AudioTensorOperation, the base class of all audio processing operations. It is a derived class of TensorOperation.

Data transform operations can be executed in a data processing pipeline or in eager mode:

  • Pipeline mode is generally used to process datasets. For examples, please refer to introduction to data processing pipeline.

  • Eager mode is generally used for scattered samples. Examples of audio preprocessing are as follows:

    import numpy as np
    import mindspore.dataset.audio as audio
    from mindspore.dataset.audio import ResampleMethod
    
    # audio sample
    waveform = np.random.random([1, 30])
    
    # transform
    resample_op = audio.Resample(orig_freq=48000, new_freq=16000,
                                 resample_method=ResampleMethod.SINC_INTERPOLATION,
                                 lowpass_filter_width=6, rolloff=0.99, beta=None)
    waveform_resampled = resample_op(waveform)
    print("waveform resampled: {}".format(waveform_resampled), flush=True)
    

Transforms

mindspore.dataset.audio.AllpassBiquad

Design two-pole all-pass filter with central frequency and bandwidth for audio waveform.

mindspore.dataset.audio.AmplitudeToDB

Turn the input audio waveform from the amplitude/power scale to decibel scale.

mindspore.dataset.audio.Angle

Calculate the angle of complex number sequence.

mindspore.dataset.audio.BandBiquad

Design two-pole band-pass filter for audio waveform.

mindspore.dataset.audio.BandpassBiquad

Design two-pole Butterworth band-pass filter for audio waveform.

mindspore.dataset.audio.BandrejectBiquad

Design two-pole Butterworth band-reject filter for audio waveform.

mindspore.dataset.audio.BassBiquad

Design a bass tone-control effect, also known as two-pole low-shelf filter for audio waveform.

mindspore.dataset.audio.Biquad

Perform a biquad filter of input audio.

mindspore.dataset.audio.ComplexNorm

Compute the norm of complex number sequence.

mindspore.dataset.audio.ComputeDeltas

Compute delta coefficients of a spectrogram.

mindspore.dataset.audio.Contrast

Apply contrast effect for audio waveform.

mindspore.dataset.audio.DBToAmplitude

Turn a waveform from the decibel scale to the power/amplitude scale.

mindspore.dataset.audio.DCShift

Apply a DC shift to the audio.

mindspore.dataset.audio.DeemphBiquad

Design two-pole de-emphasis filter for audio waveform with shape (..., time).

mindspore.dataset.audio.DetectPitchFrequency

Detect pitch frequency.

mindspore.dataset.audio.Dither

Dither increases the perceived dynamic range of audio stored at a particular bit-depth by eliminating nonlinear truncation distortion.

mindspore.dataset.audio.EqualizerBiquad

Design biquad equalizer filter and perform filtering.

mindspore.dataset.audio.Fade

Add a fade in and/or fade out to a waveform.

mindspore.dataset.audio.Flanger

Apply a flanger effect to the audio.

mindspore.dataset.audio.FrequencyMasking

Apply masking to a spectrogram in the frequency domain.

mindspore.dataset.audio.Gain

Apply amplification or attenuation to the whole waveform.

mindspore.dataset.audio.GriffinLim

Approximate magnitude spectrogram inversion using the GriffinLim algorithm.

mindspore.dataset.audio.HighpassBiquad

Design biquad highpass filter and perform filtering.

mindspore.dataset.audio.InverseMelScale

Solve for a normal STFT from a mel frequency STFT, using a conversion matrix.

mindspore.dataset.audio.LFilter

Design two-pole filter for audio waveform with shape (..., time).

mindspore.dataset.audio.LowpassBiquad

Design two-pole low-pass filter for audio waveform.

mindspore.dataset.audio.Magphase

Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase.

mindspore.dataset.audio.MaskAlongAxis

Apply a mask along axis.

mindspore.dataset.audio.MaskAlongAxisIID

Apply a mask along axis.

mindspore.dataset.audio.MelScale

Convert normal STFT to STFT at the Mel scale.

mindspore.dataset.audio.MuLawDecoding

Decode mu-law encoded signal.

mindspore.dataset.audio.MuLawEncoding

Encode signal based on mu-law companding.

mindspore.dataset.audio.Overdrive

Apply overdrive on input audio.

mindspore.dataset.audio.Phaser

Apply a phasing effect to the audio.

mindspore.dataset.audio.PhaseVocoder

Given an STFT tensor, speed up in time without modifying pitch by a factor of rate.

mindspore.dataset.audio.Resample

Resample a signal from one frequency to another.

mindspore.dataset.audio.RiaaBiquad

Apply RIAA vinyl playback equalization.

mindspore.dataset.audio.SlidingWindowCmn

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

mindspore.dataset.audio.SpectralCentroid

Create a spectral centroid from an audio signal.

mindspore.dataset.audio.Spectrogram

Create a spectrogram from an audio signal.

mindspore.dataset.audio.TimeMasking

Apply masking to a spectrogram in the time domain.

mindspore.dataset.audio.TimeStretch

Stretch Short Time Fourier Transform (STFT) in time without modifying pitch for a given rate.

mindspore.dataset.audio.TrebleBiquad

Design a treble tone-control effect.

mindspore.dataset.audio.Vad

Attempt to trim silent background sounds from the end of the voice recording.

mindspore.dataset.audio.Vol

Apply amplification or attenuation to the whole waveform.
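To make one pair of the entries above concrete, the following is a minimal NumPy sketch of mu-law companding, the idea behind MuLawEncoding and MuLawDecoding. It is an illustration under stated assumptions, not the library's implementation: the function names mu_law_encode and mu_law_decode are hypothetical, the input is assumed to be a float waveform in [-1, 1], and 256 quantization channels are assumed.

```python
import numpy as np

def mu_law_encode(waveform, quantization_channels=256):
    """Hypothetical helper: compress a float waveform in [-1, 1] to integer codes."""
    mu = quantization_channels - 1
    waveform = np.clip(waveform, -1.0, 1.0)
    # Logarithmic compression: small amplitudes get finer resolution.
    compressed = np.sign(waveform) * np.log1p(mu * np.abs(waveform)) / np.log1p(mu)
    # Map [-1, 1] to [0, mu] and round to the nearest integer code.
    return ((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)

def mu_law_decode(codes, quantization_channels=256):
    """Hypothetical helper: expand integer codes back to a float waveform."""
    mu = quantization_channels - 1
    # Map [0, mu] back to [-1, 1], then invert the logarithmic compression.
    compressed = codes.astype(np.float64) / mu * 2.0 - 1.0
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu

# A 5 Hz sine at 80% of full scale serves as a deterministic test signal.
t = np.linspace(0.0, 1.0, 200, endpoint=False)
waveform = 0.8 * np.sin(2.0 * np.pi * 5.0 * t)

codes = mu_law_encode(waveform)
restored = mu_law_decode(codes)
max_error = np.max(np.abs(restored - waveform))
print("code range:", codes.min(), codes.max())
print("max round-trip error:", max_error)
```

The round trip is lossy by design: the codes are quantized, so the restored waveform only approximates the input, with the largest errors near full scale.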

Utilities

mindspore.dataset.audio.BorderType

Padding mode types.

mindspore.dataset.audio.DensityFunction

Density Functions.

mindspore.dataset.audio.FadeShape

Fade Shapes.

mindspore.dataset.audio.GainType

Gain Types.

mindspore.dataset.audio.Interpolation

Interpolation Type.

mindspore.dataset.audio.MelType

Mel Types.

mindspore.dataset.audio.Modulation

Modulation Type.

mindspore.dataset.audio.NormMode

Norm Types.

mindspore.dataset.audio.NormType

Norm Types.

mindspore.dataset.audio.ResampleMethod

Resample methods.

mindspore.dataset.audio.ScaleType

Scale Types.

mindspore.dataset.audio.WindowType

Window function types.

mindspore.dataset.audio.create_dct

Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized depending on norm.

mindspore.dataset.audio.melscale_fbanks

Create a frequency transformation matrix with shape (n_freqs, n_mels).
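The shapes quoted for create_dct and melscale_fbanks can be illustrated with a small NumPy sketch. It assumes an orthonormal DCT-II and HTK-style mel spacing with triangular filters; the functions dct_matrix_sketch and mel_filterbank_sketch and their parameters are hypothetical stand-ins, and the library's own normalization options and defaults may differ.

```python
import numpy as np

def dct_matrix_sketch(n_mfcc, n_mels):
    """Hypothetical helper: orthonormal DCT-II matrix with shape (n_mels, n_mfcc)."""
    n = np.arange(n_mels, dtype=np.float64)[:, None]  # mel bin index
    k = np.arange(n_mfcc, dtype=np.float64)[None, :]  # cepstral coefficient index
    dct = np.cos(np.pi / n_mels * (n + 0.5) * k)
    # Orthonormal scaling (an assumption about the normalization mode).
    dct[:, 0] *= 1.0 / np.sqrt(2.0)
    dct *= np.sqrt(2.0 / n_mels)
    return dct

def hz_to_mel(freq):
    # HTK-style mel scale (assumed here).
    return 2595.0 * np.log10(1.0 + freq / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank_sketch(n_freqs, n_mels, sample_rate, f_min=0.0, f_max=None):
    """Hypothetical helper: triangular mel filters with shape (n_freqs, n_mels)."""
    if f_max is None:
        f_max = sample_rate / 2.0
    all_freqs = np.linspace(0.0, sample_rate / 2.0, n_freqs)
    # n_mels triangles need n_mels + 2 edge points, equally spaced in mel.
    f_pts = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    f_diff = f_pts[1:] - f_pts[:-1]
    offsets = f_pts[None, :] - all_freqs[:, None]  # (n_freqs, n_mels + 2)
    rising = -offsets[:, :-2] / f_diff[:-1]        # left edge of each triangle
    falling = offsets[:, 2:] / f_diff[1:]          # right edge of each triangle
    return np.maximum(0.0, np.minimum(rising, falling))

dct = dct_matrix_sketch(n_mfcc=13, n_mels=40)
fbanks = mel_filterbank_sketch(n_freqs=201, n_mels=40, sample_rate=16000)
print(dct.shape)     # (n_mels, n_mfcc)
print(fbanks.shape)  # (n_freqs, n_mels)
```

In an MFCC-style workflow, a power spectrogram with n_freqs bins would be multiplied by the filterbank to obtain n_mels bands, and the log of that result would be multiplied by the DCT matrix to obtain n_mfcc coefficients.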