mindspore.dataset.audio

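This module provides audio data augmentation and consists of two submodules, transforms and utils. transforms is a high-performance audio augmentation module that supports common audio augmentation operations. utils provides utility methods for audio processing.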

The modules commonly imported in the API examples are:

import mindspore.dataset as ds
import mindspore.dataset.audio as audio

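Common data processing terms:

TensorOperation: the base class of all data processing operations implemented in C++.
AudioTensorOperation: the base class of all audio data processing operations, derived from TensorOperation.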

Data augmentation operators can run inside a data processing Pipeline or in Eager mode:

Pipeline mode is generally used to process datasets. For an example, see the introduction to the data processing Pipeline.

Eager mode is generally used for individual samples. An audio preprocessing example:

import numpy as np
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import ResampleMethod

# Audio input
waveform = np.random.random([1, 30])

# Augmentation operation
resample_op = audio.Resample(orig_freq=48000, new_freq=16000,
                             resample_method=ResampleMethod.SINC_INTERPOLATION,
                             lowpass_filter_width=6, rolloff=0.99, beta=None)
waveform_resampled = resample_op(waveform)
print("waveform resampled: {}".format(waveform_resampled), flush=True)

Transforms

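`mindspore.dataset.audio.AllpassBiquad`	Apply a two-pole all-pass filter to the audio waveform, with the central frequency and bandwidth specified by the input arguments.
`mindspore.dataset.audio.AmplitudeToDB`	Convert the input audio from the amplitude/power scale to the decibel scale.
`mindspore.dataset.audio.Angle`	Compute the angle of a complex-valued sequence.
`mindspore.dataset.audio.BandBiquad`	Apply a two-pole band-pass filter to the audio waveform.
`mindspore.dataset.audio.BandpassBiquad`	Apply a two-pole Butterworth band-pass filter to the audio waveform.
`mindspore.dataset.audio.BandrejectBiquad`	Apply a two-pole Butterworth band-reject filter to the audio waveform.
`mindspore.dataset.audio.BassBiquad`	Apply a bass tone-control effect, that is, a two-pole low-shelf filter, to the audio waveform.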
`mindspore.dataset.audio.Biquad`	Perform a biquad filter of input audio.
`mindspore.dataset.audio.ComplexNorm`	Compute the norm of a complex-valued sequence.
`mindspore.dataset.audio.ComputeDeltas`	Compute delta coefficients of a spectrogram.
`mindspore.dataset.audio.Contrast`	Apply a contrast enhancement effect to the audio waveform.
`mindspore.dataset.audio.DBToAmplitude`	Turn a waveform from the decibel scale to the power/amplitude scale.
`mindspore.dataset.audio.DCShift`	Apply a DC shift to the audio.
`mindspore.dataset.audio.DeemphBiquad`	Design a two-pole deemphasis filter for an audio waveform of dimension (..., time).
`mindspore.dataset.audio.DetectPitchFrequency`	Detect pitch frequency.
`mindspore.dataset.audio.Dither`	Dither increases the perceived dynamic range of audio stored at a particular bit-depth by eliminating nonlinear truncation distortion.
`mindspore.dataset.audio.EqualizerBiquad`	Design biquad equalizer filter and perform filtering.
`mindspore.dataset.audio.Fade`	Add a fade in and/or fade out to a waveform.
`mindspore.dataset.audio.Flanger`	Apply a flanger effect to the audio.
`mindspore.dataset.audio.FrequencyMasking`	Apply a frequency-domain mask to the audio waveform.
`mindspore.dataset.audio.Gain`	Apply amplification or attenuation to the whole waveform.
`mindspore.dataset.audio.GriffinLim`	Approximate magnitude spectrogram inversion using the GriffinLim algorithm.
`mindspore.dataset.audio.HighpassBiquad`	Design biquad highpass filter and perform filtering.
`mindspore.dataset.audio.InverseMelScale`	Solve for a normal STFT from a mel frequency STFT, using a conversion matrix.
`mindspore.dataset.audio.LFilter`	Design a two-pole filter for an audio waveform of dimension (..., time).
`mindspore.dataset.audio.LowpassBiquad`	Apply a two-pole low-pass filter to the audio waveform.
`mindspore.dataset.audio.Magphase`	Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase.
`mindspore.dataset.audio.MaskAlongAxis`	Apply a mask along axis.
`mindspore.dataset.audio.MaskAlongAxisIID`	Apply a mask along axis.
`mindspore.dataset.audio.MelScale`	Convert normal STFT to STFT at the Mel scale.
`mindspore.dataset.audio.MuLawDecoding`	Decode mu-law encoded signal.
`mindspore.dataset.audio.MuLawEncoding`	Encode signal based on mu-law companding.
`mindspore.dataset.audio.Overdrive`	Apply overdrive on input audio.
`mindspore.dataset.audio.Phaser`	Apply a phasing effect to the audio.
`mindspore.dataset.audio.PhaseVocoder`	Given an STFT tensor, speed up in time without modifying pitch by a factor of rate.
`mindspore.dataset.audio.Resample`	Resample a signal from one frequency to another.
`mindspore.dataset.audio.RiaaBiquad`	Apply RIAA vinyl playback equalization.
`mindspore.dataset.audio.SlidingWindowCmn`	Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
`mindspore.dataset.audio.SpectralCentroid`	Create a spectral centroid from an audio signal.
`mindspore.dataset.audio.Spectrogram`	Create a spectrogram from an audio signal.
`mindspore.dataset.audio.TimeMasking`	Apply a time-domain mask to the audio waveform.
`mindspore.dataset.audio.TimeStretch`	Stretch the Short Time Fourier Transform (STFT) spectrogram of the audio in time by a given rate, without changing the pitch.
`mindspore.dataset.audio.TrebleBiquad`	Design a treble tone-control effect.
`mindspore.dataset.audio.Vad`	Attempt to trim silent background sounds from the end of the voice recording.
`mindspore.dataset.audio.Vol`	Apply amplification or attenuation to the whole waveform.

Utilities

`mindspore.dataset.audio.BorderType`	Padding Mode, BorderType Type.
`mindspore.dataset.audio.DensityFunction`	Density Functions.
`mindspore.dataset.audio.FadeShape`	Fade Shapes.
`mindspore.dataset.audio.GainType`	Gain Types.
`mindspore.dataset.audio.Interpolation`	Interpolation Type.
`mindspore.dataset.audio.MelType`	Mel Types.
`mindspore.dataset.audio.Modulation`	Modulation Type.
`mindspore.dataset.audio.NormMode`	Norm Types.
`mindspore.dataset.audio.NormType`	Norm Types.
`mindspore.dataset.audio.ResampleMethod`	Resample method.
`mindspore.dataset.audio.ScaleType`	Audio scale enumeration class.
`mindspore.dataset.audio.WindowType`	Window function types.
`mindspore.dataset.audio.create_dct`	Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized depending on norm.
`mindspore.dataset.audio.melscale_fbanks`	Create a frequency transformation matrix with shape (n_freqs, n_mels).