mindspore.dataset.audio
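This module is used for audio data augmentation and contains two submodules, transforms and utils. transforms is a high-performance audio data augmentation module that supports common audio augmentation operations. utils provides utility methods for audio processing.
Imports commonly used in the API examples are as follows: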
import mindspore.dataset as ds
import mindspore.dataset.audio as audio
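Common data processing terms are described below:
TensorOperation: the base class of all data processing operations implemented in C++.
AudioTensorOperation: the base class of all audio data processing operations, derived from TensorOperation.
Data augmentation operators can be executed inside a data processing Pipeline or in Eager mode:
Pipeline mode is generally used to process datasets; for examples, see Introduction to Data Processing Pipeline.
Eager mode is generally used for individual samples. An audio preprocessing example follows: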
import numpy as np
import mindspore.dataset.audio as audio
from mindspore.dataset.audio import ResampleMethod

# Audio input
waveform = np.random.random([1, 30])

# Augmentation operation
resample_op = audio.Resample(orig_freq=48000, new_freq=16000,
                             resample_method=ResampleMethod.SINC_INTERPOLATION,
                             lowpass_filter_width=6, rolloff=0.99, beta=None)
waveform_resampled = resample_op(waveform)
print("waveform resampled: {}".format(waveform_resampled), flush=True)
Transforms
Apply a two-pole all-pass filter to the audio waveform, with the center frequency and bandwidth specified by the input arguments.
Convert the input audio from the amplitude/power scale to the decibel scale.
Compute the angle of a complex-valued sequence.
Apply a two-pole band-pass filter to the audio waveform.
Apply a two-pole Butterworth band-pass filter to the audio waveform.
Apply a two-pole Butterworth band-reject filter to the audio waveform.
Apply a bass tone-control effect to the audio waveform, that is, a two-pole low-shelf filter.
Apply a biquad filter to the input audio.
Compute the norm of a complex-valued sequence.
Compute delta coefficients of a spectrogram.
Apply a contrast enhancement effect to the audio waveform.
Turn a waveform from the decibel scale to the power/amplitude scale.
Apply a DC shift to the audio.
Design a two-pole de-emphasis filter for an audio waveform of shape (..., time).
Detect pitch frequency.
Dither increases the perceived dynamic range of audio stored at a particular bit-depth by eliminating nonlinear truncation distortion.
Design a biquad equalizer filter and perform filtering.
Add a fade in and/or fade out to a waveform.
Apply a flanger effect to the audio.
Apply a frequency-domain mask to the audio waveform.
Apply amplification or attenuation to the whole waveform.
Approximate magnitude spectrogram inversion using the Griffin-Lim algorithm.
Design a biquad highpass filter and perform filtering.
Solve for a normal STFT from a mel-frequency STFT, using a conversion matrix.
Design a two-pole filter for an audio waveform of shape (..., time).
Apply a two-pole low-pass filter to the audio waveform.
Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase.
Apply a mask along an axis.
Apply a mask along an axis.
Convert a normal STFT to an STFT on the mel scale.
Decode a mu-law encoded signal.
Encode a signal based on mu-law companding.
Apply overdrive to the input audio.
Apply a phasing effect to the audio.
Given an STFT tensor, speed it up in time by a factor of rate without modifying the pitch.
Resample a signal from one frequency to another.
Apply RIAA vinyl playback equalization.
Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
Create a spectral centroid from an audio signal.
Create a spectrogram from an audio signal.
Apply a time-domain mask to the audio waveform.
Stretch the Short Time Fourier Transform (STFT) spectrogram of the audio in time by a given rate without changing the pitch.
Design a treble tone-control effect.
Attempt to trim silent background sounds from the end of the voice recording.
Apply amplification or attenuation to the whole waveform.
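To make the mu-law encoding and decoding entries in the list above more concrete, here is a pure-Python sketch of the textbook mu-law companding formula applied to a single sample. It is not MindSpore's implementation; the function names `mu_law_encode` and `mu_law_decode` and the default of 256 quantization channels are assumptions made for illustration.

```python
import math

def mu_law_encode(x, quantization_channels=256):
    """Map a sample in [-1, 1] to an integer code in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    x = max(-1.0, min(1.0, x))  # clamp to the valid input range
    # Logarithmic compression that preserves the sign of the sample.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer code.
    return int(math.floor((compressed + 1.0) / 2.0 * mu + 0.5))

def mu_law_decode(code, quantization_channels=256):
    """Map an integer code back to an approximate sample in [-1, 1]."""
    mu = quantization_channels - 1
    compressed = code / mu * 2.0 - 1.0
    # Invert the logarithmic compression.
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)

code = mu_law_encode(0.5)
restored = mu_law_decode(code)
print(code, restored)
```

Because the codes are quantized, decoding recovers the input only approximately; the logarithmic spacing keeps the error small for quiet samples at the cost of coarser steps for loud ones.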
Utilities
Padding modes (border types).
Density functions.
Fade shapes.
Gain types.
Interpolation types.
Mel types.
Modulation types.
Norm types.
Norm types.
Resample methods.
Audio scale enumeration class.
Window function types.
Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized depending on norm.
Create a frequency transformation matrix with shape (n_freqs, n_mels).
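As a rough illustration of the DCT transformation matrix mentioned above, the sketch below builds a standard DCT-II matrix of shape (n_mels, n_mfcc) in pure Python. The function name `dct_matrix` and the string value `"ortho"` for the norm argument are hypothetical; the library's own utility may use different argument types and scaling conventions.

```python
import math

def dct_matrix(n_mfcc, n_mels, norm=None):
    """Return a DCT-II matrix as a nested list with shape (n_mels, n_mfcc)."""
    matrix = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            if norm == "ortho":
                # Orthonormal scaling: the columns become unit-length and mutually orthogonal.
                value *= math.sqrt(1.0 / n_mels) if k == 0 else math.sqrt(2.0 / n_mels)
            else:
                # Unnormalized convention: scale every entry by 2.
                value *= 2.0
            row.append(value)
        matrix.append(row)
    return matrix

m = dct_matrix(n_mfcc=3, n_mels=5, norm="ortho")
print(len(m), len(m[0]))
```

Multiplying a (..., n_mels) log-mel spectrogram by such a matrix yields (..., n_mfcc) cepstral coefficients, which is why the matrix has n_mels rows and n_mfcc columns.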