Differences with torchaudio.transforms.MelSpectrogram
torchaudio.transforms.MelSpectrogram
class torchaudio.transforms.MelSpectrogram(sample_rate: int = 16000, n_fft: int = 400, win_length: Optional[int] = None,
hop_length: Optional[int] = None, f_min: float = 0.0, f_max: Optional[float] = None,
pad: int = 0, n_mels: int = 128, window_fn: Callable[[...], torch.Tensor] = <built-in method hann_window of type object>,
power: Optional[float] = 2.0, normalized: bool = False, wkwargs: Optional[dict] = None,
center: bool = True, pad_mode: str = 'reflect', onesided: bool = True, norm: Optional[str] = None)
For more information, see torchaudio.transforms.MelSpectrogram.
mindspore.dataset.audio.MelSpectrogram
class mindspore.dataset.audio.MelSpectrogram(sample_rate=16000, n_fft=400, win_length=None,
hop_length=None, f_min=0.0, f_max=None,
pad=0, n_mels=128, window=WindowType.HANN, power=2.0, normalized=False,
center=True, pad_mode=BorderType.REFLECT, onesided=True, norm=NormType.NONE, mel_scale=MelType.HTK)
For more information, see mindspore.dataset.audio.MelSpectrogram.
Differences
PyTorch:Create MelSpectrogram for a raw audio signal. Customized window function and different parameter configs for window function are both supported.
MindSpore:Create MelSpectrogram for a raw audio signal.
Categories |
Subcategories |
PyTorch |
MindSpore |
Difference |
---|---|---|---|---|
Parameter |
Parameter1 |
sample_rate |
sample_rate |
- |
Parameter2 |
win_length |
win_length |
- |
|
Parameter3 |
hop_length |
hop_length |
- |
|
Parameter4 |
n_fft |
n_fft |
- |
|
Parameter5 |
f_min |
f_min |
- |
|
Parameter6 |
f_max |
f_max |
- |
|
Parameter7 |
pad |
pad |
- |
|
Parameter8 |
n_mels |
n_mels |
- |
|
Parameter9 |
window_fn |
window |
MindSpore only support 5 window functions |
|
Parameter10 |
power |
power |
- |
|
Parameter11 |
normalized |
normalized |
- |
|
Parameter12 |
wkwargs |
- |
Arguments for window function, not supported by MindSpore |
|
Parameter13 |
center |
center |
- |
|
Parameter14 |
pad_mode |
pad_mode |
- |
|
Parameter15 |
onesided |
onesided |
- |
|
Parameter16 |
norm |
norm |
- |
|
Parameter17 |
- |
mel_scale |
Mel scale to use |
Code Example
import numpy as np
fake_input = np.array([[[1, 1, 2, 2, 3, 3, 4]]]).astype(np.float32)
# PyTorch
import torch
import torchaudio.transforms as T
transformer = T.MelSpectrogram(sample_rate=16000, n_fft=4, win_length=2, hop_length=4, window_fn=torch.hann_window)
torch_result = transformer(torch.from_numpy(fake_input))
print(torch_result)
# Out: tensor([[[[0.0000, 0.0000],
# ...
# [0.5235, 4.7117],
# [0.4765, 4.2883],
# ...
# [0.0000, 0.0000]]]])
# MindSpore
import mindspore.dataset.audio as audio
transformer = audio.MelSpectrogram(sample_rate=16000, n_fft=4, win_length=2, hop_length=4, window=audio.WindowType.HANN)
ms_result = transformer(fake_input)
print(ms_result)
# Out: [[[[0. 0. ]
# ...
# [0.52353615 4.7118254 ]
# [0.47646385 4.2881746 ]
# ...
# [0. 0. ]]]]