mindspore.dataset.audio.Vad

class mindspore.dataset.audio.Vad(sample_rate, trigger_level=7.0, trigger_time=0.25, search_time=1.0, allowed_gap=0.25, pre_trigger_time=0.0, boot_time=0.35, noise_up_time=0.1, noise_down_time=0.01, noise_reduction_amount=1.35, measure_freq=20.0, measure_duration=None, measure_smooth_time=0.4, hp_filter_freq=50.0, lp_filter_freq=6000.0, hp_lifter_freq=150.0, lp_lifter_freq=2000.0)

Voice activity detector.

Attempts to trim silence and quiet background sounds from the ends of recordings of speech.

The algorithm is similar to the SoX implementation of the vad effect.
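To make the idea of trimming concrete, here is a deliberately simplified sketch in plain Python. It is not the SoX algorithm used by Vad. It replaces the adaptive noise estimate and spectral measurements with a fixed RMS threshold per frame. The function name trim_leading_quiet and its frame_time and threshold_db parameters are hypothetical. Only pre_trigger_time borrows its meaning from the documented parameter of the same name.

```python
import math


def trim_leading_quiet(samples, sample_rate, frame_time=0.05,
                       threshold_db=-40.0, pre_trigger_time=0.0):
    """Drop leading frames whose RMS level stays below threshold_db (dBFS).

    Simplified illustration: a fixed threshold replaces the adaptive noise
    estimate and spectral measurements of the SoX-style Vad.
    """
    frame_len = max(1, int(frame_time * sample_rate))
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else -math.inf
        if level_db >= threshold_db:
            # First active frame found: keep pre_trigger_time seconds before it.
            keep_from = max(0, start - int(pre_trigger_time * sample_rate))
            return samples[keep_from:]
    return samples[:0]  # no active frame: the whole signal is trimmed


# Usage: 0.3 s of silence followed by 0.5 s of a 50 Hz tone, sampled at 1 kHz.
sr = 1000
silence = [0.0] * 300
tone = [0.5 * math.sin(2 * math.pi * 50 * n / sr) for n in range(500)]
out = trim_leading_quiet(silence + tone, sr)
print(len(out))  # 500
```

Because this sketch only inspects the start of the signal, trailing quiet audio could be trimmed with it by applying it to the reversed list and reversing the result.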

Parameters

sample_rate (int) – Sampling rate of the audio signal, in Hz.
trigger_level (float, optional) – The measurement level used to trigger activity detection. Default: 7.0.
trigger_time (float, optional) – The time constant (in seconds) used to help ignore short bursts of sounds. Default: 0.25.
search_time (float, optional) – The amount of audio (in seconds) to search for quieter/shorter bursts of audio to include prior to the detected trigger point. Default: 1.0.
allowed_gap (float, optional) – The allowed gap (in seconds) between quieter/shorter bursts of audio to include prior to the detected trigger point. Default: 0.25.
pre_trigger_time (float, optional) – The amount of audio (in seconds) to preserve before the trigger point and any found quieter/shorter bursts. Default: 0.0.
boot_time (float, optional) – The time (in seconds) used for the initial noise estimate. Default: 0.35.
noise_up_time (float, optional) – Time constant (in seconds) used by the adaptive noise estimator when the noise level is increasing. Default: 0.1.
noise_down_time (float, optional) – Time constant (in seconds) used by the adaptive noise estimator when the noise level is decreasing. Default: 0.01.
noise_reduction_amount (float, optional) – Amount of noise reduction to use in the detection algorithm. Default: 1.35.
measure_freq (float, optional) – Frequency (in Hz) of the algorithm's processing/measurements. Default: 20.0.
measure_duration (float, optional) – The duration (in seconds) of each measurement. Default: None, which uses twice the measurement period, i.e. 2 / measure_freq.
measure_smooth_time (float, optional) – Time constant (in seconds) used to smooth spectral measurements. Default: 0.4.
hp_filter_freq (float, optional) – The brick-wall frequency (in Hz) of the high-pass filter applied at the input to the detector algorithm. Default: 50.0.
lp_filter_freq (float, optional) – The brick-wall frequency (in Hz) of the low-pass filter applied at the input to the detector algorithm. Default: 6000.0.
hp_lifter_freq (float, optional) – The brick-wall frequency (in Hz) of the high-pass lifter used in the detector algorithm. Default: 150.0.
lp_lifter_freq (float, optional) – The brick-wall frequency (in Hz) of the low-pass lifter used in the detector algorithm. Default: 2000.0.
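Several of the parameters above are interpreted relative to measure_freq. The sketch below computes the derived quantities under stated assumptions. The measurement period of 1 / measure_freq and the default measure_duration of twice that period follow the documentation. The rounding to whole samples and the mapping of each time constant to an exponential smoothing multiplier exp(-1 / (time_constant * measure_freq)) are assumptions modeled on the SoX-style algorithm. The helper derived_vad_settings is hypothetical and not part of mindspore.dataset.audio.

```python
import math


def derived_vad_settings(sample_rate, measure_freq=20.0, measure_duration=None,
                         trigger_time=0.25, noise_up_time=0.1,
                         noise_down_time=0.01, measure_smooth_time=0.4):
    """Hypothetical helper: derive per-measurement quantities from Vad-style args."""
    measure_period = 1.0 / measure_freq  # seconds between measurements
    if measure_duration is None:
        measure_duration = 2.0 * measure_period  # documented default

    def smoothing_mult(time_constant):
        # Assumed SoX-style mapping of a time constant (seconds, must be > 0
        # here) to a per-measurement exponential smoothing multiplier.
        return math.exp(-1.0 / (time_constant * measure_freq))

    return {
        "measure_period": measure_period,
        "measure_duration": measure_duration,
        # Assumed rounding to the nearest whole sample.
        "samples_per_period": int(sample_rate * measure_period + 0.5),
        "samples_per_measurement": int(sample_rate * measure_duration + 0.5),
        "trigger_mult": smoothing_mult(trigger_time),
        "noise_up_mult": smoothing_mult(noise_up_time),
        "noise_down_mult": smoothing_mult(noise_down_time),
        "smooth_mult": smoothing_mult(measure_smooth_time),
    }


settings = derived_vad_settings(sample_rate=16000)
print(settings["samples_per_period"], settings["samples_per_measurement"])  # 800 1600
```

Under this assumed mapping, a smaller multiplier tracks changes faster, so with the defaults the noise estimate would follow a falling noise level much faster than a rising one.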

Raises

TypeError – If sample_rate is not of type int.
ValueError – If sample_rate is not a positive number.
TypeError – If trigger_level is not of type float.
TypeError – If trigger_time is not of type float.
ValueError – If trigger_time is a negative number.
TypeError – If search_time is not of type float.
ValueError – If search_time is a negative number.
TypeError – If allowed_gap is not of type float.
ValueError – If allowed_gap is a negative number.
TypeError – If pre_trigger_time is not of type float.
ValueError – If pre_trigger_time is a negative number.
TypeError – If boot_time is not of type float.
ValueError – If boot_time is a negative number.
TypeError – If noise_up_time is not of type float.
ValueError – If noise_up_time is a negative number.
TypeError – If noise_down_time is not of type float.
ValueError – If noise_down_time is a negative number.
ValueError – If noise_up_time is less than noise_down_time.
TypeError – If noise_reduction_amount is not of type float.
ValueError – If noise_reduction_amount is a negative number.
TypeError – If measure_freq is not of type float.
ValueError – If measure_freq is not a positive number.
TypeError – If measure_duration is neither None nor of type float.
ValueError – If measure_duration is a negative number.
TypeError – If measure_smooth_time is not of type float.
ValueError – If measure_smooth_time is a negative number.
TypeError – If hp_filter_freq is not of type float.
ValueError – If hp_filter_freq is not a positive number.
TypeError – If lp_filter_freq is not of type float.
ValueError – If lp_filter_freq is not a positive number.
TypeError – If hp_lifter_freq is not of type float.
ValueError – If hp_lifter_freq is not a positive number.
TypeError – If lp_lifter_freq is not of type float.
ValueError – If lp_lifter_freq is not a positive number.
RuntimeError – If the input tensor is not of shape <…, time>.
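The Raises list above amounts to a small set of argument rules. The hypothetical helper check_vad_args below restates the TypeError and ValueError rules in plain Python. It follows the documented types literally (for example, it rejects an int passed for a float parameter and rejects bool for sample_rate), which may be stricter or looser than MindSpore's actual validator.

```python
# Hypothetical validator restating the documented Raises rules for Vad.
DEFAULTS = {
    "trigger_level": 7.0, "trigger_time": 0.25, "search_time": 1.0,
    "allowed_gap": 0.25, "pre_trigger_time": 0.0, "boot_time": 0.35,
    "noise_up_time": 0.1, "noise_down_time": 0.01,
    "noise_reduction_amount": 1.35, "measure_freq": 20.0,
    "measure_smooth_time": 0.4, "hp_filter_freq": 50.0,
    "lp_filter_freq": 6000.0, "hp_lifter_freq": 150.0, "lp_lifter_freq": 2000.0,
}
NON_NEGATIVE = ("trigger_time", "search_time", "allowed_gap", "pre_trigger_time",
                "boot_time", "noise_up_time", "noise_down_time",
                "noise_reduction_amount", "measure_smooth_time")
POSITIVE = ("measure_freq", "hp_filter_freq", "lp_filter_freq",
            "hp_lifter_freq", "lp_lifter_freq")


def check_vad_args(sample_rate, measure_duration=None, **overrides):
    """Return the merged parameters or raise TypeError/ValueError."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise TypeError(f"unexpected arguments: {unknown}")
    params = {**DEFAULTS, **overrides}
    # bool is a subclass of int in Python; rejecting it here is an assumption.
    if not isinstance(sample_rate, int) or isinstance(sample_rate, bool):
        raise TypeError("sample_rate must be an int")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    for name, value in params.items():
        if not isinstance(value, float):
            raise TypeError(f"{name} must be a float")
    for name in NON_NEGATIVE:
        if params[name] < 0:
            raise ValueError(f"{name} must not be negative")
    for name in POSITIVE:
        if params[name] <= 0:
            raise ValueError(f"{name} must be positive")
    if measure_duration is not None:
        if not isinstance(measure_duration, float):
            raise TypeError("measure_duration must be a float or None")
        if measure_duration < 0:
            raise ValueError("measure_duration must not be negative")
    if params["noise_up_time"] < params["noise_down_time"]:
        raise ValueError("noise_up_time must not be less than noise_down_time")
    params["measure_duration"] = measure_duration
    return params


print(check_vad_args(600)["noise_up_time"])  # 0.1
```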

Supported Platforms: CPU

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.audio as audio
>>>
>>> # Use the transform in dataset pipeline mode
>>> waveform = np.random.random([5, 1000])  # 5 samples
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data=waveform, column_names=["audio"])
>>> transforms = [audio.Vad(sample_rate=600)]
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=transforms, input_columns=["audio"])
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["audio"].shape, item["audio"].dtype)
...     break
(660,) float64
>>>
>>> # Use the transform in eager mode
>>> waveform = np.random.random([1000])  # 1 sample
>>> output = audio.Vad(sample_rate=600)(waveform)
>>> print(output.shape, output.dtype)
(660,) float64

Tutorial Examples:

Illustration of audio transforms