mindspore.dataset.audio.Vad
- class mindspore.dataset.audio.Vad(sample_rate, trigger_level=7.0, trigger_time=0.25, search_time=1.0, allowed_gap=0.25, pre_trigger_time=0.0, boot_time=0.35, noise_up_time=0.1, noise_down_time=0.01, noise_reduction_amount=1.35, measure_freq=20.0, measure_duration=None, measure_smooth_time=0.4, hp_filter_freq=50.0, lp_filter_freq=6000.0, hp_lifter_freq=150.0, lp_lifter_freq=2000.0)
Voice activity detector.
Attempts to trim silence and quiet background sounds from the ends of speech recordings.
Similar to the SoX implementation.
- Parameters
sample_rate (int) – Sampling rate of audio signal.
trigger_level (float, optional) – The measurement level used to trigger activity detection. Default: 7.0.
trigger_time (float, optional) – The time constant (in seconds) used to help ignore short bursts of sounds. Default: 0.25.
search_time (float, optional) – The amount of audio (in seconds) to search for quieter/shorter bursts of audio to include prior to the detected trigger point. Default: 1.0.
allowed_gap (float, optional) – The allowed gap (in seconds) between quieter/shorter bursts of audio to include prior to the detected trigger point. Default: 0.25.
pre_trigger_time (float, optional) – The amount of audio (in seconds) to preserve before the trigger point and any found quieter/shorter bursts. Default: 0.0.
boot_time (float, optional) – The time for the initial noise estimate. Default: 0.35.
noise_up_time (float, optional) – Time constant used by the adaptive noise estimator for when the noise level is increasing. Default: 0.1.
noise_down_time (float, optional) – Time constant used by the adaptive noise estimator for when the noise level is decreasing. Default: 0.01.
noise_reduction_amount (float, optional) – Amount of noise reduction to use in the detection algorithm. Default: 1.35.
measure_freq (float, optional) – Frequency of the algorithm's processing/measurements. Default: 20.0.
measure_duration (float, optional) – The duration of measurement. Default: None, will use twice the measurement period.
measure_smooth_time (float, optional) – Time constant used to smooth spectral measurements. Default: 0.4.
hp_filter_freq (float, optional) – The 'Brick-wall' frequency of high-pass filter applied at the input to the detector algorithm. Default: 50.0.
lp_filter_freq (float, optional) – The 'Brick-wall' frequency of low-pass filter applied at the input to the detector algorithm. Default: 6000.0.
hp_lifter_freq (float, optional) – The 'Brick-wall' frequency of high-pass lifter used in the detector algorithm. Default: 150.0.
lp_lifter_freq (float, optional) – The 'Brick-wall' frequency of low-pass lifter used in the detector algorithm. Default: 2000.0.
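The relationship between measure_freq and the default measure_duration can be worked through with a little arithmetic. The sketch below is only an illustration, not library code: `measurement_window` is a hypothetical helper, and it assumes that the "measurement period" is `1 / measure_freq` seconds.

```python
from typing import Optional, Tuple


def measurement_window(sample_rate: int,
                       measure_freq: float = 20.0,
                       measure_duration: Optional[float] = None) -> Tuple[float, float, int]:
    """Return (period in s, duration in s, duration in samples).

    Hypothetical helper: assumes the measurement period is 1 / measure_freq
    and that measure_duration=None means "twice the measurement period".
    """
    period = 1.0 / measure_freq
    duration = 2.0 * period if measure_duration is None else measure_duration
    return period, duration, round(duration * sample_rate)


# With the documented defaults and the sample rate used in the Examples section.
period, duration, n_samples = measurement_window(sample_rate=600)
print(period, duration, n_samples)  # 0.05 0.1 60
```

Under these assumptions, each measurement spans two periods, so consecutive measurements overlap by half of their length.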
- Raises
TypeError – If sample_rate is not of type int.
ValueError – If sample_rate is not a positive number.
TypeError – If trigger_level is not of type float.
TypeError – If trigger_time is not of type float.
ValueError – If trigger_time is a negative number.
TypeError – If search_time is not of type float.
ValueError – If search_time is a negative number.
TypeError – If allowed_gap is not of type float.
ValueError – If allowed_gap is a negative number.
TypeError – If pre_trigger_time is not of type float.
ValueError – If pre_trigger_time is a negative number.
TypeError – If boot_time is not of type float.
ValueError – If boot_time is a negative number.
TypeError – If noise_up_time is not of type float.
ValueError – If noise_up_time is a negative number.
TypeError – If noise_down_time is not of type float.
ValueError – If noise_down_time is a negative number.
ValueError – If noise_up_time is less than noise_down_time.
TypeError – If noise_reduction_amount is not of type float.
ValueError – If noise_reduction_amount is a negative number.
TypeError – If measure_freq is not of type float.
ValueError – If measure_freq is not a positive number.
TypeError – If measure_duration is not of type float.
ValueError – If measure_duration is a negative number.
TypeError – If measure_smooth_time is not of type float.
ValueError – If measure_smooth_time is a negative number.
TypeError – If hp_filter_freq is not of type float.
ValueError – If hp_filter_freq is not a positive number.
TypeError – If lp_filter_freq is not of type float.
ValueError – If lp_filter_freq is not a positive number.
TypeError – If hp_lifter_freq is not of type float.
ValueError – If hp_lifter_freq is not a positive number.
TypeError – If lp_lifter_freq is not of type float.
ValueError – If lp_lifter_freq is not a positive number.
RuntimeError – If input tensor is not in shape of <…, time>.
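A few of the value constraints listed above can be checked before an operator is constructed. The following is a minimal sketch in plain Python, assuming a hypothetical helper named `check_vad_values`; it is not the library's validator, covers only a subset of the documented ValueError conditions, and performs no type checks.

```python
from typing import List


def check_vad_values(sample_rate: int,
                     noise_up_time: float = 0.1,
                     noise_down_time: float = 0.01,
                     measure_freq: float = 20.0) -> List[str]:
    """Return a list of violated constraints (empty if none).

    Hypothetical pre-flight check mirroring a subset of the documented
    ValueError conditions; the library performs its own validation.
    """
    problems = []
    if sample_rate <= 0:
        problems.append("sample_rate must be a positive number")
    if noise_up_time < 0:
        problems.append("noise_up_time must not be negative")
    if noise_down_time < 0:
        problems.append("noise_down_time must not be negative")
    if noise_up_time < noise_down_time:
        problems.append("noise_up_time must not be less than noise_down_time")
    if measure_freq <= 0:
        problems.append("measure_freq must be a positive number")
    return problems


print(check_vad_values(sample_rate=600))                       # []
print(check_vad_values(sample_rate=600, noise_up_time=0.005))  # one violation
```

Returning a list rather than raising on the first problem is a choice made for this sketch so that all violations are visible at once.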
- Supported Platforms:
CPU
Examples
>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.audio as audio
>>>
>>> # Use the transform in dataset pipeline mode
>>> waveform = np.random.random([5, 1000])  # 5 samples
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data=waveform, column_names=["audio"])
>>> transforms = [audio.Vad(sample_rate=600)]
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=transforms, input_columns=["audio"])
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["audio"].shape, item["audio"].dtype)
...     break
(660,) float64
>>>
>>> # Use the transform in eager mode
>>> waveform = np.random.random([1000])  # 1 sample
>>> output = audio.Vad(sample_rate=600)(waveform)
>>> print(output.shape, output.dtype)
(660,) float64