mindspore.dataset.audio.Vad

class mindspore.dataset.audio.Vad(sample_rate, trigger_level=7.0, trigger_time=0.25, search_time=1.0, allowed_gap=0.25, pre_trigger_time=0.0, boot_time=0.35, noise_up_time=0.1, noise_down_time=0.01, noise_reduction_amount=1.35, measure_freq=20.0, measure_duration=None, measure_smooth_time=0.4, hp_filter_freq=50.0, lp_filter_freq=6000.0, hp_lifter_freq=150.0, lp_lifter_freq=2000.0)

Voice activity detector.

Attempts to trim silence and quiet background sounds from the ends of speech recordings.

Similar to the SoX implementation.

Parameters
  • sample_rate (int) – Sampling rate of audio signal.

  • trigger_level (float, optional) – The measurement level used to trigger activity detection. Default: 7.0.

  • trigger_time (float, optional) – The time constant (in seconds) used to help ignore short bursts of sound. Default: 0.25.

  • search_time (float, optional) – The amount of audio (in seconds) to search for quieter/shorter bursts of audio to include prior to the detected trigger point. Default: 1.0.

  • allowed_gap (float, optional) – The allowed gap (in seconds) between quieter/shorter bursts of audio to include prior to the detected trigger point. Default: 0.25.

  • pre_trigger_time (float, optional) – The amount of audio (in seconds) to preserve before the trigger point and any found quieter/shorter bursts. Default: 0.0.

  • boot_time (float, optional) – The time for the initial noise estimate. Default: 0.35.

  • noise_up_time (float, optional) – Time constant used by the adaptive noise estimator for when the noise level is increasing. Default: 0.1.

  • noise_down_time (float, optional) – Time constant used by the adaptive noise estimator for when the noise level is decreasing. Default: 0.01.

  • noise_reduction_amount (float, optional) – Amount of noise reduction to use in the detection algorithm. Default: 1.35.

  • measure_freq (float, optional) – Frequency of the algorithm’s processing/measurements. Default: 20.0.

  • measure_duration (float, optional) – The duration of measurement. Default: None, which uses twice the measurement period.

  • measure_smooth_time (float, optional) – Time constant used to smooth spectral measurements. Default: 0.4.

  • hp_filter_freq (float, optional) – The ‘Brick-wall’ frequency of high-pass filter applied at the input to the detector algorithm. Default: 50.0.

  • lp_filter_freq (float, optional) – The ‘Brick-wall’ frequency of low-pass filter applied at the input to the detector algorithm. Default: 6000.0.

  • hp_lifter_freq (float, optional) – The ‘Brick-wall’ frequency of high-pass lifter used in the detector algorithm. Default: 150.0.

  • lp_lifter_freq (float, optional) – The ‘Brick-wall’ frequency of low-pass lifter used in the detector algorithm. Default: 2000.0.
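
The default for measure_duration can be worked out by hand. The sketch below assumes the measurement period is 1 / measure_freq, so the default of twice the period is 2 / measure_freq. The helper name effective_measure_duration is hypothetical and is not part of MindSpore; it only illustrates the arithmetic.

```python
from typing import Optional


def effective_measure_duration(measure_freq: float = 20.0,
                               measure_duration: Optional[float] = None) -> float:
    """Hypothetical helper: resolve the measurement duration in seconds.

    Assumes the measurement period is 1 / measure_freq, so the documented
    default of "twice the measurement period" is 2 / measure_freq.
    """
    if measure_duration is not None:
        # An explicit value is used unchanged.
        return measure_duration
    period = 1.0 / measure_freq
    return 2.0 * period


# With the default measure_freq of 20.0, the period is 0.05 s,
# so the default duration is about 0.1 s.
default_duration = effective_measure_duration()
explicit_duration = effective_measure_duration(measure_freq=20.0, measure_duration=0.05)
print(default_duration, explicit_duration)
```

Under that assumption, consecutive measurements overlap by half when measure_duration is left as None.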

Raises
  • TypeError – If sample_rate is not of type int.

  • ValueError – If sample_rate is not a positive number.

  • TypeError – If trigger_level is not of type float.

  • TypeError – If trigger_time is not of type float.

  • ValueError – If trigger_time is a negative number.

  • TypeError – If search_time is not of type float.

  • ValueError – If search_time is a negative number.

  • TypeError – If allowed_gap is not of type float.

  • ValueError – If allowed_gap is a negative number.

  • TypeError – If pre_trigger_time is not of type float.

  • ValueError – If pre_trigger_time is a negative number.

  • TypeError – If boot_time is not of type float.

  • ValueError – If boot_time is a negative number.

  • TypeError – If noise_up_time is not of type float.

  • ValueError – If noise_up_time is a negative number.

  • TypeError – If noise_down_time is not of type float.

  • ValueError – If noise_down_time is a negative number.

  • ValueError – If noise_up_time is less than noise_down_time.

  • TypeError – If noise_reduction_amount is not of type float.

  • ValueError – If noise_reduction_amount is a negative number.

  • TypeError – If measure_freq is not of type float.

  • ValueError – If measure_freq is not a positive number.

  • TypeError – If measure_duration is not of type float.

  • ValueError – If measure_duration is a negative number.

  • TypeError – If measure_smooth_time is not of type float.

  • ValueError – If measure_smooth_time is a negative number.

  • TypeError – If hp_filter_freq is not of type float.

  • ValueError – If hp_filter_freq is not a positive number.

  • TypeError – If lp_filter_freq is not of type float.

  • ValueError – If lp_filter_freq is not a positive number.

  • TypeError – If hp_lifter_freq is not of type float.

  • ValueError – If hp_lifter_freq is not a positive number.

  • TypeError – If lp_lifter_freq is not of type float.

  • ValueError – If lp_lifter_freq is not a positive number.

  • RuntimeError – If the input tensor is not in shape of <…, time>.
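
A few of the conditions listed above can be sketched in plain Python to show which exception each kind of bad argument maps to. The function check_vad_args and the helper outcome are hypothetical stand-ins written for illustration; they follow the wording of the list literally, cover only a subset of the parameters, and are not MindSpore's own validation code.

```python
def check_vad_args(sample_rate, noise_up_time=0.1, noise_down_time=0.01,
                   measure_freq=20.0):
    """Hypothetical check mirroring a subset of the documented conditions."""
    if not isinstance(sample_rate, int):
        raise TypeError("sample_rate must be of type int")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number")

    float_args = {
        "noise_up_time": noise_up_time,
        "noise_down_time": noise_down_time,
        "measure_freq": measure_freq,
    }
    for name, value in float_args.items():
        if not isinstance(value, float):
            raise TypeError(f"{name} must be of type float")

    if noise_up_time < 0 or noise_down_time < 0:
        raise ValueError("noise time constants must not be negative")
    if noise_up_time < noise_down_time:
        raise ValueError("noise_up_time must not be less than noise_down_time")
    if measure_freq <= 0:
        raise ValueError("measure_freq must be a positive number")


def outcome(**kwargs):
    """Return "ok" or the name of the exception raised by the sketch."""
    try:
        check_vad_args(**kwargs)
        return "ok"
    except (TypeError, ValueError) as err:
        return type(err).__name__


results = {
    "defaults": outcome(sample_rate=600),
    "float_rate": outcome(sample_rate=600.0),
    "zero_rate": outcome(sample_rate=0),
    "up_below_down": outcome(sample_rate=600, noise_up_time=0.01,
                             noise_down_time=0.1),
}
print(results)
```

In this sketch, type checks run before range checks, so a float sample_rate reports TypeError even if its value is positive. The order in which the real operator reports errors is not stated in this page.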

Supported Platforms:

CPU

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.audio as audio
>>>
>>> # Use the transform in dataset pipeline mode
>>> waveform = np.random.random([5, 1000])  # 5 samples
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data=waveform, column_names=["audio"])
>>> transforms = [audio.Vad(sample_rate=600)]
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=transforms, input_columns=["audio"])
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["audio"].shape, item["audio"].dtype)
...     break
(660,) float64
>>>
>>> # Use the transform in eager mode
>>> waveform = np.random.random([1000])  # 1 sample
>>> output = audio.Vad(sample_rate=600)(waveform)
>>> print(output.shape, output.dtype)
(660,) float64