Function mindspore::dataset::CMUArctic
Defined in File datasets.h
Function Documentation
Function to create a CMUArcticDataset.
Note
The generated dataset has four columns [“waveform”, “sample_rate”, “transcript”, “utterance_id”].
- Parameters
dataset_dir – [in] Path to the root directory that contains the dataset.
name – [in] Part of the CMUArctic dataset to load. Must be one of “aew”, “ahw”, “aup”, “awb”, “axb”, “bdl”, “clb”, “eey”, “fem”, “gka”, “jmk”, “ksp”, “ljm”, “lnh”, “rms”, “rxr”, “slp” or “slt” (default = “aew”).
sampler – [in] Shared pointer to a sampler object used to choose samples from the dataset. If sampler is not given, a RandomSampler will be used to randomly iterate the entire dataset (default = RandomSampler()).
cache – [in] Tensor cache to use (default = nullptr, which means no cache is used).
- Returns
Shared pointer to the CMUArcticDataset.
Example
    /* Define dataset path and MindData object */
    std::string folder_path = "/path/to/cmu_arctic_dataset_directory";
    std::shared_ptr<Dataset> ds = CMUArctic(folder_path, "aew", std::make_shared<RandomSampler>(false, 10));

    /* Create iterator to read dataset */
    std::shared_ptr<Iterator> iter = ds->CreateIterator();
    std::unordered_map<std::string, mindspore::MSTensor> row;
    iter->GetNextRow(&row);

    /* Note: In CMUArctic dataset, each data dictionary has keys "waveform", "sample_rate", "transcript"
     * and "utterance_id" */
    auto waveform = row["waveform"];
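Because name accepts only the fixed set of values listed under Parameters, it can be useful to check a user-supplied value before constructing the dataset. The following is a minimal sketch of such a check. IsValidCMUArcticName is a hypothetical helper written for illustration and is not part of the MindSpore API; it only mirrors the list of names documented above.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper (not part of MindSpore): returns true if `name` is one
// of the CMUArctic parts listed in the parameter documentation.
inline bool IsValidCMUArcticName(const std::string &name) {
  static const std::unordered_set<std::string> kNames = {
      "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem",
      "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp", "slt"};
  return kNames.count(name) > 0;
}
```

A caller could run this check on the name argument first and only call CMUArctic when it returns true, so that an invalid value is reported in the caller's own terms.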