Function mindspore::dataset::CMUArctic

Function Documentation

inline std::shared_ptr<CMUArcticDataset> mindspore::dataset::CMUArctic(const std::string &dataset_dir, const std::string &name = "aew", const std::shared_ptr<Sampler> &sampler = std::make_shared<RandomSampler>(), const std::shared_ptr<DatasetCache> &cache = nullptr)

Function to create a CMUArcticDataset.

Note

The generated dataset has four columns [“waveform”, “sample_rate”, “transcript”, “utterance_id”].

Parameters
  • dataset_dir[in] Path to the root directory that contains the dataset.

  • name[in] Subset of the CMUArctic dataset to load; can be “aew”, “ahw”, “aup”, “awb”, “axb”, “bdl”, “clb”, “eey”, “fem”, “gka”, “jmk”, “ksp”, “ljm”, “lnh”, “rms”, “rxr”, “slp” or “slt” (default = “aew”).

  • sampler[in] Shared pointer to a sampler object used to choose samples from the dataset. If sampler is not given, a RandomSampler will be used to randomly iterate the entire dataset (default = RandomSampler()).

  • cache[in] Tensor cache to use (default=nullptr, which means no cache is used).
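Because name accepts only the fixed set of values listed above, a caller may want to check user-supplied input before building the dataset. The following is a minimal sketch of such a check; the helper IsValidCmuArcticName and the table kCmuArcticNames are hypothetical and are not part of the MindSpore API.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Hypothetical table, not part of MindSpore: the values that the parameter
// description above lists as valid for `name`.
constexpr std::array<const char *, 18> kCmuArcticNames = {
    "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem",
    "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp", "slt"};

// Hypothetical helper: returns true if `name` exactly matches one of the
// documented values (the comparison is case-sensitive).
inline bool IsValidCmuArcticName(const std::string &name) {
  return std::any_of(kCmuArcticNames.begin(), kCmuArcticNames.end(),
                     [&name](const char *candidate) { return name == candidate; });
}
```

Calling a check like this before CMUArctic(dataset_dir, name) lets the application report a bad value early. How the library itself reacts to an unsupported name is not described on this page.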

Returns

Shared pointer to the CMUArcticDataset.

Example
        /* Define dataset path and MindData object */
        std::string folder_path = "/path/to/cmu_arctic_dataset_directory";
        std::shared_ptr<Dataset> ds =
            CMUArctic(folder_path, "aew", std::make_shared<RandomSampler>(false, 10));
  
        /* Create iterator to read dataset */
        std::shared_ptr<Iterator> iter = ds->CreateIterator();
        std::unordered_map<std::string, mindspore::MSTensor> row;
        iter->GetNextRow(&row);
  
        /* Note: In CMUArctic dataset, each data dictionary has keys "waveform", "sample_rate", "transcript"
         * and "utterance_id" */
        auto waveform = row["waveform"];