mindspore.dataset

This module provides APIs to load and process various common datasets such as MNIST, CIFAR-10, CIFAR-100, VOC, COCO, ImageNet, CelebA, CLUE, etc. It also supports datasets in standard format, including MindRecord, TFRecord, Manifest, etc. Users can also define their own datasets with this module.

This module also provides samplers for selecting data while loading.

Most dataset classes can enable caching through their ‘cache’ keyword argument. Caching is not yet supported on Windows, so do not use it when loading and processing data on that platform. For more details and limitations, refer to Single-Node Tensor Cache.

The API examples assume the following common imports:

import mindspore.dataset as ds
from mindspore.dataset.transforms import c_transforms

Vision

mindspore.dataset.CelebADataset

A source dataset for reading and parsing the CelebA dataset.

mindspore.dataset.Cifar100Dataset

A source dataset for reading and parsing the CIFAR-100 dataset.

mindspore.dataset.Cifar10Dataset

A source dataset for reading and parsing the CIFAR-10 dataset.

mindspore.dataset.CocoDataset

A source dataset for reading and parsing the COCO dataset.

mindspore.dataset.ImageFolderDataset

A source dataset that reads images from a tree of directories.

mindspore.dataset.MnistDataset

A source dataset for reading and parsing the MNIST dataset.

mindspore.dataset.VOCDataset

A source dataset for reading and parsing the VOC dataset.

Text

mindspore.dataset.CLUEDataset

A source dataset that reads and parses CLUE datasets.

Graph

mindspore.dataset.GraphData

Reads the graph dataset used for GNN training from the shared file and database.

Standard Format

mindspore.dataset.CSVDataset

A source dataset that reads and parses comma-separated values (CSV) datasets.

mindspore.dataset.ManifestDataset

A source dataset for reading images from a Manifest file.

mindspore.dataset.MindDataset

A source dataset for reading and parsing a MindRecord dataset.

mindspore.dataset.TextFileDataset

A source dataset that reads and parses datasets stored on disk in text format.

mindspore.dataset.TFRecordDataset

A source dataset for reading and parsing datasets stored on disk in TFRecord format.

User Defined

mindspore.dataset.GeneratorDataset

A source dataset that generates data from Python by invoking a Python data source each epoch.
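As a rough illustration of what a "Python data source" can look like, the sketch below defines a random-access source with `__getitem__` and `__len__` and walks it once per epoch in plain Python. The class name `SquaresSource` and its contents are hypothetical, and the loop only imitates the per-epoch invocation; it is not MindSpore's implementation.

```python
class SquaresSource:
    """Hypothetical random-access data source: row i holds (i, i * i)."""

    def __init__(self, size):
        self._size = size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        return index, index * index

    def __len__(self):
        return self._size


source = SquaresSource(4)

# Imitate two epochs: the source is read again from the start each time.
epochs = [[source[i] for i in range(len(source))] for _ in range(2)]
print(epochs[0])
```

In MindSpore, an object like this would typically be passed as the `source` argument of GeneratorDataset, with `column_names` naming the two fields. Values returned by the source generally need to be convertible to NumPy arrays.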

mindspore.dataset.NumpySlicesDataset

Creates a dataset from the given data slices, mainly for loading Python data into a dataset.

mindspore.dataset.PaddedDataset

Creates a dataset with filler data provided by user.

Sampler

mindspore.dataset.DistributedSampler

A sampler that accesses one shard of the dataset. It divides the dataset into multiple subsets for distributed training.
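The idea of sharding can be sketched in plain Python. The helper `shard_indices` below is hypothetical and uses a round-robin split, which is one common scheme; the split MindSpore actually applies (and how it handles shuffling or uneven shard sizes) may differ.

```python
def shard_indices(num_samples, num_shards, shard_id):
    """Return the sample indices assigned to one shard (round-robin split)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    # Shard k takes indices k, k + num_shards, k + 2 * num_shards, ...
    return list(range(shard_id, num_samples, num_shards))


# Split 10 samples across 3 workers; every index lands in exactly one shard.
shards = [shard_indices(10, 3, k) for k in range(3)]
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Each training process would use only its own shard, so together the processes cover the dataset without overlap.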

mindspore.dataset.PKSampler

Samples K elements for each of P classes in the dataset.
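A minimal plain-Python sketch of P-K sampling follows. The function `pk_sample` and its parameter names are hypothetical, and the handling of classes with fewer than K elements is an assumption made for this sketch rather than documented library behavior.

```python
import random
from collections import defaultdict


def pk_sample(labels, num_per_class, num_classes=None, seed=None):
    """Pick K indices from each of P classes (all classes if P is None)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    classes = sorted(by_class)
    if num_classes is not None:
        classes = rng.sample(classes, num_classes)
    picked = []
    for label in classes:
        # Assumption: a class with fewer than K elements contributes all of them.
        k = min(num_per_class, len(by_class[label]))
        picked.extend(rng.sample(by_class[label], k))
    return picked


labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
batch = pk_sample(labels, num_per_class=2, num_classes=2, seed=0)
print(len(batch))  # 4 indices: 2 classes x 2 elements each
```

This kind of sampling is often used when each batch needs a balanced number of examples per class.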

mindspore.dataset.RandomSampler

Samples the elements randomly.

mindspore.dataset.SequentialSampler

Samples the dataset elements sequentially, which is equivalent to not using a sampler.

mindspore.dataset.SubsetRandomSampler

Samples the elements randomly from a sequence of indices.

mindspore.dataset.SubsetSampler

Samples the elements from a sequence of indices.
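The difference between the two subset samplers can be sketched in plain Python: both restrict reading to a given sequence of indices, and the random variant additionally permutes that sequence. The helper `subset_sample` is hypothetical and only illustrates the behavior described here.

```python
import random


def subset_sample(indices, shuffle=False, seed=None):
    """Return the given indices, either in order or randomly permuted."""
    picked = list(indices)
    if shuffle:
        random.Random(seed).shuffle(picked)
    return picked


data = ["a", "b", "c", "d", "e"]

# In-order subset: rows are read exactly in the order the indices are given.
in_order = [data[i] for i in subset_sample([4, 0, 2])]
print(in_order)  # ['e', 'a', 'c']

# Random subset: the same rows, but in a shuffled order.
shuffled = [data[i] for i in subset_sample([4, 0, 2], shuffle=True, seed=1)]
```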

mindspore.dataset.WeightedRandomSampler

Samples the elements from [0, len(weights) - 1] randomly with the given weights (probabilities).
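Weighted sampling over the index range [0, len(weights) - 1] can be sketched with the Python standard library. The function `weighted_sample` is hypothetical, and drawing with replacement is an assumption of this sketch.

```python
import random


def weighted_sample(weights, num_samples, seed=None):
    """Draw indices from [0, len(weights) - 1] with replacement."""
    rng = random.Random(seed)
    indices = range(len(weights))
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(indices, weights=weights, k=num_samples)


# Index 0 is drawn most often; index 1 has zero weight and is never drawn.
draws = weighted_sample([0.9, 0.0, 0.1], num_samples=1000, seed=0)
print(draws.count(0), draws.count(1), draws.count(2))
```

Giving rare classes larger weights is a common way to rebalance an imbalanced dataset during loading.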

Others

mindspore.dataset.DatasetCache

A client to interface with tensor caching service.

mindspore.dataset.DSCallback

Abstract base class used to build a dataset callback class.

mindspore.dataset.Schema

Class to represent a schema of a dataset.

mindspore.dataset.WaitedDSCallback

Abstract base class used to build a dataset callback class that is synchronized with the training callback.

mindspore.dataset.compare

Compare whether two dataset pipelines are the same.

mindspore.dataset.deserialize

Construct a dataset pipeline from a JSON file produced by mindspore.dataset.serialize().

mindspore.dataset.serialize

Serialize a dataset pipeline into a JSON file.

mindspore.dataset.show

Write the dataset pipeline graph to the log at INFO level.

mindspore.dataset.zip

Zip the datasets in the input tuple of datasets.
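Zipping combines datasets column-wise, row by row. The plain-Python sketch below models each dataset as a list of dictionaries keyed by column name. The helper `zip_rows` is hypothetical, and both stopping at the shortest dataset and rejecting duplicate column names are assumptions of this sketch that should be checked against the API documentation.

```python
def zip_rows(*datasets):
    """Merge rows column-wise, stopping at the shortest dataset."""
    merged = []
    for rows in zip(*datasets):
        combined = {}
        for row in rows:
            overlap = combined.keys() & row.keys()
            if overlap:
                raise ValueError(f"duplicate column names: {sorted(overlap)}")
            combined.update(row)
        merged.append(combined)
    return merged


images = [{"image": "img0"}, {"image": "img1"}, {"image": "img2"}]
labels = [{"label": 0}, {"label": 1}]

zipped = zip_rows(images, labels)
print(zipped)  # [{'image': 'img0', 'label': 0}, {'image': 'img1', 'label': 1}]
```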