mindspore.dataset

This module provides APIs to load and process various common datasets such as MNIST, CIFAR-10, CIFAR-100, VOC, COCO, ImageNet, CelebA, CLUE, etc. It also supports datasets in standard formats, including MindRecord, TFRecord, Manifest, etc. Users can also define their own datasets with this module.

In addition, this module provides APIs for sampling data during loading.

Note that caching is not yet supported on the Windows platform. Do not use it when loading and processing data on Windows.

Vision

`mindspore.dataset.CelebADataset`	A source dataset for reading and parsing the CelebA dataset.
`mindspore.dataset.Cifar100Dataset`	A source dataset for reading and parsing the CIFAR-100 dataset.
`mindspore.dataset.Cifar10Dataset`	A source dataset for reading and parsing the CIFAR-10 dataset.
`mindspore.dataset.CocoDataset`	A source dataset for reading and parsing the COCO dataset.
`mindspore.dataset.ImageFolderDataset`	A source dataset that reads images from a tree of directories.
`mindspore.dataset.MnistDataset`	A source dataset for reading and parsing the MNIST dataset.
`mindspore.dataset.VOCDataset`	A source dataset for reading and parsing the VOC dataset.

Text

`mindspore.dataset.CLUEDataset`	A source dataset that reads and parses CLUE datasets.

Graph

`mindspore.dataset.GraphData`	Reads the graph dataset used for GNN training from the shared file and database.

Standard Format

`mindspore.dataset.CSVDataset`	A source dataset that reads and parses comma-separated values (CSV) datasets.
`mindspore.dataset.ManifestDataset`	A source dataset for reading images from a Manifest file.
`mindspore.dataset.MindDataset`	A source dataset for reading and parsing MindRecord datasets.
`mindspore.dataset.TextFileDataset`	A source dataset that reads and parses datasets stored on disk in text format.
`mindspore.dataset.TFRecordDataset`	A source dataset for reading and parsing datasets stored on disk in TFRecord format.

User Defined

`mindspore.dataset.GeneratorDataset`	A source dataset that generates data from Python by invoking a Python data source each epoch.
`mindspore.dataset.NumpySlicesDataset`	Creates a dataset from the given data slices, mainly for loading Python data into a dataset.
`mindspore.dataset.PaddedDataset`	Creates a dataset with filler data provided by the user.
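
The phrase "invoking a Python data source each epoch" can be illustrated without MindSpore itself. The following is a minimal sketch in plain Python, not the library's implementation: `make_rows`, `SquaresSource`, `run_epochs`, and `read_epoch` are hypothetical names introduced here. It shows the two shapes a Python data source commonly takes, a callable that returns a fresh generator and a random-accessible object that supports `len()` and integer indexing.

```python
def make_rows():
    """Generator-style source: calling it again yields a fresh pass over the data."""
    for i in range(3):
        yield (i, i * i)


class SquaresSource:
    """Random-accessible source: supports len() and integer indexing."""

    def __init__(self, size):
        self._size = size

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        return (index, index * index)


def run_epochs(source_fn, num_epochs):
    """Stand-in for a loader: invoke the source once per epoch."""
    return [list(source_fn()) for _ in range(num_epochs)]


def read_epoch(source):
    """Stand-in for one sequential pass over a random-accessible source."""
    return [source[i] for i in range(len(source))]


epochs = run_epochs(make_rows, num_epochs=2)
rows = read_epoch(SquaresSource(4))
```

With MindSpore installed, an object of either shape would typically be passed as the `source` argument of `GeneratorDataset` together with `column_names`; the exact requirements on row types are best checked against the class documentation.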

Sampler

`mindspore.dataset.DistributedSampler`	A sampler that accesses a shard of the dataset.
`mindspore.dataset.PKSampler`	Samples K elements for each of P classes in the dataset.
`mindspore.dataset.RandomSampler`	Samples the elements randomly.
`mindspore.dataset.SequentialSampler`	Samples the dataset elements sequentially, same as not having a sampler.
`mindspore.dataset.SubsetRandomSampler`	Samples the elements randomly from a sequence of indices.
`mindspore.dataset.SubsetSampler`	Samples the elements from a sequence of indices.
`mindspore.dataset.WeightedRandomSampler`	Samples the elements from [0, len(weights) - 1] randomly with the given weights (probabilities).
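
To make the sampler descriptions above concrete, here is a conceptual sketch in plain Python of the index sequences each kind of sampler might produce. It is not MindSpore's implementation: every function name is hypothetical, the strided sharding scheme is an assumption, and the real classes may differ in ordering, padding, and randomness.

```python
import random


def sequential_indices(n):
    """Visit elements in their stored order."""
    return list(range(n))


def shard_indices(n, num_shards, shard_id):
    """One possible sharding scheme: every num_shards-th index, offset by shard_id."""
    return list(range(shard_id, n, num_shards))


def subset_random_indices(indices, rng):
    """Shuffle a fixed sequence of indices."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return shuffled


def pk_indices(labels, k, rng):
    """Pick up to k indices for each class label."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    picked = []
    for label in sorted(by_class):
        members = by_class[label]
        picked.extend(rng.sample(members, min(k, len(members))))
    return picked


def weighted_indices(weights, num_samples, rng):
    """Draw indices from [0, len(weights) - 1] with the given weights, with replacement."""
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
shards = [shard_indices(10, num_shards=3, shard_id=s) for s in range(3)]
per_class = pk_indices([0, 0, 0, 1, 1, 2], k=2, rng=rng)
drawn = weighted_indices([0.0, 1.0, 0.0], num_samples=5, rng=rng)
```

In this sketch the three shards together cover every index exactly once, and an index with weight 0 is never drawn.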

Others

`mindspore.dataset.DatasetCache`	A client to interface with the tensor caching service.
`mindspore.dataset.Schema`	Class to represent a schema of a dataset.
`mindspore.dataset.zip`	Zip the datasets in the input tuple of datasets.
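
The idea behind zipping datasets can be sketched in plain Python by treating each dataset as a sequence of rows and each row as a dictionary keyed by column name. This is an illustration only, with a hypothetical helper `zip_rows`; stopping at the shortest input and rejecting duplicate column names are assumptions about the intended semantics, not statements about the library's behavior.

```python
def zip_rows(*datasets):
    """Merge rows position by position, combining their columns into one row."""
    for rows in zip(*datasets):  # built-in zip stops at the shortest input
        merged = {}
        for row in rows:
            overlap = merged.keys() & row.keys()
            if overlap:
                raise ValueError(f"duplicate column names: {sorted(overlap)}")
            merged.update(row)
        yield merged


images = [{"image": "a.png"}, {"image": "b.png"}, {"image": "c.png"}]
labels = [{"label": 0}, {"label": 1}]
zipped = list(zip_rows(images, labels))
```

Here `zipped` holds two rows, each with an `image` and a `label` column, because the shorter input has only two rows.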