mindspore.dataset.RandomSampler

class mindspore.dataset.RandomSampler(replacement=False, num_samples=None, shuffle=Shuffle.GLOBAL)

Samples the elements randomly.

Note

The shuffling modes supported for different datasets are as follows:

Shuffling modes supported by each dataset type
Shuffling Mode	MindDataset	TFRecordDataset	Others
`Shuffle.ADAPTIVE`	Supported	Not Supported	Not Supported
`Shuffle.GLOBAL`	Supported	Supported	Supported
`Shuffle.PARTIAL`	Supported	Not Supported	Not Supported
`Shuffle.FILES`	Supported	Supported	Not Supported
`Shuffle.INFILE`	Supported	Not Supported	Not Supported
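The support matrix above can be encoded as a lookup table, for example to check a configuration before building a pipeline. This is a minimal sketch: `SUPPORTED_SHUFFLE_MODES` and `is_shuffle_supported` are hypothetical helpers, not part of MindSpore, and plain strings stand in for the real `mindspore.dataset.Shuffle` enum members.

```python
# Hypothetical encoding of the shuffle-mode support table (not a MindSpore API).
# Keys are dataset categories; "Others" covers every other dataset class.
SUPPORTED_SHUFFLE_MODES = {
    "MindDataset": {"ADAPTIVE", "GLOBAL", "PARTIAL", "FILES", "INFILE"},
    "TFRecordDataset": {"GLOBAL", "FILES"},
    "Others": {"GLOBAL"},
}


def is_shuffle_supported(dataset_name: str, mode: str) -> bool:
    """Return True if `mode` is listed as supported for `dataset_name`."""
    # Dataset names not listed explicitly fall back to the "Others" row.
    modes = SUPPORTED_SHUFFLE_MODES.get(dataset_name, SUPPORTED_SHUFFLE_MODES["Others"])
    return mode in modes


print(is_shuffle_supported("TFRecordDataset", "FILES"))     # True
print(is_shuffle_supported("ImageFolderDataset", "FILES"))  # False
```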

Parameters

replacement (bool, optional) – If True, put the sample ID back for the next draw. Default: False.
num_samples (int, optional) – Number of elements to sample. Default: None, which means sample all elements.
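To illustrate what `replacement` and `num_samples` mean for the drawn sample IDs, here is a plain-Python sketch built on the standard `random` module. It approximates the semantics described above and is not MindSpore's implementation; `draw_ids` is a hypothetical helper, and capping `num_samples` at the dataset size when sampling without replacement is an assumption of this sketch.

```python
import random
from typing import List, Optional


def draw_ids(dataset_size: int, replacement: bool = False,
             num_samples: Optional[int] = None,
             rng: Optional[random.Random] = None) -> List[int]:
    """Sketch of random sample-ID selection (hypothetical helper)."""
    rng = rng if rng is not None else random.Random()
    # None means "sample all elements".
    n = dataset_size if num_samples is None else num_samples
    if replacement:
        # Each ID is "put back" after it is drawn, so IDs may repeat.
        return [rng.randrange(dataset_size) for _ in range(n)]
    # Without replacement each ID appears at most once; this sketch caps n
    # at the dataset size (an assumption, see the lead-in).
    return rng.sample(range(dataset_size), min(n, dataset_size))


rng = random.Random(0)
print(draw_ids(5, rng=rng))                                   # a permutation of 0..4
print(draw_ids(5, replacement=True, num_samples=8, rng=rng))  # 8 IDs, repeats possible
```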
shuffle (Shuffle, optional) –
The shuffle mode, given as a member of the mindspore.dataset.Shuffle enum. Default: Shuffle.GLOBAL, a global shuffle of all rows in the dataset. The following shuffle levels are available:
- Shuffle.ADAPTIVE: If the dataset has 100 million samples or fewer, Shuffle.GLOBAL is used. If it has more than 100 million samples, Shuffle.PARTIAL is used, which shuffles once every 1 million samples.
- Shuffle.GLOBAL: Globally shuffles all rows in the dataset. Memory usage is high.
- Shuffle.PARTIAL: Shuffles the data once every 1 million samples. Uses less memory than Shuffle.GLOBAL.
- Shuffle.FILES: Shuffles the order of the files but keeps the order of the data within each file.
- Shuffle.INFILE: Keeps the order of the files but shuffles the data within each file.
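The following sketch models the mode descriptions above in plain Python. It is illustrative only: string names stand in for the real `Shuffle` enum, all function names are hypothetical, and treating Shuffle.PARTIAL as an independent shuffle of each consecutive block of 1 million samples is an assumption about how "once every 1 million samples" is applied.

```python
import random
from typing import List, Optional

GLOBAL_THRESHOLD = 100_000_000  # samples; ADAPTIVE switches to PARTIAL above this
BLOCK_SIZE = 1_000_000          # samples per PARTIAL shuffle block (assumed block-wise)


def resolve_adaptive(num_rows: int) -> str:
    """Mode that Shuffle.ADAPTIVE falls back to, per the rule above."""
    return "GLOBAL" if num_rows <= GLOBAL_THRESHOLD else "PARTIAL"


def partial_shuffle(ids: List[int], block_size: int = BLOCK_SIZE,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Shuffle within consecutive blocks of `block_size` IDs (assumed behavior)."""
    rng = rng if rng is not None else random.Random()
    out: List[int] = []
    for start in range(0, len(ids), block_size):
        block = ids[start:start + block_size]  # slicing copies, input stays untouched
        rng.shuffle(block)
        out.extend(block)
    return out


def shuffle_files(files: List[List[int]], rng: Optional[random.Random] = None) -> List[int]:
    """Shuffle.FILES: randomize file order, keep row order inside each file."""
    rng = rng if rng is not None else random.Random()
    order = list(range(len(files)))
    rng.shuffle(order)
    return [row for i in order for row in files[i]]


def shuffle_infile(files: List[List[int]], rng: Optional[random.Random] = None) -> List[int]:
    """Shuffle.INFILE: keep file order, randomize rows inside each file."""
    rng = rng if rng is not None else random.Random()
    out: List[int] = []
    for rows in files:
        rows = list(rows)  # copy before shuffling in place
        rng.shuffle(rows)
        out.extend(rows)
    return out


files = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # three toy "files" of row IDs
rng = random.Random(42)
print(resolve_adaptive(50_000_000))   # GLOBAL
print(resolve_adaptive(200_000_000))  # PARTIAL
print(shuffle_files(files, rng))      # whole files reordered, rows inside each file in order
print(shuffle_infile(files, rng))     # files in order, rows inside each file permuted
```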

Raises

TypeError – If replacement is not of type bool.
TypeError – If num_samples is not of type int.
ValueError – If num_samples is a negative value.
TypeError – If shuffle is not of type Shuffle.
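The documented error conditions amount to a few type and range checks. The sketch below mirrors them in plain Python; `check_random_sampler_args` is hypothetical, the local `Shuffle` class is only a stand-in for `mindspore.dataset.Shuffle`, and the messages MindSpore actually raises will differ.

```python
from enum import Enum
from typing import Optional


class Shuffle(Enum):
    """Stand-in for mindspore.dataset.Shuffle (member values are placeholders)."""
    ADAPTIVE = "adaptive"
    GLOBAL = "global"
    PARTIAL = "partial"
    FILES = "files"
    INFILE = "infile"


def check_random_sampler_args(replacement: bool = False,
                              num_samples: Optional[int] = None,
                              shuffle: Shuffle = Shuffle.GLOBAL) -> None:
    """Raise the errors listed under Raises for invalid arguments (sketch)."""
    if not isinstance(replacement, bool):
        raise TypeError("replacement must be of type bool")
    if num_samples is not None:
        # bool is a subclass of int in Python; rejecting it is a choice of this sketch.
        if isinstance(num_samples, bool) or not isinstance(num_samples, int):
            raise TypeError("num_samples must be of type int")
        if num_samples < 0:
            raise ValueError("num_samples must not be negative")
    if not isinstance(shuffle, Shuffle):
        raise TypeError("shuffle must be of type Shuffle")


check_random_sampler_args(replacement=True, num_samples=10, shuffle=Shuffle.PARTIAL)  # passes
```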

Examples

>>> import mindspore.dataset as ds
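>>> # Placeholder path; replace with the directory of an image folder dataset
>>> image_folder_dataset_dir = "/path/to/image_folder_dataset_directory"
>>> # Create a RandomSampler that samples all elements without replacement
>>> sampler = ds.RandomSampler()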
>>> dataset = ds.ImageFolderDataset(image_folder_dataset_dir,
...                                 num_parallel_workers=8,
...                                 sampler=sampler)

add_child(sampler)

Add a child sampler to the current sampler. The parent sampler receives all the output of the child sampler and applies its own sampling logic to that output to produce new samples.

Note

If a child sampler is added and it has a shuffle option, its value cannot be Shuffle.PARTIAL. Additionally, the parent sampler's shuffle value must be Shuffle.GLOBAL.

Parameters: sampler (Sampler) – Object used to choose samples from the dataset. Only built-in samplers (mindspore.dataset.DistributedSampler, mindspore.dataset.PKSampler, mindspore.dataset.RandomSampler, mindspore.dataset.SequentialSampler, mindspore.dataset.SubsetRandomSampler, mindspore.dataset.WeightedRandomSampler) are supported.

Examples

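>>> import mindspore.dataset as ds
>>> # Placeholder path; replace with the directory of the CIFAR-10 dataset
>>> cifar10_dataset_dir = "/path/to/cifar10_dataset_directory"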
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> sampler.add_child(ds.RandomSampler(num_samples=4))
>>> dataset = ds.Cifar10Dataset(cifar10_dataset_dir, sampler=sampler)
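A simplified model of this parent/child relationship follows, under the assumption that the parent's indices select positions from the child's output, so the example above would keep the first 3 of the child's 4 random IDs. `ToySequentialSampler` and `ToyRandomSampler` are hypothetical classes for illustration only; they are not MindSpore's implementation and ignore the shuffle-mode restrictions in the Note.

```python
import random
from typing import List, Optional


class ToyRandomSampler:
    """Hypothetical stand-in: yields `num_samples` random IDs without replacement."""

    def __init__(self, num_samples: Optional[int] = None, seed: int = 0):
        self.num_samples = num_samples
        self.rng = random.Random(seed)

    def sample(self, ids: List[int]) -> List[int]:
        n = len(ids) if self.num_samples is None else min(self.num_samples, len(ids))
        return self.rng.sample(ids, n)


class ToySequentialSampler:
    """Hypothetical stand-in: takes `num_samples` consecutive items from `start_index`."""

    def __init__(self, start_index: int = 0, num_samples: Optional[int] = None):
        self.start_index = start_index
        self.num_samples = num_samples
        self.child = None

    def add_child(self, sampler) -> None:
        self.child = sampler

    def sample(self, ids: List[int]) -> List[int]:
        # The parent operates on the child's output instead of the raw dataset IDs.
        if self.child is not None:
            ids = self.child.sample(ids)
        end = None if self.num_samples is None else self.start_index + self.num_samples
        return ids[self.start_index:end]


parent = ToySequentialSampler(start_index=0, num_samples=3)
parent.add_child(ToyRandomSampler(num_samples=4, seed=1))
print(parent.sample(list(range(10))))  # first 3 of the child's 4 random IDs
```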

get_child()

Get the child sampler of the current sampler.

Returns: Sampler, the child sampler of the current sampler.

Examples

>>> import mindspore.dataset as ds
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> sampler.add_child(ds.RandomSampler(num_samples=2))
>>> child_sampler = sampler.get_child()

get_num_samples()

Get the num_samples value of the current sampler instance. num_samples can optionally be passed in when the sampler is constructed (Default: None). If the current sampler has a child sampler, this method also retrieves the child's num_samples value and combines the two values according to the rules below.

The following table lists the possible combinations and the value returned in each case.

child sampler present	num_samples	child's num_samples	result
Yes	x	y	min(x, y)
Yes	x	None	x
Yes	None	y	y
Yes	None	None	None
No	x	n/a	x
No	None	n/a	None

Returns: int, the number of samples, or None.
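The combination rules in the table reduce to a short function. This is a plain-Python sketch; `combine_num_samples` is a hypothetical helper that receives the values directly, whereas the real method reads them from the sampler and (recursively) from its child.

```python
from typing import Optional


def combine_num_samples(num_samples: Optional[int], has_child: bool,
                        child_samples: Optional[int] = None) -> Optional[int]:
    """Return the value get_num_samples() reports, following the table above."""
    if not has_child:
        # No child sampler: the sampler's own value (possibly None) is returned.
        return num_samples
    if num_samples is None:
        # Only the child's value can be reported (possibly None).
        return child_samples
    if child_samples is None:
        return num_samples
    # Both values are set: the smaller one wins.
    return min(num_samples, child_samples)


print(combine_num_samples(3, True, 4))  # 3
```

For instance, a sampler with num_samples=3 whose child has num_samples=4 reports min(3, 4) = 3.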

Examples

>>> import mindspore.dataset as ds
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> num_samples = sampler.get_num_samples()