mindspore.dataset.DistributedSampler
- class mindspore.dataset.DistributedSampler(num_shards, shard_id, shuffle=True, num_samples=None, offset=-1)
A sampler that accesses a shard of the dataset.
- Parameters
num_shards (int) – Number of shards to divide the dataset into.
shard_id (int) – ID of the current shard, in the range [0, num_shards).
shuffle (bool, optional) – If True, the indices are shuffled (default=True).
num_samples (int, optional) – The number of samples to draw (default=None, which draws all elements).
offset (int, optional) – The starting shard ID to which the elements of the dataset are sent (default=-1). It must not be greater than num_shards.
Examples
>>> import mindspore.dataset as ds
>>>
>>> dataset_dir = "path/to/imagefolder_directory"
>>>
>>> # creates a distributed sampler with 10 shards in total. This shard is shard 5.
>>> sampler = ds.DistributedSampler(10, 5)
>>> data = ds.ImageFolderDataset(dataset_dir, num_parallel_workers=8, sampler=sampler)
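To make the "10 shards, this is shard 5" setup more concrete, here is a minimal pure-Python sketch of one common way to split dataset indices across shards: a shuffle with a seed shared by all shards, followed by a round-robin stride. The function shard_indices and its seed parameter are hypothetical, and this is not MindSpore's implementation; the real DistributedSampler may order, pad, or seed indices differently, and the sketch ignores offset.

```python
import random
from typing import List, Optional


def shard_indices(num_rows: int, num_shards: int, shard_id: int,
                  shuffle: bool = True, seed: int = 0,
                  num_samples: Optional[int] = None) -> List[int]:
    """Hypothetical round-robin sharding sketch; NOT MindSpore's algorithm."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    indices = list(range(num_rows))
    if shuffle:
        # Every shard must shuffle with the same seed so that the
        # shards stay disjoint and together cover every index once.
        random.Random(seed).shuffle(indices)
    # Take every num_shards-th index, starting at position shard_id.
    shard = indices[shard_id::num_shards]
    if num_samples is not None:
        # Optionally cap how many samples this shard yields.
        shard = shard[:num_samples]
    return shard


# Shard 5 of 10 over 100 rows, without shuffling.
print(shard_indices(100, 10, 5, shuffle=False))
```

With shuffle=False, shard 5 of 10 over 100 rows receives indices 5, 15, ..., 95. Because this sketch does not pad, shard sizes differ by at most one when num_rows is not divisible by num_shards.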
- Raises
ValueError – If num_shards is not positive.
ValueError – If shard_id is smaller than 0 or greater than or equal to num_shards.
ValueError – If shuffle is not a boolean value.
ValueError – If offset is greater than num_shards.
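The conditions listed under Raises can be read as a short sequence of argument checks. The following sketch restates them in plain Python; check_distributed_sampler_args is a hypothetical helper for illustration, not MindSpore's own validator, which may perform additional type checks or report errors differently.

```python
def check_distributed_sampler_args(num_shards, shard_id, shuffle=True, offset=-1):
    """Hypothetical restatement of the documented ValueError conditions."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    if shard_id < 0 or shard_id >= num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    # isinstance(1, bool) is False, so integers such as 1 are rejected too.
    if not isinstance(shuffle, bool):
        raise ValueError("shuffle must be a boolean value")
    if offset > num_shards:
        raise ValueError("offset must not be greater than num_shards")


check_distributed_sampler_args(10, 5)  # valid: matches the example above
```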
- get_num_samples()
Every sampler has a num_samples value, which is either a number or None. A sampler may also have a child sampler, or None. If a child sampler exists, its own sample count is likewise either a number or None. Together, these values determine the resulting sample count. The following table shows the possible results of calling this function.
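child sampler   num_samples   child_samples   result
-------------   -----------   -------------   ---------
T               x             y               min(x, y)
T               x             None            x
T               None          y               y
T               None          None            None
None            x             n/a             x
None            None          n/a             None
Here T means a child sampler is present, and x and y are numeric sample counts.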
- Returns
int, the number of samples, or None
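The get_num_samples() table reduces to a few branches. The minimal sketch below encodes it directly; the function name resolved_num_samples and the has_child flag are hypothetical and only restate the table, not MindSpore's source code.

```python
from typing import Optional


def resolved_num_samples(num_samples: Optional[int],
                         has_child: bool,
                         child_samples: Optional[int] = None) -> Optional[int]:
    """Hypothetical encoding of the get_num_samples() result table."""
    if not has_child:
        # Without a child sampler, child_samples is n/a.
        return num_samples
    if num_samples is not None and child_samples is not None:
        # Both counts are known: the smaller one wins.
        return min(num_samples, child_samples)
    # At most one count is set: return it, or None if neither is.
    return num_samples if num_samples is not None else child_samples
```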