mindspore.dataset.DistributedSampler

class mindspore.dataset.DistributedSampler(num_shards, shard_id, shuffle=True, num_samples=None, offset=-1)

Distributed sampler that splits the dataset into shards for distributed training.

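Parameters:

num_shards (int) - Number of shards the dataset is divided into.
shard_id (int) - Shard ID of the current shard; must be in the range [0, num_shards-1].
shuffle (bool, optional) - Whether to shuffle the sampled samples. Default: True, the samples are shuffled.
num_samples (int, optional) - Number of samples to draw; can be used to take only part of the sampled samples. Default: None, all sampled samples are returned.
offset (int, optional) - Shard ID from which the distributed sampling results start to be assigned; must not be greater than num_shards. Starting the assignment from a different shard ID may change the final number of samples in each shard. This parameter takes effect only when a ConcatDataset uses mindspore.dataset.DistributedSampler as its sampler. Default: -1, every shard receives the same number of samples.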

Raises:

TypeError - num_shards is not of type int.
TypeError - shard_id is not of type int.
TypeError - shuffle is not of type bool.
TypeError - num_samples is not of type int.
TypeError - offset is not of type int.
ValueError - num_samples is negative.
RuntimeError - num_shards is not positive.
RuntimeError - shard_id is less than 0 or greater than or equal to num_shards.
RuntimeError - offset is greater than num_shards.
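The exceptions listed above can be read as a single argument check. The helper below is a hypothetical sketch that mirrors only the documented conditions; `check_distributed_sampler_args` is not a MindSpore function, and the real library may perform these checks in a different order or at a different time (for example, when the pipeline runs rather than at construction). Treating `bool` as "not an int" is an assumption of this sketch.

```python
def check_distributed_sampler_args(num_shards, shard_id, shuffle=True,
                                   num_samples=None, offset=-1):
    """Hypothetical helper mirroring the documented exceptions (not MindSpore code)."""

    def is_int(value):
        # bool is a subclass of int in Python; this sketch rejects it (assumption).
        return isinstance(value, int) and not isinstance(value, bool)

    # Type checks, documented as TypeError.
    if not is_int(num_shards):
        raise TypeError("num_shards must be int")
    if not is_int(shard_id):
        raise TypeError("shard_id must be int")
    if not isinstance(shuffle, bool):
        raise TypeError("shuffle must be bool")
    if num_samples is not None and not is_int(num_samples):
        raise TypeError("num_samples must be int or None")
    if not is_int(offset):
        raise TypeError("offset must be int")

    # Value checks, using the exception types listed in the documentation.
    if num_samples is not None and num_samples < 0:
        raise ValueError("num_samples must not be negative")
    if num_shards <= 0:
        raise RuntimeError("num_shards must be positive")
    if shard_id < 0 or shard_id >= num_shards:
        raise RuntimeError("shard_id must be in [0, num_shards - 1]")
    if offset > num_shards:
        raise RuntimeError("offset must not be greater than num_shards")
```

For instance, `check_distributed_sampler_args(10, 10)` raises RuntimeError because shard 10 does not exist when there are only 10 shards (IDs 0 to 9).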

Examples:

>>> import mindspore.dataset as ds
>>> image_folder_dataset_dir = "/path/to/image_folder_dataset_directory"  # placeholder path
>>> # Create a distributed sampler with 10 shards in total; this process reads shard 5.
>>> sampler = ds.DistributedSampler(10, 5)
>>> dataset = ds.ImageFolderDataset(image_folder_dataset_dir,
...                                 num_parallel_workers=8,
...                                 sampler=sampler)
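To illustrate what "sharding" means for num_shards, shard_id, shuffle and num_samples, here is a minimal pure-Python sketch of one common scheme: interleaved assignment with wrap-around padding so that every shard gets the same number of indices (the offset=-1 behaviour). This is an assumed model for illustration, not MindSpore's internal implementation; the function `shard_indices` and its `seed` parameter are hypothetical, and the exact index order produced by the real sampler may differ.

```python
import math
import random


def shard_indices(num_rows, num_shards, shard_id, shuffle=False, seed=0,
                  num_samples=None):
    """Return the row indices one shard would read (hypothetical sketch)."""
    order = list(range(num_rows))
    if shuffle:
        # All shards must use the same seed so they agree on one permutation;
        # otherwise shards could overlap or skip rows.
        random.Random(seed).shuffle(order)
    # Pad so that each shard receives the same number of indices.
    per_shard = math.ceil(num_rows / num_shards)
    # Shard k takes positions k, k + num_shards, k + 2 * num_shards, ...
    # and wraps around to the start of `order` once it runs past the end.
    ids = [order[(i * num_shards + shard_id) % num_rows] for i in range(per_shard)]
    if num_samples is not None:
        # Keep only the first num_samples indices of this shard.
        ids = ids[:num_samples]
    return ids


# 10 rows split across 3 shards: each shard gets ceil(10 / 3) = 4 indices.
for k in range(3):
    print(k, shard_indices(10, 3, k))
# 0 [0, 3, 6, 9]
# 1 [1, 4, 7, 0]
# 2 [2, 5, 8, 1]
```

Padding by wrapping around means a few rows (here 0 and 1) are read twice across shards, which keeps the per-shard sample count equal at the cost of slight duplication.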

add_child(sampler)

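Add a child sampler to the given sampler. The child sampler's output is passed to the parent sampler, which applies its own sampling logic to that output and returns the new samples.

Parameters:

sampler (Sampler) - The object used to select samples from the dataset. Only built-in samplers are supported (mindspore.dataset.DistributedSampler, mindspore.dataset.PKSampler, mindspore.dataset.RandomSampler, mindspore.dataset.SequentialSampler, mindspore.dataset.SubsetRandomSampler, mindspore.dataset.WeightedRandomSampler).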

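Examples:

>>> import mindspore.dataset as ds
>>> cifar10_dataset_dir = "/path/to/cifar10_dataset_directory"  # placeholder path
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> # Attach a RandomSampler as the child of the SequentialSampler.
>>> sampler.add_child(ds.RandomSampler(num_samples=4))
>>> dataset = ds.Cifar10Dataset(cifar10_dataset_dir, sampler=sampler)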
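The chaining in the example above can be modeled in plain Python, assuming that the child sampler runs first and the parent then samples from the child's output. The toy functions `random_subset` and `sequential` and the dataset size of 10 are hypothetical stand-ins for RandomSampler and SequentialSampler; they only illustrate the data flow and do not reproduce MindSpore's random number generation.

```python
import random


def random_subset(num_rows, num_samples, seed=0):
    """Toy child: draw num_samples distinct row indices at random."""
    return random.Random(seed).sample(range(num_rows), num_samples)


def sequential(ids, start_index=0, num_samples=None):
    """Toy parent: take the incoming ids in order, starting at start_index."""
    picked = ids[start_index:]
    return picked if num_samples is None else picked[:num_samples]


num_rows = 10  # hypothetical dataset size
# The child runs first and produces 4 random row indices.
child_ids = random_subset(num_rows, num_samples=4)
# The parent applies its own logic to the child's output: keep the first 3.
parent_ids = sequential(child_ids, start_index=0, num_samples=3)
print(child_ids, parent_ids)
```

Under this model the pipeline reads 3 rows, which matches min(3, 4) in the get_num_samples rules for a parent with num_samples=3 and a child with num_samples=4.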

get_child()

Get the child sampler of the given sampler.

Returns: Sampler, the child sampler of the given sampler.

Examples:

>>> import mindspore.dataset as ds
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> sampler.add_child(ds.RandomSampler(num_samples=2))
>>> child_sampler = sampler.get_child()

get_num_samples()

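Get the num_samples value of the current sampler instance. This parameter may optionally be passed when the sampler is created (default: None), and this method returns its value. If the current sampler has a child sampler, the method also queries the child's num_samples (child_samples) and combines the two values according to the rules in the table below.

The following table lists every possible combination and the value that is returned.

Child sampler (T: present)	num_samples	child_samples	Result
T	x	y	min(x, y)
T	x	None	x
T	None	y	y
T	None	None	None
None	x	n/a	x
None	None	n/a	None

Returns: int or None, the number of samples.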
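The rules in the table can be expressed as a short recursive function. The class below is a hypothetical stand-in, `ToySampler`, that models only num_samples and an optional child, so it also shows what add_child and get_child store and return; it is a sketch of the documented behaviour, not MindSpore's Sampler class.

```python
class ToySampler:
    """Hypothetical stand-in modelling only num_samples and an optional child."""

    def __init__(self, num_samples=None):
        self.num_samples = num_samples
        self.child = None

    def add_child(self, sampler):
        self.child = sampler

    def get_child(self):
        return self.child

    def get_num_samples(self):
        # No child: return this sampler's own value (last two table rows).
        if self.child is None:
            return self.num_samples
        # With a child, ask it recursively, so deeper chains are handled too.
        child_samples = self.child.get_num_samples()
        if self.num_samples is None:
            return child_samples  # rows "None, y -> y" and "None, None -> None"
        if child_samples is None:
            return self.num_samples  # row "x, None -> x"
        return min(self.num_samples, child_samples)  # row "x, y -> min(x, y)"


parent = ToySampler(num_samples=3)
parent.add_child(ToySampler(num_samples=2))
print(parent.get_num_samples())  # 2
```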

Examples:

>>> import mindspore.dataset as ds
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> num_samples = sampler.get_num_samples()  # 3, since there is no child sampler