Class DistributedSampler

Inheritance Relationships

Base Type

  • public mindspore::dataset::Sampler

Class Documentation

class DistributedSampler : public mindspore::dataset::Sampler

A class to represent a Distributed Sampler in the data pipeline.

Note

A Sampler that accesses a shard of the dataset.

Public Functions

DistributedSampler(int64_t num_shards, int64_t shard_id, bool shuffle = true, int64_t num_samples = 0, uint32_t seed = 1, int64_t offset = -1, bool even_dist = true)

Constructor.

Parameters
  • num_shards[in] Number of shards to divide the dataset into.

  • shard_id[in] Shard ID of the current shard within num_shards.

  • shuffle[in] If true, the indices are shuffled (default=true).

  • num_samples[in] The number of samples to draw (default=0, which returns all samples).

  • seed[in] The seed in use when shuffle is true (default=1).

  • offset[in] The starting position where access to elements in the dataset begins (default=-1).

  • even_dist[in] If true, each shard returns the same number of rows (default=true). If false, the rows returned by the shards do not overlap.
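The effect of num_shards, shard_id, and even_dist can be illustrated with a small standalone sketch. It is not MindSpore's implementation: ShardIndices is a hypothetical helper, the strided assignment of rows to shards is an assumption, and shuffle, seed, offset, and num_samples are ignored. It only shows why equal-sized shards can require some rows to appear in more than one shard.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of MindSpore: returns the row indices that
// shard `shard_id` would see under a simple strided split of `num_rows` rows.
std::vector<int64_t> ShardIndices(int64_t num_rows, int64_t num_shards,
                                  int64_t shard_id, bool even_dist) {
  std::vector<int64_t> indices;
  if (num_rows <= 0 || num_shards <= 0 || shard_id < 0 ||
      shard_id >= num_shards) {
    return indices;  // Invalid arguments: return an empty shard.
  }
  // Rows per shard when every shard must return the same count (rounded up).
  const int64_t per_shard = (num_rows + num_shards - 1) / num_shards;
  for (int64_t i = 0; i < per_shard; ++i) {
    const int64_t global = i * num_shards + shard_id;
    if (global < num_rows) {
      indices.push_back(global);
    } else if (even_dist) {
      // Pad by wrapping around, so this row is also seen by another shard.
      indices.push_back(global % num_rows);
    }
    // With even_dist == false the shard simply ends up one row shorter.
  }
  return indices;
}
```

Under these assumptions, with 5 rows and 2 shards, shard 1 receives rows 1, 3, and a repeated row 0 when even_dist is true, and only rows 1 and 3 when it is false.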

Example

/* Creates a distributed sampler with 2 shards in total. This shard is shard 0. */
std::string file_path = "/path/to/test.mindrecord";
std::shared_ptr<Dataset> ds = MindData(file_path, {}, std::make_shared<DistributedSampler>(2, 0, false));

~DistributedSampler() = default

Destructor.