mindspore.dataset.WeightedRandomSampler

class mindspore.dataset.WeightedRandomSampler(weights, num_samples=None, replacement=True)[source]

Randomly samples element indices from [0, len(weights) - 1], drawing each index with probability proportional to its weight.

Parameters
  • weights (list[float, int]) – A sequence of weights, not necessarily summing up to 1.

  • num_samples (int, optional) – Number of elements to sample (default=None, which means sample all elements).

  • replacement (bool) – If True, put the sample ID back for the next draw (default=True).

Examples

>>> import mindspore.dataset as ds
>>>
>>> weights = [0.9, 0.01, 0.4, 0.8, 0.1, 0.1, 0.3]
>>>
>>> # creates a WeightedRandomSampler that will sample 4 elements with replacement (the default)
>>> sampler = ds.WeightedRandomSampler(weights, 4)
>>> dataset = ds.ImageFolderDataset(image_folder_dataset_dir,
...                                 num_parallel_workers=8,
...                                 sampler=sampler)
Raises
  • TypeError – If type of weights element is not a number.

  • TypeError – If num_samples is not an integer value.

  • TypeError – If replacement is not a boolean value.

  • RuntimeError – If weights is empty or all zero.

  • ValueError – If num_samples is a negative value.
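
The sampling behaviour described above can be approximated in plain Python. This is a minimal sketch using only the standard library, not MindSpore's implementation: the helper name weighted_sample and its seed argument are hypothetical, and the without-replacement branch is one plausible interpretation (redraw from the remaining IDs by their remaining weights).

```python
import random


def weighted_sample(weights, num_samples=None, replacement=True, seed=None):
    """Illustrative stand-in for weighted index sampling (not MindSpore code)."""
    rng = random.Random(seed)
    ids = list(range(len(weights)))
    if num_samples is None:
        # Mirrors "default=None, which means sample all elements".
        num_samples = len(weights)

    if replacement:
        # random.choices normalizes the weights internally,
        # so they do not need to sum to 1.
        return rng.choices(ids, weights=weights, k=num_samples)

    # Without replacement: draw one ID at a time and remove it from the pool.
    remaining_ids = ids[:]
    remaining_weights = list(weights)
    picked = []
    for _ in range(min(num_samples, len(ids))):
        if sum(remaining_weights) <= 0:
            break  # only zero-weight IDs are left
        pos = rng.choices(range(len(remaining_ids)), weights=remaining_weights, k=1)[0]
        picked.append(remaining_ids.pop(pos))
        remaining_weights.pop(pos)
    return picked


weights = [0.9, 0.01, 0.4, 0.8, 0.1, 0.1, 0.3]
with_repl = weighted_sample(weights, 4, seed=0)                        # IDs may repeat
without_repl = weighted_sample(weights, 4, replacement=False, seed=0)  # IDs are distinct
```

With replacement=True the same ID can appear more than once in the result; with replacement=False each ID appears at most once.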

add_child(sampler)

Add a sub-sampler to this sampler. The sub-sampler receives all data output by the parent sampler and applies its own sampling logic to return new samples.

Parameters

sampler (Sampler) – Object used to choose samples from the dataset. Only built-in samplers (DistributedSampler, PKSampler, RandomSampler, SequentialSampler, SubsetRandomSampler, WeightedRandomSampler) are supported.

Examples

>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> sampler.add_child(ds.RandomSampler(num_samples=2))
>>> dataset = ds.Cifar10Dataset(cifar10_dataset_dir, sampler=sampler)
get_child()

Get the child sampler.

get_num_samples()

Get the number of samples this sampler will produce, taking any child sampler into account. Every sampler has a num_samples value that is either a number or None. A child sampler may or may not exist; if it exists, its sample count (child_samples) is also either a number or None. Together these conditions determine the resulting sample count. The following table shows the possible results of calling this function, where T in the first column means a child sampler exists.

child sampler   num_samples   child_samples   result
-------------   -----------   -------------   ---------
T               x             y               min(x, y)
T               x             None            x
T               None          y               y
T               None          None            None
None            x             n/a             x
None            None          n/a             None

Returns

int, the number of samples, or None.
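
The rules described for get_num_samples() can be written as a small pure-Python function. This is a sketch of the documented behaviour only, not the library's source; the name resolve_num_samples and the has_child flag are hypothetical.

```python
def resolve_num_samples(num_samples, child_samples=None, has_child=False):
    """Return the effective sample count for a sampler and its optional child."""
    if not has_child:
        # No child sampler: the sampler's own num_samples (possibly None) is used.
        return num_samples
    if num_samples is None:
        # Only the child can constrain the count (its value may also be None).
        return child_samples
    if child_samples is None:
        return num_samples
    # Both counts are numeric: the smaller one wins.
    return min(num_samples, child_samples)


print(resolve_num_samples(3, 2, has_child=True))     # prints 2
print(resolve_num_samples(3, None, has_child=True))  # prints 3
print(resolve_num_samples(None))                     # prints None
```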

parse()[source]

Parse the sampler.

parse_child()

Parse the child sampler.

parse_child_for_minddataset()

Parse the child sampler for MindRecord.