mindspore.DatasetHelper

class mindspore.DatasetHelper(dataset, dataset_sink_mode=True, sink_size=-1, epoch_num=1)[source]

DatasetHelper is a class that processes the MindData dataset and provides information about the dataset.

Depending on the context, DatasetHelper changes the underlying dataset iteration so that the same iteration `for` loop can be used in different contexts.

Note

Iterating over a DatasetHelper provides one epoch of data.

Parameters
  • dataset (Dataset) – The dataset iterator. The dataset can be created by a dataset loading API in the mindspore.dataset module, such as mindspore.dataset.ImageFolderDataset.

  • dataset_sink_mode (bool) – If True, GetNext is used to fetch data on the device through the dataset pipeline; otherwise, data is fetched on the host by iterating through the dataset. Default: True.

  • sink_size (int) – Controls the amount of data sunk in each epoch. If sink_size=-1, the complete dataset is sunk for each epoch. If sink_size>0, only sink_size pieces of data are sunk for each epoch; see the sink-mode sketch after the example below. Default: -1.

  • epoch_num (int) – The number of passes of the entire dataset to be sent. Default: 1.

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import nn
>>> from mindspore import dataset as ds
>>>
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> set_helper = ms.DatasetHelper(train_dataset, dataset_sink_mode=False)
>>>
>>> net = nn.Dense(10, 5)
>>> # Object of DatasetHelper is iterable
>>> for next_element in set_helper:
...     # `next_element` includes data and label, using data to run the net
...     data = next_element[0]
...     result = net(data)
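
When dataset_sink_mode=True, data is sent to the device through the dataset pipeline instead of being iterated on the host. The following is a minimal sketch of constructing the helper in sink mode with an explicit sink_size, assuming a backend that supports data sinking; the variable name sink_helper is illustrative only.

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import dataset as ds
>>>
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> # Sink only part of the data each epoch; sink_size=-1 would sink the whole dataset.
>>> sink_helper = ms.DatasetHelper(train_dataset, dataset_sink_mode=True, sink_size=1)
>>> per_epoch = sink_helper.sink_size()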
continue_send()[source]

Continue sending data to the device at the beginning of an epoch.

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import nn
>>> from mindspore import dataset as ds
>>>
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> dataset_helper = ms.DatasetHelper(train_dataset, dataset_sink_mode=True)
>>> dataset_helper.continue_send()
release()[source]

Release the resources used by the data sink.

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import nn
>>> from mindspore import dataset as ds
>>>
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> dataset_helper = ms.DatasetHelper(train_dataset, dataset_sink_mode=True)
>>> dataset_helper.release()
sink_size()[source]

Get sink_size for each iteration.

Examples

>>> import mindspore as ms
>>> import numpy as np
>>>
>>> # Define a dataset pipeline
>>> def generator():
...    for i in range(5):
...        yield (np.ones((32, 10)),)
>>>
>>> train_dataset = ms.dataset.GeneratorDataset(generator, ["data"])
>>> dataset_helper = ms.DatasetHelper(train_dataset, dataset_sink_mode=True, sink_size=-1)
>>>
>>> # If sink_size=-1, sink_size() returns the full size of the source dataset.
>>> sink_size = dataset_helper.sink_size()
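
Conversely, when a positive sink_size is configured, sink_size() presumably reflects that configured value rather than the dataset size; a minimal sketch with the same generator pipeline:

>>> import mindspore as ms
>>> import numpy as np
>>>
>>> def generator():
...    for i in range(5):
...        yield (np.ones((32, 10)),)
>>>
>>> train_dataset = ms.dataset.GeneratorDataset(generator, ["data"])
>>> # With an explicit positive sink_size, sink_size() is expected to return the configured value.
>>> dataset_helper = ms.DatasetHelper(train_dataset, dataset_sink_mode=True, sink_size=2)
>>> sink_size = dataset_helper.sink_size()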
stop_send()[source]

Stop sending data to the device (the data sink).

Examples

>>> import mindspore as ms
>>> import numpy as np
>>> # Define a dataset pipeline
>>> def generator():
...    for i in range(5):
...        yield (np.ones((32, 10)),)
>>> train_dataset = ms.dataset.GeneratorDataset(generator, ["data"])
>>> dataset_helper = ms.DatasetHelper(train_dataset, dataset_sink_mode=True, sink_size=-1)
>>> dataset_helper.stop_send()
types_shapes()[source]

Get the types and shapes of the dataset under the current configuration.

Examples

>>> import mindspore as ms
>>> import numpy as np
>>>
>>> # Define a dataset pipeline
>>> def generator():
...    for i in range(5):
...        yield (np.ones((32, 10)),)
>>>
>>> train_dataset = ms.dataset.GeneratorDataset(generator, ["data"])
>>> dataset_helper = ms.DatasetHelper(train_dataset, dataset_sink_mode=True)
>>>
>>> types, shapes = dataset_helper.types_shapes()
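
The two return values describe the output columns of the pipeline. As a rough follow-up, assuming one entry per dataset column, they can be inspected like this:

>>> # One entry per dataset column; this pipeline has a single "data" column.
>>> for col_type, col_shape in zip(types, shapes):
...     print(col_type, col_shape)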