Data Iteration

Ascend GPU CPU Data Preparation

Translator: Ming__blue

Overview

The original dataset is read into memory through a dataset loading interface and then transformed through data augmentation operations. The resulting dataset object can be iterated in two common ways:

  • Create an iterator for data iteration.

  • Pass it to a Model interface (such as model.train or model.eval) for iterative training or inference.

Create an iterator for data iteration

A dataset object can create two kinds of iterators to traverse its data: a tuple iterator and a dictionary iterator.

The tuple iterator is created with create_tuple_iterator, and the dictionary iterator with create_dict_iterator. Their usage is shown below.

First, create a small dataset object for demonstration.

[1]:
import mindspore.dataset as ds

np_data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
dataset = ds.NumpySlicesDataset(np_data, column_names=["data"], shuffle=False)

The following methods can be used to create a data iterator.

[2]:
# Create tuple iterator
print("\n create tuple iterator")
for item in dataset.create_tuple_iterator():
    print("item:\n", item[0])

# Create dictionary iterator
print("\n create dict iterator")
for item in dataset.create_dict_iterator():
    print("item:\n", item["data"])

# Traverse the dataset object directly (equivalent to creating tuple iterator)
print("\n iterate dataset object directly")
for item in dataset:
    print("item:\n", item[0])

# Traverse the dataset object using enumerate (equivalent to creating a tuple iterator)
print("\n iterate dataset using enumerate")
for index, item in enumerate(dataset):
    print("index: {}, item:\n {}".format(index, item[0]))
 create tuple iterator
item:
 [[1 2]
 [3 4]]
item:
 [[5 6]
 [7 8]]

 create dict iterator
item:
 [[1 2]
 [3 4]]
item:
 [[5 6]
 [7 8]]

 iterate dataset object directly
item:
 [[1 2]
 [3 4]]
item:
 [[5 6]
 [7 8]]

 iterate dataset using enumerate
index: 0, item:
 [[1 2]
 [3 4]]
index: 1, item:
 [[5 6]
 [7 8]]
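
The demonstration dataset has a single column, so the two iterators look almost identical. The difference in row layout is easier to see with two columns. The following is a plain-Python model of that layout, not MindSpore code: the helpers rows_as_tuples and rows_as_dicts, the "label" column, and the sample values are all hypothetical and exist only for illustration.

```python
# Plain-Python model of the two row layouts (MindSpore is not required).
# Column names and values are made up for illustration.
column_names = ["data", "label"]
rows = [([1, 2], 0), ([3, 4], 1)]


def rows_as_tuples(rows):
    """Yield each row as a tuple; columns are accessed by position."""
    for row in rows:
        yield tuple(row)


def rows_as_dicts(rows, column_names):
    """Yield each row as a dict; columns are accessed by column name."""
    for row in rows:
        yield dict(zip(column_names, row))


tuple_rows = list(rows_as_tuples(rows))
dict_rows = list(rows_as_dicts(rows, column_names))

for item in tuple_rows:
    print("tuple row:", item[0], item[1])
for item in dict_rows:
    print("dict row:", item["data"], item["label"])
```

Positional access is compact but depends on column order. Access by name stays readable when the dataset has several columns.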

In addition, to iterate over the data for multiple epochs, set the num_epochs parameter accordingly. Compared with creating a new iterator for each epoch, setting the number of epochs directly can improve data iteration performance.

[3]:
# Create a tuple iterator that generates data for two epochs
epoch = 2
iterator = dataset.create_tuple_iterator(num_epochs=epoch)
for i in range(epoch):
    print("epoch: ", i)
    for item in iterator:
        print("item:\n", item[0])
epoch:  0
item:
 [[1 2]
 [3 4]]
item:
 [[5 6]
 [7 8]]
epoch:  1
item:
 [[1 2]
 [3 4]]
item:
 [[5 6]
 [7 8]]
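
One way to picture why reusing a single iterator can be cheaper is to count how often the data pipeline has to be set up. The sketch below is a toy model in plain Python, under the assumption that pipeline setup is the expensive step. ToyDataset, ToyIterator, and pipeline_builds are hypothetical names, and the exhaustion rule in __iter__ is part of the toy rather than documented library behavior.

```python
class ToyIterator:
    """Toy iterator that can be traversed num_epochs times."""

    def __init__(self, rows, num_epochs):
        self.rows = rows
        self.epochs_left = num_epochs

    def __iter__(self):
        # Toy rule (an assumption): refuse to run past the requested epochs.
        if self.epochs_left <= 0:
            raise RuntimeError("toy iterator exhausted: num_epochs reached")
        self.epochs_left -= 1
        return iter(self.rows)


class ToyDataset:
    """Hypothetical stand-in for a dataset; counts pipeline setups."""

    def __init__(self, rows):
        self.rows = rows
        self.pipeline_builds = 0

    def create_iterator(self, num_epochs=1):
        # Assumed to be the costly step: one setup per iterator created.
        self.pipeline_builds += 1
        return ToyIterator(self.rows, num_epochs)


rows = [[1, 2], [3, 4]]
epochs = 2

# Pattern 1: create a new iterator in every epoch.
per_epoch = ToyDataset(rows)
for _ in range(epochs):
    for item in per_epoch.create_iterator():
        pass

# Pattern 2: create one iterator up front and reuse it for every epoch.
reused = ToyDataset(rows)
iterator = reused.create_iterator(num_epochs=epochs)
seen = []
for _ in range(epochs):
    for item in iterator:
        seen.append(item)

print("setups per pattern:", per_epoch.pipeline_builds, reused.pipeline_builds)
```

In this model the first pattern sets up the pipeline once per epoch, while the second sets it up once in total and still yields every row in every epoch.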

The default output data type of the iterator is mindspore.Tensor. To get numpy.ndarray output instead, set the parameter output_numpy=True.

[4]:
# The default output type is mindspore.Tensor
for item in dataset.create_tuple_iterator():
    print("dtype: ", type(item[0]), "\nitem:", item[0])

# Set the output type to numpy.ndarray
for item in dataset.create_tuple_iterator(output_numpy=True):
    print("dtype: ", type(item[0]), "\nitem:", item[0])
dtype:  <class 'mindspore.common.tensor.Tensor'>
item: [[1 2]
 [3 4]]
dtype:  <class 'mindspore.common.tensor.Tensor'>
item: [[5 6]
 [7 8]]
dtype:  <class 'numpy.ndarray'>
item: [[1 2]
 [3 4]]
dtype:  <class 'numpy.ndarray'>
item: [[5 6]
 [7 8]]

For more details, refer to the create_tuple_iterator and create_dict_iterator API documentation.

Pass the dataset to the Model interface for iterative training or inference

After the dataset object is created, it can be passed to the Model interface, which iterates over the data internally and feeds it to the network for training or inference.

[5]:
import numpy as np
from mindspore import ms_function
from mindspore import context, nn, Model
import mindspore.dataset as ds
import mindspore.ops as ops


def create_dataset():
    np_data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    np_data = np.array(np_data, dtype=np.float16)
    dataset = ds.NumpySlicesDataset(np_data, column_names=["data"], shuffle=False)
    return dataset


class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.relu = ops.ReLU()
        self.print = ops.Print()

    @ms_function
    def construct(self, x):
        self.print(x)
        return self.relu(x)


if __name__ == "__main__":
    # This example can run on CPU, GPU, or Ascend
    context.set_context(mode=context.GRAPH_MODE)
    dataset = create_dataset()
    network = Net()
    model = Model(network)

    # Run training; data is sunk to the device by default
    model.train(epoch=1, train_dataset=dataset, dataset_sink_mode=True)
Tensor(shape=[2, 2], dtype=Float16, value=
[[ 1.0000e+00  2.0000e+00]
 [ 3.0000e+00  4.0000e+00]])
Tensor(shape=[2, 2], dtype=Float16, value=
[[ 5.0000e+00  6.0000e+00]
 [ 7.0000e+00  8.0000e+00]])

The dataset_sink_mode parameter of the Model interface controls whether data is sunk to the device. If sinking is disabled, an iterator like those above is created internally to traverse the data item by item and send each item to the network. If sinking is enabled, the data is sent directly to the device and fed to the network there for iterative training or inference.
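
The two modes can be contrasted with a small plain-Python model. This is a conceptual sketch only, under the assumption that sinking behaves like a producer pushing rows into a channel that the compute side drains. It says nothing about how MindSpore implements sinking, and run_without_sink, run_with_sink, and relu are hypothetical helpers.

```python
import queue
import threading


def relu(row):
    """Element-wise ReLU on a flat list, standing in for the network."""
    return [max(0, value) for value in row]


def run_without_sink(rows, network):
    # Host-side loop: fetch one row, call the network, repeat.
    return [network(row) for row in rows]


def run_with_sink(rows, network):
    # A producer thread pushes rows into a bounded queue that stands in
    # for the device-side data channel; the consumer loop drains it.
    channel = queue.Queue(maxsize=2)
    end_marker = object()

    def producer():
        for row in rows:
            channel.put(row)
        channel.put(end_marker)

    worker = threading.Thread(target=producer)
    worker.start()
    results = []
    while True:
        row = channel.get()
        if row is end_marker:
            break
        results.append(network(row))
    worker.join()
    return results


rows = [[-1, 2], [3, -4]]
host_results = run_without_sink(rows, relu)
sink_results = run_with_sink(rows, relu)
print(host_results == sink_results)
```

Both paths produce the same results in this model. What differs is where the loop that feeds the network runs, and that is the choice dataset_sink_mode exposes.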

For more detailed usage, please refer to Model Basic Usage.