Advanced Usage of Training

Overview

MindSpore provides a large number of network models, such as object detection and natural language processing models, in ModelZoo for users to use directly. However, some advanced users may want to design their own networks or customize the training cycle. The following describes how to customize a training network, how to customize a training cycle, and how to conduct inference while training. The on-device execution mode is also described in detail.

Note: This document is applicable to GPU and Ascend environments.

Customizing a Training Network

Before customizing a training network, you need to understand which networks MindSpore supports, the constraints on constructing networks with Python, and which operators are supported.

  • Network support: Currently, MindSpore supports multiple types of networks, including computer vision, natural language processing, recommendation, and graph neural networks. For details, see Network List. If the existing networks cannot meet your requirements, you can define your own network as required.

  • Constraints on network construction using Python: MindSpore cannot convert arbitrary Python source code into computational graphs, so source code is subject to syntax and network-definition constraints; for details, see Static Graph Syntax Support (a minimal graph-compatible cell is sketched below). These constraints may change as MindSpore evolves.

  • Operator support: As the name implies, the network is based on operators. Therefore, before customizing a training network, you need to understand the operators supported by MindSpore. For details about operator implementation on different backends (Ascend, GPU, and CPU), see Operator List.

When the built-in operators cannot meet your requirements, you can refer to Custom Operators (Ascend) to quickly develop custom operators for the Ascend AI processor.
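As a minimal illustration of the graph-mode constraints mentioned above, the following sketch (an assumption for illustration, not part of the original example) defines a cell whose construct uses only syntax listed in Static Graph Syntax Support, such as operator calls and scalar arguments:

import mindspore.nn as nn
import mindspore.ops as ops


class ClippedReLU(nn.Cell):
    """A graph-compatible cell: construct contains only supported syntax."""

    def __init__(self, ceiling=6.0):
        super().__init__()
        self.relu = ops.ReLU()
        self.minimum = ops.Minimum()
        self.ceiling = ceiling

    def construct(self, x):
        # Operator calls and Python scalars are supported in construct;
        # features such as try/except statements are not.
        x = self.relu(x)
        return self.minimum(x, self.ceiling)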

The following is a code example:

import numpy as np

from mindspore import Tensor
from mindspore.nn import Cell, Dense, SoftmaxCrossEntropyWithLogits, Momentum, TrainOneStepCell, WithLossCell
import mindspore.ops as ops


class ReLUReduceMeanDense(Cell):
    """ReLU -> mean over the H and W axes -> fully connected layer."""

    def __init__(self, kernel, bias, in_channel, num_class):
        super().__init__()
        self.relu = ops.ReLU()
        self.mean = ops.ReduceMean(keep_dims=False)
        self.dense = Dense(in_channel, num_class, kernel, bias)

    def construct(self, x):
        x = self.relu(x)
        x = self.mean(x, (2, 3))
        x = self.dense(x)
        return x


if __name__ == "__main__":
    weight_np = np.ones((1000, 2048)).astype(np.float32)
    weight = Tensor(weight_np.copy())
    bias_np = np.ones((1000,)).astype(np.float32)
    bias = Tensor(bias_np.copy())
    net = ReLUReduceMeanDense(weight, bias, 2048, 1000)
    criterion = SoftmaxCrossEntropyWithLogits(sparse=False)
    optimizer = Momentum(learning_rate=0.1, momentum=0.1,
                         params=filter(lambda x: x.requires_grad, net.get_parameters()))
    net_with_criterion = WithLossCell(net, criterion)
    train_network = TrainOneStepCell(net_with_criterion, optimizer)
    train_network.set_train()
    input_np = np.random.randn(32, 2048, 7, 7).astype(np.float32)
    input_data = Tensor(input_np.copy())  # avoid shadowing the built-in `input`
    label_np_onehot = np.zeros(shape=(32, 1000)).astype(np.float32)
    label = Tensor(label_np_onehot.copy())
    for i in range(1):
        loss = train_network(input_data, label)
        print("-------loss------", loss)

The output is as follows:

-------loss------ [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0.]
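
After training, the underlying network can be reused directly for inference; a minimal sketch (an illustration reusing net and input_data from the example above):

net.set_train(False)  # switch back to inference mode
logits = net(input_data)
print(logits.shape)  # (32, 1000)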

Customizing a Training Cycle

Before customizing a training cycle, download the MNIST dataset that will be used and place it in the specified location by executing the following commands in a Jupyter Notebook:

mkdir -p ./datasets/MNIST_Data/train ./datasets/MNIST_Data/test
wget -NP ./datasets/MNIST_Data/train https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/train-labels-idx1-ubyte --no-check-certificate
wget -NP ./datasets/MNIST_Data/train https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/train-images-idx3-ubyte --no-check-certificate
wget -NP ./datasets/MNIST_Data/test https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/t10k-labels-idx1-ubyte --no-check-certificate
wget -NP ./datasets/MNIST_Data/test https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/t10k-images-idx3-ubyte --no-check-certificate
tree ./datasets/MNIST_Data
./datasets/MNIST_Data
├── test
│   ├── t10k-images-idx3-ubyte
│   └── t10k-labels-idx1-ubyte
└── train
    ├── train-images-idx3-ubyte
    └── train-labels-idx1-ubyte

2 directories, 4 files

If you do not want to use the Model interface provided by MindSpore, you can refer to the following example to freely control the number of iterations, the traversal of the dataset, and so on.

The following is a code example:

import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as CT
import mindspore.dataset.vision.c_transforms as CV
import mindspore.nn as nn
from mindspore import context
from mindspore import dtype as mstype
from mindspore.common.initializer import TruncatedNormal
from mindspore import ParameterTuple
from mindspore.dataset.vision import Inter
from mindspore.nn import WithLossCell
import mindspore.ops as ops


def create_dataset(data_path, batch_size=32, repeat_size=1,
                   num_parallel_workers=1):
    """
    create dataset for train or test
    """
    # define dataset
    mnist_ds = ds.MnistDataset(data_path)

    resize_height, resize_width = 32, 32
    rescale = 1.0 / 255.0
    shift = 0.0
    rescale_nml = 1 / 0.3081
    shift_nml = -1 * 0.1307 / 0.3081

    # define map operations
    resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)  # Bilinear mode
    rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)
    rescale_op = CV.Rescale(rescale, shift)
    hwc2chw_op = CV.HWC2CHW()
    type_cast_op = CT.TypeCast(mstype.int32)

    # apply map operations on images
    mnist_ds = mnist_ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=resize_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers)

    # apply DatasetOps
    buffer_size = 10000
    mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)  # 10000 as in LeNet train script
    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
    mnist_ds = mnist_ds.repeat(repeat_size)

    return mnist_ds


def conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
    """weight initial for conv layer"""
    weight = weight_variable()
    return nn.Conv2d(in_channels, out_channels,
                     kernel_size=kernel_size, stride=stride, padding=padding,
                     weight_init=weight, has_bias=False, pad_mode="valid")


def fc_with_initialize(input_channels, out_channels):
    """weight initial for fc layer"""
    weight = weight_variable()
    bias = weight_variable()
    return nn.Dense(input_channels, out_channels, weight, bias)


def weight_variable():
    """weight initial"""
    return TruncatedNormal(0.02)


class LeNet5(nn.Cell):
    """
    LeNet-5 network.

    Args:
        num_class (int): Number of classes. Default: 10.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> LeNet5(num_class=10)
    """

    def __init__(self, num_class=10):
        super(LeNet5, self).__init__()
        self.num_class = num_class
        self.batch_size = 32
        self.conv1 = conv(1, 6, 5)
        self.conv2 = conv(6, 16, 5)
        self.fc1 = fc_with_initialize(16 * 5 * 5, 120)
        self.fc2 = fc_with_initialize(120, 84)
        self.fc3 = fc_with_initialize(84, self.num_class)
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.reshape = ops.Reshape()

    def construct(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.reshape(x, (self.batch_size, -1))
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x


class TrainOneStepCell(nn.Cell):
    """Wraps the network with an optimizer to perform a single training step."""

    def __init__(self, network, optimizer):
        super(TrainOneStepCell, self).__init__(auto_prefix=False)
        self.network = network
        self.weights = ParameterTuple(network.trainable_params())
        self.optimizer = optimizer
        self.grad = ops.GradOperation(get_by_list=True)

    def construct(self, data, label):
        weights = self.weights
        loss = self.network(data, label)
        # Compute gradients of the loss with respect to the trainable weights.
        grads = self.grad(self.network, weights)(data, label)
        # ops.depend ensures the optimizer update runs before loss is returned.
        return ops.depend(loss, self.optimizer(grads))


if __name__ == "__main__":
    context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

    ds_data_path = "./datasets/MNIST_Data/train/"
    ds_train = create_dataset(ds_data_path, 32)

    network = LeNet5(10)
    net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
    net_opt = nn.Momentum(network.trainable_params(), 0.01, 0.9)
    net = WithLossCell(network, net_loss)
    net = TrainOneStepCell(net, net_opt)
    net.set_train()
    print("============== Starting Training ==============")
    epochs = 10
    for epoch in range(epochs):
        for inputs in ds_train:  # each element is an (image, label) pair
            output = net(*inputs)
            print("epoch: {0}/{1}, losses: {2}".format(epoch + 1, epochs, output.asnumpy()), flush=True)

The output is as follows:

============== Starting Training ==============
epoch: 1/10, losses: 2.3086986541748047
epoch: 1/10, losses: 2.309938430786133
epoch: 1/10, losses: 2.302298069000244
epoch: 1/10, losses: 2.310209035873413
epoch: 1/10, losses: 2.3002336025238037
epoch: 1/10, losses: 2.3022992610931396
... ...
epoch: 1/10, losses: 0.18848800659179688
epoch: 1/10, losses: 0.15532201528549194
epoch: 2/10, losses: 0.179201140999794
epoch: 2/10, losses: 0.20995387434959412
epoch: 2/10, losses: 0.4867479205131531
... ...
epoch: 10/10, losses: 0.027243722230196
epoch: 10/10, losses: 0.07665436714887619
epoch: 10/10, losses: 0.005962767638266087
epoch: 10/10, losses: 0.026364721357822418
epoch: 10/10, losses: 0.0003102901973761618
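
Note that the custom TrainOneStepCell above invokes the forward network twice at the source level: once to compute the loss and once inside GradOperation (graph compilation can merge the two calls, but in PyNative mode both execute). A minimal sketch of a variant that follows the pattern of the built-in nn.TrainOneStepCell and seeds the backward pass through the sens parameter (the class name and the default sens value of 1.0 are assumptions for illustration):

class TrainOneStepCellV2(nn.Cell):
    """Derives gradients by seeding the backward pass with a sens tensor."""

    def __init__(self, network, optimizer, sens=1.0):
        super(TrainOneStepCellV2, self).__init__(auto_prefix=False)
        self.network = network
        self.weights = ParameterTuple(network.trainable_params())
        self.optimizer = optimizer
        self.grad = ops.GradOperation(get_by_list=True, sens_param=True)
        self.sens = sens

    def construct(self, data, label):
        loss = self.network(data, label)
        # The initial gradient: a tensor of ones (scaled by sens) with the
        # same shape and dtype as the loss.
        sens = ops.Fill()(ops.DType()(loss), ops.Shape()(loss), self.sens)
        grads = self.grad(self.network, self.weights)(data, label, sens)
        return ops.depend(loss, self.optimizer(grads))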

The MNIST dataset used in the example is obtained as described in Downloading the Dataset. A typical application scenario of a custom training cycle is gradient accumulation; for details, see Applying Gradient Accumulation Algorithm or the minimal sketch below.
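
A minimal host-side sketch of gradient accumulation (the linked tutorial implements accumulation with in-graph buffers; here, gradients are simply summed in NumPy between optimizer updates, and accum_steps, GradCell, and the variable names are assumptions for illustration). It reuses network, net_loss, net_opt, and ds_train from the example above:

import numpy as np
from mindspore import Tensor


class GradCell(nn.Cell):
    """Returns the gradients of the loss w.r.t. the trainable parameters."""

    def __init__(self, network):
        super(GradCell, self).__init__(auto_prefix=False)
        self.network = network
        self.weights = ParameterTuple(network.trainable_params())
        self.grad = ops.GradOperation(get_by_list=True)

    def construct(self, data, label):
        return self.grad(self.network, self.weights)(data, label)


accum_steps = 4  # apply one optimizer update per 4 mini-batches
grad_cell = GradCell(WithLossCell(network, net_loss))
accum = None
for step, inputs in enumerate(ds_train, start=1):
    grads = grad_cell(*inputs)
    # Accumulate gradients on the host between updates.
    accum = [g.asnumpy() for g in grads] if accum is None else \
            [a + g.asnumpy() for a, g in zip(accum, grads)]
    if step % accum_steps == 0:
        net_opt(tuple(Tensor(a) for a in accum))  # one update per accum_steps batches
        accum = None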

Conducting Inference While Training

For complex networks with large data volumes and relatively long training times, you can track how model accuracy changes across training phases by performing inference while training, as sketched below. For details, see Evaluating the Model during Training.
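
A minimal sketch of this idea with a custom Callback (the class name EvalCallback and the once-per-epoch policy are assumptions for illustration; it assumes a Model created with a metrics argument, for example metrics={"Accuracy": Accuracy()}, and a separate evaluation dataset):

from mindspore.train.callback import Callback


class EvalCallback(Callback):
    """Evaluates the model on a held-out dataset at the end of each epoch."""

    def __init__(self, model, eval_dataset):
        super(EvalCallback, self).__init__()
        self.model = model
        self.eval_dataset = eval_dataset

    def epoch_end(self, run_context):
        cb_params = run_context.original_args()
        metrics = self.model.eval(self.eval_dataset, dataset_sink_mode=False)
        print("epoch: {}, metrics: {}".format(cb_params.cur_epoch_num, metrics))

Passing an instance of this callback in the callbacks list of model.train logs the evaluation metrics once per epoch.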

On-Device Execution

Currently, the backends supported by MindSpore include Ascend, GPU, and CPU. The “device” in “on-device” refers to the Ascend AI processor.

The Ascend AI processor integrates the AI Core, AI CPU, and CPU. The AI Core is responsible for large tensor and vector computations, the AI CPU for scalar computations, and the CPU for logic control and task distribution.

The CPU on the host side delivers graphs or operators to the Ascend AI processor. Because the Ascend AI processor can itself perform computation, logic control, and task distribution, it does not need to interact frequently with the host CPU; it only needs to return the final result to the host. In this way, the entire graph is sunk to the device for execution, avoiding frequent host-device interaction and reducing overhead.

Computational Graphs on Devices

The entire graph is executed on the device to reduce the interaction overheads between the host and device. With cyclic sinking, multiple steps can be moved down to the device together to further reduce the number of host-device interactions.

Cyclic sinking is an optimization built on top of on-device execution. Generally, a result is returned after each step; cyclic sinking controls after how many steps a result is returned.

By default, a result is returned once per epoch, so the host and device need to exchange data only once per epoch.

You can also use the dataset_sink_mode and sink_size parameters of the train interface to control the amount of data sunk in each epoch.

Data Sinking

The dataset_sink_mode parameter of Model's train interface controls whether data sinks. If dataset_sink_mode is set to True, data sinking is enabled; otherwise, it is disabled. Sinking means that data is transmitted directly to the device through a channel.

The dataset_sink_mode parameter can be used together with sink_size to control the amount of data sunk in each epoch. When dataset_sink_mode is set to True, that is, in data sinking mode:

If sink_size is the default value -1, each epoch sinks the entire dataset.

If sink_size is greater than 0, the raw dataset can be traversed an unlimited number of times. Each epoch sinks sink_size steps of data, and the next epoch continues traversing from where the previous one stopped.

The total sunk data volume is controlled by the epoch and sink_size variables: total data volume = epoch × sink_size. For example, with epoch=10 and sink_size=1000, a total of 10 × 1000 = 10000 steps of data are sunk.

When using LossMonitor, TimeMonitor, or other Callback interfaces: if dataset_sink_mode is set to False, the host and the device interact once per step, so a result is returned for each step. If dataset_sink_mode is set to True, because data is transmitted to the device through a channel, the host and the device interact once per epoch, so only one result is returned per epoch.

Dataset sink mode is currently not supported on the CPU backend or in PyNative mode.
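
A small sketch of how a script might fall back accordingly (the guard logic is an assumption for illustration; model and ds_train are as in the example below):

from mindspore import context

# Sink data only when running graph mode on a non-CPU backend.
use_sink = (context.get_context("device_target") != "CPU"
            and context.get_context("mode") == context.GRAPH_MODE)
model.train(epoch=10, train_dataset=ds_train, dataset_sink_mode=use_sink)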

The following is a code example:

import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as CT
import mindspore.dataset.vision.c_transforms as CV
import mindspore.nn as nn
from mindspore import context, Model
from mindspore import dtype as mstype
from mindspore.common.initializer import TruncatedNormal
from mindspore.dataset.vision import Inter
import mindspore.ops as ops
from mindspore.train.callback import LossMonitor


def create_dataset(data_path, batch_size=32, repeat_size=1,
                   num_parallel_workers=1):
    """
    create dataset for train or test
    """
    # define dataset
    mnist_ds = ds.MnistDataset(data_path)

    resize_height, resize_width = 32, 32
    rescale = 1.0 / 255.0
    shift = 0.0
    rescale_nml = 1 / 0.3081
    shift_nml = -1 * 0.1307 / 0.3081

    # define map operations
    resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)  # Bilinear mode
    rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)
    rescale_op = CV.Rescale(rescale, shift)
    hwc2chw_op = CV.HWC2CHW()
    type_cast_op = CT.TypeCast(mstype.int32)

    # apply map operations on images
    mnist_ds = mnist_ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=resize_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers)

    # apply DatasetOps
    buffer_size = 10000
    mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)  # 10000 as in LeNet train script
    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
    mnist_ds = mnist_ds.repeat(repeat_size)

    return mnist_ds


def conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
    """weight initial for conv layer"""
    weight = weight_variable()
    return nn.Conv2d(in_channels, out_channels,
                     kernel_size=kernel_size, stride=stride, padding=padding,
                     weight_init=weight, has_bias=False, pad_mode="valid")


def fc_with_initialize(input_channels, out_channels):
    """weight initial for fc layer"""
    weight = weight_variable()
    bias = weight_variable()
    return nn.Dense(input_channels, out_channels, weight, bias)


def weight_variable():
    """weight initial"""
    return TruncatedNormal(0.02)


class LeNet5(nn.Cell):
    """
    LeNet-5 network.

    Args:
        num_class (int): Number of classes. Default: 10.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> LeNet5(num_class=10)
    """

    def __init__(self, num_class=10):
        super(LeNet5, self).__init__()
        self.num_class = num_class
        self.batch_size = 32
        self.conv1 = conv(1, 6, 5)
        self.conv2 = conv(6, 16, 5)
        self.fc1 = fc_with_initialize(16 * 5 * 5, 120)
        self.fc2 = fc_with_initialize(120, 84)
        self.fc3 = fc_with_initialize(84, self.num_class)
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.reshape = ops.Reshape()

    def construct(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.reshape(x, (self.batch_size, -1))
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x


if __name__ == "__main__":
    context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
    ds_train_path = "./datasets/MNIST_Data/train/"
    ds_train = create_dataset(ds_train_path, 32)

    network = LeNet5(10)
    net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
    net_opt = nn.Momentum(network.trainable_params(), 0.01, 0.9)
    model = Model(network, net_loss, net_opt)

    print("============== Starting Training ==============")
    model.train(epoch=10, train_dataset=ds_train, callbacks=[LossMonitor()], dataset_sink_mode=True, sink_size=1000)

The output is as follows:

============== Starting Training ==============
epoch: 1 step: 1000, loss is 0.110185064
epoch: 2 step: 1000, loss is 0.12088283
epoch: 3 step: 1000, loss is 0.15903473
epoch: 4 step: 1000, loss is 0.030054657
epoch: 5 step: 1000, loss is 0.013846226
epoch: 6 step: 1000, loss is 0.052161213
epoch: 7 step: 1000, loss is 0.0050197737
epoch: 8 step: 1000, loss is 0.17207858
epoch: 9 step: 1000, loss is 0.010310417
epoch: 10 step: 1000, loss is 0.000672762

When batch_size is 32, the dataset contains 1875 steps per epoch. Because sink_size is set to 1000, each epoch sinks 1000 batches of data; the number of sinks equals the number of epochs (10), and the total sunk data volume is epoch × sink_size = 10000 batches.

Because dataset_sink_mode is True, each epoch returns one result.

When dataset_sink_mode is set to False, the sink_size parameter is ignored.
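
For comparison, a minimal sketch of the non-sink variant (an illustration reusing model, ds_train, and LossMonitor from the example above):

# With dataset_sink_mode=False, sink_size is ignored and LossMonitor
# reports the loss after every step instead of once per epoch.
model.train(epoch=1, train_dataset=ds_train, callbacks=[LossMonitor()],
            dataset_sink_mode=False)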