Optimizing Model Parameters

After learning how to create a model and build a dataset in the preceding tutorials, you can start to learn how to set hyperparameters and optimize model parameters.

Hyperparameters

Hyperparameters can be adjusted to control the model training and optimization process. Different hyperparameter values may affect the model training and convergence speed.

Generally, the following hyperparameters are defined for training:

Epoch: specifies number of times that the dataset is traversed during training.
Batch size: specifies the size of each batch of data to be read.
Learning rate: If the learning rate is low, the convergence speed slows down. If the learning rate is high, unpredictable results such as no training convergence may occur.

epochs = 5
batch_size = 64
learning_rate = 1e-3

Loss Functions

The loss function is used to evaluate the difference between predicted value and actual value of a model. Here, the absolute error loss function L1Loss is used. mindspore.nn.loss provides many common loss functions, such as SoftmaxCrossEntropyWithLogits, MSELoss, and SmoothL1Loss.

The output value and target value are provided to compute the loss value. The method is as follows:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor

loss = nn.L1Loss()
output_data = Tensor(np.array([[1, 2, 3], [2, 3, 4]]).astype(np.float32))
target_data = Tensor(np.array([[0, 2, 5], [3, 1, 1]]).astype(np.float32))
print(loss(output_data, target_data))

1.5

Optimizer

An optimizer is used to compute and update the gradient. The selection of the model optimization algorithm directly affects the performance of the final model. A poor effect may be caused by the optimization algorithm instead of the feature or model design. All optimization logic of MindSpore is encapsulated in the Optimizer object. Here, the SGD optimizer is used. mindspore.nn.optim provides many common optimizers, such as ADAM and Momentum.

To use mindspore.nn.optim, you need to build an Optimizer object. This object can retain the current parameter status and update parameters based on the computed gradient.

To build an Optimizer, we need to provide an iterator that contains parameters (must be variable objects) to be optimized. For example, set params to net.trainable_params() for all parameter that can be trained on the network. Then, you can set the Optimizer parameter options, such as the learning rate and weight attenuation.

A code example is as follows:

from mindspore import nn

optim = nn.SGD(params=net.trainable_params(), learning_rate=0.1, weight_decay=0.0)

Training

A model training process is generally divided into four steps.

Define a neural network.
Build a dataset.
Define hyperparameters, a loss function, and an optimizer.
Enter the epoch and dataset for training.

The code example for model training is as follows:

import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as C
import mindspore.dataset.vision.c_transforms as CV
from mindspore import nn, Tensor, Model
from mindspore import dtype as mstype

DATA_DIR = "./datasets/cifar-10-batches-bin/train"

# Define a neural network.
class Net(nn.Cell):
    def __init__(self, num_class=10, num_channel=3):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
        self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
        self.fc1 = nn.Dense(16 * 5 * 5, 120)
        self.fc2 = nn.Dense(120, 84)
        self.fc3 = nn.Dense(84, num_class)
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()

    def construct(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

net = Net()
epochs = 5
batch_size = 64
learning_rate = 1e-3

# Build a dataset.
sampler = ds.SequentialSampler(num_samples=128)
dataset = ds.Cifar10Dataset(DATA_DIR, sampler=sampler)

# Convert the data type.
type_cast_op_image = C.TypeCast(mstype.float32)
type_cast_op_label = C.TypeCast(mstype.int32)
HWC2CHW = CV.HWC2CHW()
dataset = dataset.map(operations=[type_cast_op_image, HWC2CHW], input_columns="image")
dataset = dataset.map(operations=type_cast_op_label, input_columns="label")
dataset = dataset.batch(batch_size)

# Define hyperparameters, a loss function, and an optimizer.
optim = nn.SGD(params=net.trainable_params(), learning_rate=learning_rate)
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')

# Enter the epoch and dataset for training.
model = Model(net, loss_fn=loss, optimizer=optim)
model.train(epoch=epochs, train_dataset=dataset)