Training Process Overview
With the rapid development of artificial intelligence technology, deep learning has become a core technology in many fields. Model training is a central part of deep learning and involves multiple stages and processes.
A complete training process typically consists of four steps: dataset preprocessing, model creation, defining the loss function and optimizer, and training and saving the model. Defining the training and evaluation networks and running them directly is usually sufficient for basic needs, but in MindSpore the network can also be encapsulated with Model, which simplifies the code to a certain extent and gives easy access to higher-order features such as data sinking and mixed precision.
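As a quick illustration, here is a minimal sketch of this encapsulation; the small network, the hyperparameters, and the commented-out training call are illustrative assumptions rather than a fixed recipe.

```python
import mindspore as ms
from mindspore import nn
from mindspore.train import Model

# Illustrative placeholder network; any nn.Cell works here.
net = nn.SequentialCell(nn.Dense(32, 16), nn.ReLU(), nn.Dense(16, 10))
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

# Encapsulate the network; amp_level enables mixed precision on supported devices.
model = Model(net, loss_fn=loss_fn, optimizer=optimizer,
              metrics={'accuracy'}, amp_level='O2')

# dataset_sink_mode=True sinks data to the device to reduce Host-Device interaction.
# model.train(epoch=5, train_dataset=train_ds, dataset_sink_mode=True)
```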
To build a complete training process using MindSpore, you can refer to the following guidelines:
High-order Encapsulation
For neural networks in simple scenarios, the forward network network, the loss function loss_fn, the optimizer optimizer, and the evaluation function metrics can be specified when defining the Model.
Building a neural network using Model is generally divided into four steps as follows:
1. Dataset Preprocessing: Use mindspore.dataset to load a dataset, then perform operations such as scaling, normalization, and formatting on it. Refer to the Dataset Preprocessing tutorial for more information.
2. Model Building: Build the neural network using mindspore.nn.Cell, initialize the neural network layers within __init__, and construct the forward execution logic within the construct function. Refer to the Model Creation tutorial for more information.
3. Defining the Loss Function and Optimizer: Define the loss function and optimizer using mindspore.nn. Refer to the Define Loss Function and Optimizer tutorial for more information.
4. Training and Saving the Model: Use the ModelCheckpoint interface to save the network model and parameters, model.fit to train and evaluate the network, and LossMonitor to monitor changes in the loss during training. Refer to the Training and Saving Models tutorial for more information.
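The following is a minimal end-to-end sketch of these four steps. The dataset path, the small network, and the hyperparameters are illustrative assumptions; the training calls are left commented out because they require a downloaded dataset.

```python
import mindspore as ms
from mindspore import nn
import mindspore.dataset as ds
import mindspore.dataset.transforms as transforms
import mindspore.dataset.vision as vision
from mindspore.train import Model, ModelCheckpoint, LossMonitor

# 1. Dataset preprocessing
def create_dataset(data_path, batch_size=32):
    dataset = ds.MnistDataset(data_path)
    dataset = dataset.map(vision.Rescale(1.0 / 255.0, 0.0), input_columns="image")
    dataset = dataset.map(vision.HWC2CHW(), input_columns="image")
    dataset = dataset.map(transforms.TypeCast(ms.int32), input_columns="label")
    return dataset.batch(batch_size)

# 2. Model building
class Network(nn.Cell):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.dense1 = nn.Dense(28 * 28, 128)
        self.relu = nn.ReLU()
        self.dense2 = nn.Dense(128, 10)

    def construct(self, x):
        x = self.flatten(x)
        return self.dense2(self.relu(self.dense1(x)))

# 3. Loss function and optimizer
net = Network()
loss_fn = nn.CrossEntropyLoss()
optimizer = nn.Adam(net.trainable_params(), learning_rate=1e-3)

# 4. Training and saving the model
model = Model(net, loss_fn=loss_fn, optimizer=optimizer, metrics={'accuracy'})
# train_ds = create_dataset("./MNIST/train")
# eval_ds = create_dataset("./MNIST/test")
# model.fit(10, train_ds, eval_ds,
#           callbacks=[ModelCheckpoint(prefix="mnist"), LossMonitor()])
```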
In addition, during training we can use the callback mechanism to keep track of the training status of the network model, observe changes in the network parameters in real time, and run user-defined operations during the training process. Besides MindSpore's built-in callback functions, you can also customize your own callback class based on the Callback base class.
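Below is a hedged sketch of such a custom callback, assuming the step_end hook of the mindspore.train.Callback base class; the early-stopping behavior and the loss threshold are illustrative.

```python
import mindspore as ms
from mindspore.train import Callback

class StopAtLoss(Callback):
    """Stop training early once the loss falls below a threshold (illustrative)."""

    def __init__(self, threshold=0.05):
        super().__init__()
        self.threshold = threshold

    def step_end(self, run_context):
        cb_params = run_context.original_args()
        loss = cb_params.net_outputs
        if isinstance(loss, ms.Tensor) and loss.asnumpy() < self.threshold:
            print(f"Loss {loss.asnumpy():.4f} below {self.threshold}, stopping.")
            run_context.request_stop()

# Usage: model.train(10, train_ds, callbacks=[StopAtLoss(0.05)])
```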
When the training task is over, evaluation functions (Metrics) are often needed to assess how good the model is. MindSpore provides evaluation functions for most common tasks, and users can define custom metrics by inheriting from the mindspore.train.Metric base class.
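The following is a minimal sketch of a custom metric built on the clear/update/eval interface of mindspore.train.Metric; the accuracy logic itself is illustrative.

```python
import numpy as np
from mindspore.train import Metric

class SimpleAccuracy(Metric):
    """Compute classification accuracy from logits and integer labels."""

    def __init__(self):
        super().__init__()
        self.clear()

    def clear(self):
        self._correct = 0
        self._total = 0

    def update(self, *inputs):
        logits = self._convert_data(inputs[0])   # model outputs
        labels = self._convert_data(inputs[1])   # ground-truth labels
        preds = np.argmax(logits, axis=1)
        self._correct += int((preds == labels).sum())
        self._total += labels.shape[0]

    def eval(self):
        return self._correct / max(self._total, 1)

# Usage: Model(net, loss_fn, optimizer, metrics={"simple_acc": SimpleAccuracy()})
```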
Performance Optimization Approaches
In the MindSpore deep learning training process, the following three approaches are commonly used to optimize training performance:
Sinking Mode: MindSpore provides data sinking, graph sinking, and loop sinking, which can greatly reduce the Host-Device interaction overhead and effectively improve the performance of training and inference. Refer to the Sinking Mode tutorial for more information.
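A small sketch of enabling data sinking, building on the model and dataset helpers from the earlier sketches; the epoch count and sink_size are illustrative assumptions.

```python
from mindspore.train import LossMonitor

def train_with_sinking(model, train_ds):
    # dataset_sink_mode=True transfers batches to the device ahead of execution,
    # and sink_size controls how many steps are executed per sinking round.
    model.train(epoch=10,
                train_dataset=train_ds,
                callbacks=[LossMonitor()],
                dataset_sink_mode=True,
                sink_size=100)
```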
Vectorization Acceleration Interface Vmap: The vmap interface converts highly repetitive arithmetic logic in a model or function into parallel vector arithmetic logic, yielding leaner code and more efficient execution. Refer to the Vectorization Acceleration Interface Vmap tutorial for more information.
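For example, a minimal sketch of ops.vmap that vectorizes a per-sample function over a batch; the function and shapes are illustrative.

```python
import numpy as np
from mindspore import ops, Tensor

def dot(x, w):
    # Operates on a single sample: x and w are 1-D vectors.
    return (x * w).sum()

# Map `dot` over the leading (batch) axis of x while broadcasting w.
batched_dot = ops.vmap(dot, in_axes=(0, None))

x = Tensor(np.ones((4, 3), np.float32))    # batch of 4 samples
w = Tensor(np.arange(3, dtype=np.float32))
print(batched_dot(x, w))                   # shape (4,)
```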
Advanced Differential Interfaces: MindSpore can use the ops.grad or ops.value_and_grad interfaces to perform automatic differentiation, supporting first-order or higher-order derivation of the function or network to be differentiated.
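A short sketch of these interfaces on an illustrative scalar function; higher-order derivatives are obtained here by nesting ops.grad.

```python
import mindspore as ms
from mindspore import ops, Tensor

def f(x):
    return x ** 3

x = Tensor(2.0, ms.float32)

grad_f = ops.grad(f)                      # first-order derivative: 3x^2
value, grad = ops.value_and_grad(f)(x)    # returns f(x) and df/dx together
grad2_f = ops.grad(ops.grad(f))           # second-order derivative: 6x

print(grad_f(x), value, grad, grad2_f(x))  # 12.0, 8.0, 12.0, 12.0
```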
Higher-Order Training Strategies
MindSpore also supports higher-order training strategies, such as:
Gradient Accumulation: Gradient accumulation splits the data samples used to train a neural network into several smaller batches, computes their gradients sequentially, and accumulates them before updating the parameters. Its purpose is to solve the OOM (Out of Memory) problem where a network cannot be trained because the batch size or the model itself is too large to fit into memory. Refer to the Gradient Accumulation tutorial for more information.
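Below is a simplified, PyNative-style sketch of manual gradient accumulation using functional autodiff; the tiny network, the accumulation step count, and the buffer handling are illustrative assumptions, not the exact approach from the Gradient Accumulation tutorial.

```python
from mindspore import nn, ops

net = nn.Dense(16, 4)
loss_fn = nn.CrossEntropyLoss()
optimizer = nn.SGD(net.trainable_params(), learning_rate=0.01)
accum_steps = 4

def forward_fn(data, label):
    return loss_fn(net(data), label)

# Gradients are taken w.r.t. the optimizer's parameters only.
grad_fn = ops.value_and_grad(forward_fn, None, optimizer.parameters)

# Zero-initialized buffers with the same shapes as the parameters.
accum_grads = [ops.zeros_like(p) for p in optimizer.parameters]

def train_step(step, data, label):
    loss, grads = grad_fn(data, label)
    # Accumulate micro-batch gradients instead of applying them immediately.
    for i, g in enumerate(grads):
        accum_grads[i] += g
    if (step + 1) % accum_steps == 0:
        optimizer(tuple(accum_grads))            # apply accumulated gradients
        for i in range(len(accum_grads)):        # reset the buffers
            accum_grads[i] = ops.zeros_like(optimizer.parameters[i])
    return loss
```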
Per-sample-gradients: Per-sample gradients give the gradient of the loss with respect to the parameters for each individual sample, which characterizes more precisely how each sample affects the network parameters during training, and MindSpore provides efficient methods to compute them. Refer to the Per-sample-gradients tutorial for more information.
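One hedged sketch of this idea composes ops.grad with ops.vmap in a purely functional style; the linear model, the data shapes, and the explicit weight argument are illustrative assumptions.

```python
import numpy as np
from mindspore import ops, Tensor

def loss_fn(weight, x, y):
    # Single-sample squared error for a linear model: x is (3,), y is a scalar.
    pred = (weight * x).sum()
    return (pred - y) ** 2

# d(loss)/d(weight) for one sample.
grad_fn = ops.grad(loss_fn, grad_position=0)

# Keep `weight` fixed (None) and map over the batch axis of x and y.
per_sample_grad_fn = ops.vmap(grad_fn, in_axes=(None, 0, 0))

weight = Tensor(np.ones(3, np.float32))
x = Tensor(np.random.randn(8, 3).astype(np.float32))   # batch of 8 samples
y = Tensor(np.random.randn(8).astype(np.float32))

per_sample_grads = per_sample_grad_fn(weight, x, y)     # shape (8, 3)
```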