Functional and Cell
Operator Functional Interface
The MindSpore framework provides a rich set of Functional interfaces defined under mindspore.ops, which express operations and computations directly as functions, without the need to explicitly create an instance of an operator class. The Functional interfaces include neural network layer functions, mathematical operations, Tensor operations, Parameter operations, differentiation functions, debugging functions, and other types of interfaces. They can be used directly in the construct method of a Cell, or as standalone operations in data processing or model training.
The following shows how a Functional interface is used within a Cell in MindSpore:
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops
class MyCell(nn.Cell):
    def construct(self, x, y):
        output = ops.add(x, y)
        return output
net = MyCell()
x = ms.Tensor([1, 2, 3], ms.float32)
y = ms.Tensor([4, 5, 6], ms.float32)
output = net(x, y)
print(output)
The result is as follows:
[5. 7. 9.]
The following shows standalone use of a Functional interface in MindSpore:
import mindspore as ms
import mindspore.ops as ops
x = ms.Tensor([1, 2, 3], ms.float32)
y = ms.Tensor([4, 5, 6], ms.float32)
output = ops.add(x, y)
print(output)
The result is as follows:
[5. 7. 9.]
Basic Network Elements Cell
The core element of the MindSpore framework, mindspore.nn.Cell, is the basic module for building neural networks and is responsible for defining the computational logic of a network. As the base component of a network, Cell not only supports dynamic graphs (PyNative mode), but can also be compiled for efficient computational graph execution in static graphs (GRAPH mode). Cell defines the forward-propagation computation through its construct method, and can be extended through inheritance to implement custom network layers or complex structures. Through the set_train method, Cell can flexibly switch between training and inference modes to accommodate operators that behave differently in the two modes. In addition, Cell provides rich APIs, such as mixed precision, parameter management, gradient setting, Hook functions, and recomputation, to support model optimization and training.
The basic Cell build process in MindSpore is shown below:
import mindspore.nn as nn
import mindspore.ops as ops
class MyCell(nn.Cell):
    def __init__(self, forward_net):
        super(MyCell, self).__init__(auto_prefix=True)
        self.net = forward_net
        self.relu = ops.ReLU()

    def construct(self, x):
        y = self.net(x)
        return self.relu(y)
inner_net = nn.Conv2d(120, 240, 4, has_bias=False)
my_net = MyCell(inner_net)
print(my_net.trainable_params())
The result is as follows:
[Parameter (name=net.weight, shape=(240, 120, 4, 4), dtype=Float32, requires_grad=True)]
In MindSpore, a parameter's name is generally composed from the attribute name of the object defined in __init__ and the name given when the Parameter itself was defined. In the example above, the convolution's parameter is named net.weight, where net is the attribute name in self.net = forward_net, and weight is the name Conv2d gives the convolution's parameter when defining it: self.weight = Parameter(initializer(self.weight_init, shape), name='weight').
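As noted above, set_train switches a Cell between training and inference modes. The following minimal sketch illustrates this with dropout; it assumes the nn.Dropout(p=...) signature of recent MindSpore versions, and the printed training-mode values will vary because dropout is random:
import mindspore as ms
import mindspore.nn as nn
dropout = nn.Dropout(p=0.5)
x = ms.Tensor([[1., 2., 3., 4.]], ms.float32)
dropout.set_train(True)   # training mode: elements are randomly zeroed and the rest scaled
print(dropout(x))
dropout.set_train(False)  # inference mode: dropout acts as an identity mapping
print(dropout(x))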
Parameter Management
MindSpore has two kinds of data objects: Tensor and Parameter. A Tensor object only participates in computation and does not require gradient derivation or parameter updates, while a Parameter is passed into the optimizer based on its requires_grad attribute.
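As a simple illustration (the name "w" below is chosen only for this example), a Tensor participates in computation only, while a Parameter carries a requires_grad attribute that decides whether it is handed to the optimizer:
import mindspore as ms
from mindspore import Tensor, Parameter
x = Tensor([1.0, 2.0], ms.float32)                        # participates in computation only
w = Parameter(Tensor([0.5, 0.5], ms.float32), name="w")   # learnable; requires_grad defaults to True
print(w.requires_grad)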
Parameter Obtaining
mindspore.nn.Cell provides the parameters_dict, get_parameters, and trainable_params interfaces to obtain the Parameters in a Cell.
parameters_dict: gets all Parameters in the network structure, returning an OrderedDict with the Parameter name as key and the Parameter as value.
get_parameters: gets all Parameters in the network structure, returning an iterator over the Parameters in the Cell.
trainable_params: gets the Parameters whose requires_grad attribute is True, returning a list of trainable Parameters.
When defining an optimizer, use net.trainable_params() to get the list of Parameters that need to be updated.
import mindspore.nn as nn
net = nn.Dense(2, 1, has_bias=True)
print(net.trainable_params())
for param in net.trainable_params():
    param_name = param.name
    if "bias" in param_name:
        param.requires_grad = False
print(net.trainable_params())
The result is as follows:
[Parameter (name=weight, shape=(1, 2), dtype=Float32, requires_grad=True), Parameter (name=bias, shape=(1,), dtype=Float32, requires_grad=True)]
[Parameter (name=weight, shape=(1, 2), dtype=Float32, requires_grad=True)]
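The following sketch shows the other two obtaining interfaces and how trainable_params is typically passed to an optimizer; nn.Momentum is used here only as an example optimizer, and the hyperparameter values are illustrative:
import mindspore.nn as nn
net = nn.Dense(2, 1, has_bias=True)
# parameters_dict: an OrderedDict keyed by Parameter name
print(list(net.parameters_dict().keys()))
# get_parameters: an iterator over all Parameters in the Cell
for param in net.get_parameters():
    print(param.name)
# Pass only the trainable Parameters to the optimizer
optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)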
Parameter Saving and Loading
MindSpore provides the save_checkpoint and load_checkpoint methods for saving and loading Parameters. Note that when saving, the Parameter list is saved; when loading, the target object must be a Cell.
When loading, the Parameter names may not match and may need some modification. In that case, you can directly construct a new Parameter dictionary and pass it to load_param_into_net to load it into the Cell.
import mindspore as ms
import mindspore.ops as ops
import mindspore.nn as nn
net = nn.Dense(2, 1, has_bias=True)
for param in net.get_parameters():
    print(param.name, param.data.asnumpy())
ms.save_checkpoint(net, "dense.ckpt")
dense_params = ms.load_checkpoint("dense.ckpt")
print(dense_params)
new_params = {}
for param_name in dense_params:
    print(param_name, dense_params[param_name].data.asnumpy())
    new_params[param_name] = ms.Parameter(ops.ones_like(dense_params[param_name].data), name=param_name)
ms.load_param_into_net(net, new_params)
for param in net.get_parameters():
    print(param.name, param.data.asnumpy())
The result is as follows:
weight [[-0.0042482 -0.00427286]]
bias [0.]
{'weight': Parameter (name=weight, shape=(1, 2), dtype=Float32, requires_grad=True), 'bias': Parameter (name=bias, shape=(1,), dtype=Float32, requires_grad=True)}
weight [[-0.0042482 -0.00427286]]
bias [0.]
weight [[1. 1.]]
bias [1.]
Submodule Management
Other Cell instances can be defined as submodules inside mindspore.nn.Cell. These submodules are integral parts of the network and may themselves contain learnable Parameters (e.g., the weights and biases of convolutional layers) as well as further submodules. This hierarchical module structure allows users to build complex and reusable neural network architectures.
mindspore.nn.Cell provides interfaces such as cells_and_names and insert_child_to_cell to implement submodule management.
from mindspore import nn
class MyCell(nn.Cell):
    def __init__(self):
        super(MyCell, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        # Add a submodule using insert_child_to_cell
        self.insert_child_to_cell('conv3', nn.Conv2d(64, 128, 3, 1))
        self.sequential_block = nn.SequentialCell(
            nn.ReLU(),
            nn.Conv2d(128, 256, 3, 1),
            nn.ReLU()
        )

    def construct(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.sequential_block(x)
        return x

module = MyCell()

# Iterate over all submodules (both direct and indirect) using cells_and_names
for name, cell_instance in module.cells_and_names():
    print(f"Cell name: {name}, type: {type(cell_instance)}")
The result is as follows:
Cell name: , type: <class '__main__.MyCell'>
Cell name: conv1, type: <class 'mindspore.nn.layer.conv.Conv2d'>
Cell name: conv2, type: <class 'mindspore.nn.layer.conv.Conv2d'>
Cell name: conv3, type: <class 'mindspore.nn.layer.conv.Conv2d'>
Cell name: sequential_block, type: <class 'mindspore.nn.layer.container.SequentialCell'>
Cell name: sequential_block.0, type: <class 'mindspore.nn.layer.activation.ReLU'>
Cell name: sequential_block.1, type: <class 'mindspore.nn.layer.conv.Conv2d'>
Cell name: sequential_block.2, type: <class 'mindspore.nn.layer.activation.ReLU'>
Hook Function
Debugging a deep learning network is a task that every practitioner in the field must face and invest considerable effort in. Because a deep learning network hides the input and output data and the backward gradients of intermediate-layer operators, and only provides the gradients of the network's input data (features, weights), the data changes of intermediate-layer operators cannot be observed accurately, which reduces debugging efficiency. To help users debug deep learning networks accurately and quickly, MindSpore provides Hook functions in dynamic graph mode. Hook functions can capture the input and output data of intermediate-layer operators as well as the backward gradients.
Currently, mindspore.nn.Cell provides four Hook functions in dynamic graph mode: register_forward_pre_hook, register_forward_hook, register_backward_hook, and register_backward_pre_hook. See Hook Programming for details.
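As a minimal sketch of the forward hook, the example below registers a hook on a Dense layer in dynamic graph mode. It assumes the hook_fn(cell, inputs, outputs) signature used by recent MindSpore versions (earlier versions pass a cell identifier string as the first argument):
import mindspore as ms
import mindspore.nn as nn
ms.set_context(mode=ms.PYNATIVE_MODE)  # Hook functions take effect in dynamic graph mode
def forward_hook_fn(cell, inputs, outputs):
    # Capture the intermediate layer's input and output data
    print("inputs:", inputs)
    print("outputs:", outputs)
net = nn.Dense(2, 2)
handle = net.register_forward_hook(forward_hook_fn)
y = net(ms.Tensor([[1., 2.]], ms.float32))
handle.remove()  # remove the hook once it is no longer needed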
Recomputation
MindSpore uses reverse-mode automatic differentiation to automatically derive the backward graph from the computation flow of the forward graph; the forward and backward graphs together form the complete computational graph. When computing some backward operators, the results of certain forward operators are needed, so those forward results must remain in memory until the backward operators that depend on them have been computed, and the memory they occupy cannot be reused. This pushes up the peak memory usage of training, and the effect is particularly significant in large-scale network models.
To solve this problem, the mindspore.nn.Cell.recompute interface provides a recomputation function. With recomputation, the results of forward operators are not saved, so their memory can be reused; if a forward result is needed when computing a backward operator, the forward operator is recomputed. See Recomputation for details.
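The following is a minimal sketch of enabling recomputation on a submodule; the two Dense blocks stand in for heavier layers. Calling recompute() on block1 asks MindSpore not to keep its forward results and to recompute them when the backward pass needs them:
import mindspore.nn as nn
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.block1 = nn.Dense(16, 16)
        self.block2 = nn.Dense(16, 16)
        # Do not keep block1's forward results in memory; recompute them
        # during the backward pass instead.
        self.block1.recompute()
    def construct(self, x):
        x = self.block1(x)
        return self.block2(x)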