Parameter Initialization
Initializing with Built-In Parameters
MindSpore provides a variety of network parameter initialization methods, and encapsulates the function of parameter initialization in some operators. This section takes Conv2d
as an example to introduce how to use the subclass, Initializer
, and string to initialize parameters.
Initializer Initialization
Initializer
is the built-in parameter initialization base class of MindSpore. All built-in parameter initialization methods inherit this class. The neural network layer package in mindspore.nn
provides input parameters weight_init
, bias_init
, etc., which can be directly initialized with the instantiated Initializer. Examples are as follows:
import numpy as np
import mindspore.nn as nn
import mindspore as ms
from mindspore.common.initializer import Normal, initializer
input_data = ms.Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))
# Convolution layer, the input channel is 3, the output channel is 64, the size of convolution kernel is 3 * 3, and the weight parameter uses the random number generated by normal distribution, Nomal().
net = nn.Conv2d(3, 64, 3, weight_init=Normal(0.2))
# The network output
output = net(input_data)
String Initialization
In addition to using the instantiated Initializer, MindSpore also provides a simple method for parameter initialization, that is, using the string of initializing method name. This method uses the default parameters of the Initializer to initialize. Examples are as follows:
import numpy as np
import mindspore.nn as nn
import mindspore as ms
net = nn.Conv2d(3, 64, 3, weight_init='normal')
output = net(input_data)
Customized Parameter Initialization
In general, the default parameter initialization provided by MindSpore can meet the initialization requirements of the common neural network layer. When encountering a parameter initialization method that needs to be customized, you can inherit the Initializer
custom parameter initialization method. Take XavierNormal
as an example:
import math
import numpy as np
from mindspore.common.initializer import Initializer
def _calculate_fan_in_and_fan_out(arr):
# calculate fan_in and fan_out. fan_in is the number of input units in `arr` , and fan_out is the number of output units in `arr`.
shape = arr.shape
dimensions = len(shape)
if dimensions < 2:
raise ValueError("'fan_in' and 'fan_out' can not be computed for arr with fewer than"
" 2 dimensions, but got dimensions {}.".format(dimensions))
if dimensions == 2: # Linear
fan_in = shape[1]
fan_out = shape[0]
else:
num_input_fmaps = shape[1]
num_output_fmaps = shape[0]
receptive_field_size = 1
for i in range(2, dimensions):
receptive_field_size *= shape[i]
fan_in = num_input_fmaps * receptive_field_size
fan_out = num_output_fmaps * receptive_field_size
return fan_in, fan_out
class XavierNormal(Initializer):
def __init__(self, gain=1):
super().__init__()
# Configure the parameters required for initialization
self.gain = gain
def _initialize(self, arr): # arr is a Tensor to be initialized
fan_in, fan_out = _calculate_fan_in_and_fan_out(arr) # Compute fan_in, fan_out
std = self.gain * math.sqrt(2.0 / float(fan_in + fan_out)) # Calculate std value
data = np.random.normal(0, std, arr.shape) # Construct the initialized array with numpy
arr[:] = data[:] # Assign the initialized ndarray to arr
After that, we can call it like the built-in initialization method:
net = nn.Conv2d(3, 64, 3, weight_init=XavierNormal())
# The network output
output = net(input_data)
Cell traversal initialization
In addition to using parameters weight_init
, bias_init
, etc., provided by mindspore.nn
, we are also used to constructing a complete neural network first, and then uniformly managing the weight
, bias
and other parameters. At this time, you need to construct a network and instantiate it, then traverse the cell and assign values to parameters. Here is a simple example:
for name, param in net.parameters_and_names():
if 'weight' in name:
param.set_data(initializer(Normal(), param.shape, param.dtype))
if 'bias' in name:
param.set_data(initializer('zeros', param.shape, param.dtype))