Updating Network Parameters
Overview
Parameter is a variable tensor, representing the parameters that need to be updated during network training. The following describes Parameter initialization, attributes, and methods, as well as ParameterTuple.
Initialization
mindspore.Parameter(default_input, name=None, requires_grad=True, layerwise_parallel=False)
Initialize a Parameter object. The input data supports the Tensor, Initializer, int, and float types. The initializer API can be called to generate an Initializer object.
When init is used to initialize a Tensor, the Tensor stores only the shape and type of the tensor, not the actual data, so it does not occupy any memory. You can call the init_data API to convert the Tensor saved in Parameter to the actual data.
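As a minimal sketch of this behavior (the 'ones' initializer and the shape below are arbitrary illustration choices), a Parameter built from an Initializer records only the shape and type until init_data is called:
from mindspore import Parameter
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer

w = Parameter(initializer('ones', [2, 3], mstype.float32), name='w')
# At this point only the shape and dtype are recorded; no tensor data is materialized.
w.init_data()  # converts the stored Initializer to actual data
print(w)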
You can specify a name for each Parameter to facilitate subsequent operations and updates. It is recommended to use the default value of name when initializing a parameter as an attribute of a cell; otherwise, the parameter name may differ from what you expect.
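For example (a sketch assuming the default naming behavior described above; Net and w are hypothetical names), a parameter created with the default name inside a cell is expected to take the attribute name:
import numpy as np
import mindspore.nn as nn
from mindspore import Parameter, Tensor

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        # Default name: the cell is expected to name the parameter after the attribute.
        self.w = Parameter(Tensor(np.ones((2, 3), np.float32)))

    def construct(self, x):
        return x * self.w

net = Net()
print(net.w.name)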
To update a parameter, set requires_grad to True.
When layerwise_parallel is set to True, this parameter will be filtered out during parameter broadcast and parameter gradient aggregation. For details about the configuration of distributed parallelism, see https://www.mindspore.cn/docs/programming_guide/en/r1.3/auto_parallel.html.
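The following sketch illustrates both constructor flags locally; the tensors and names are arbitrary:
import numpy as np
from mindspore import Parameter, Tensor

# requires_grad=False: no gradient is computed, so this parameter is not trained.
mean = Parameter(Tensor(np.zeros(3, np.float32)), name='mean', requires_grad=False)
# layerwise_parallel=True: filtered out during broadcast and gradient aggregation.
emb = Parameter(Tensor(np.zeros((8, 4), np.float32)), name='emb', layerwise_parallel=True)
print(mean.requires_grad, emb.layerwise_parallel)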
In the following example, Parameter objects are built using three different data types. All three Parameter objects need to be updated, and layerwise parallelism is not used.
import numpy as np
from mindspore import Tensor, Parameter
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer
x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name='x')
y = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32), name='y')
z = Parameter(default_input=2.0, name='z')
print(x, "\n\n", y, "\n\n", z)
The following information is displayed:
Parameter (name=x, shape=(2, 3), dtype=Int32, requires_grad=True)
Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=z, shape=(), dtype=Float32, requires_grad=True)
Attributes
- inited_param: returns the Parameter that stores the actual data.
- name: specifies a name for an instantiated Parameter.
- sliced: specifies whether the data stored in Parameter is sharded data in the automatic parallel scenario. If yes, do not shard the data. Otherwise, determine whether to shard the data based on the network parallel strategy.
- is_init: initialization status of Parameter. At the GE backend, an init graph is required to synchronize data from the host to the device, and this attribute specifies whether the data has been synchronized to the device. It takes effect only at the GE backend and is set to False at other backends.
- layerwise_parallel: specifies whether Parameter supports layerwise parallelism. If yes, parameters are not broadcast and parameter gradients are not aggregated. Otherwise, parameters need to be broadcast and gradients aggregated.
- requires_grad: specifies whether to compute the parameter gradient. If a parameter needs to be trained, its gradient needs to be computed. Otherwise, it does not.
- data: the Parameter itself.
In the following example, Parameter is initialized through Tensor to obtain its attributes.
import numpy as np
from mindspore import Tensor, Parameter
x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))))
print("name: ", x.name, "\n",
"sliced: ", x.sliced, "\n",
"is_init: ", x.is_init, "\n",
"inited_param: ", x.inited_param, "\n",
"requires_grad: ", x.requires_grad, "\n",
"layerwise_parallel: ", x.layerwise_parallel, "\n",
"data: ", x.data)
The following information is displayed:
name: Parameter
sliced: False
is_init: False
inited_param: None
requires_grad: True
layerwise_parallel: False
data: Parameter (name=Parameter, shape=(2, 3), dtype=Int64, requires_grad=True)
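Attributes such as name and requires_grad can also be assigned after construction; a minimal sketch (the name 'weight' is an arbitrary choice):
import numpy as np
from mindspore import Parameter, Tensor

x = Parameter(Tensor(np.ones((2, 3), np.float32)))
x.name = 'weight'        # rename the parameter
x.requires_grad = False  # stop computing its gradient
print(x)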
Methods
- init_data: when the network uses the semi-automatic or automatic parallel strategy and the data input during Parameter initialization is Initializer, this API can be called to convert the data saved by Parameter to Tensor.
- set_data: sets the data saved by Parameter. Tensor, Initializer, int, and float can be input for setting. When the method's input parameter slice_shape is set to True, the shape of Parameter can be changed. Otherwise, the configured shape must be the same as the original shape of Parameter.
- set_param_ps: controls whether training parameters are trained by using the Parameter Server.
- clone: clones Parameter. You can specify the parameter name after cloning.
In the following example, Initializer is used to initialize Tensor, and methods related to Parameter are called.
import numpy as np
from mindspore import Tensor, Parameter
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer
x = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32))
print(x)
x_clone = x.clone()
x_clone.name = "x_clone"
print(x_clone)
print(x.init_data())
print(x.set_data(data=Tensor(np.arange(2*3).reshape((1, 2, 3)))))
The following information is displayed:
Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=x_clone, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
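Changing the shape through set_data requires the slice_shape option mentioned above; a minimal sketch (assuming slice_shape is passed as a keyword argument):
import numpy as np
from mindspore import Tensor, Parameter

x = Parameter(Tensor(np.ones((1, 2, 3), np.float32)), name='x')
# Without slice_shape=True, the new shape must match the original (1, 2, 3).
x.set_data(Tensor(np.zeros((2, 3), np.float32)), slice_shape=True)
print(x)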
ParameterTuple
Inherited from tuple, ParameterTuple is used to store multiple Parameter objects. Its constructor __new__(cls, iterable) takes an iterable of Parameter objects for building, and the clone API is provided for cloning.
The following example builds a ParameterTuple object and clones it.
import numpy as np
from mindspore import Tensor, Parameter, ParameterTuple
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer
x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name='x')
y = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32), name='y')
z = Parameter(default_input=2.0, name='z')
params = ParameterTuple((x, y, z))
params_copy = params.clone("params_copy")
print(params, "\n")
print(params_copy)
The following information is displayed:
(Parameter (name=x, shape=(2, 3), dtype=Int32, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True))
(Parameter (name=params_copy.x, shape=(2, 3), dtype=Int32, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True))
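A common use of ParameterTuple is to collect a network's weights so that GradOperation returns one gradient per parameter. The following sketch assumes a simple Dense layer for illustration:
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import ParameterTuple, Tensor

net = nn.Dense(3, 2)
weights = ParameterTuple(net.trainable_params())
grad_op = ops.GradOperation(get_by_list=True)
x = Tensor(np.ones((1, 3), np.float32))
grads = grad_op(net, weights)(x)  # one gradient tensor per Parameter in the tuple
print(len(grads))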