Automatic Differentiation
Backpropagation is the most commonly used algorithm for training neural networks. In this algorithm, the parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter.
The first-order derivative method of MindSpore is mindspore.ops.GradOperation(get_all=False, get_by_list=False, sens_param=False). When get_all is set to False, only the derivative of the first input is computed; when get_all is set to True, the derivatives of all inputs are computed. When get_by_list is set to False, weight derivatives are not computed; when get_by_list is set to True, weight derivatives are computed. sens_param determines whether to scale the output value of the network to change the final gradient. The following uses the derivative of the MatMul operator for an in-depth analysis.
Import the required modules and APIs:
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor
from mindspore import ParameterTuple, Parameter
from mindspore import dtype as mstype
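For quick reference, the flag combinations described above correspond to the following GradOperation instances (the variable names here are illustrative; each configuration is exercised by a wrapper network later in this section):

grad_first_input = ops.GradOperation()              # derivative of the first input only
grad_all_inputs = ops.GradOperation(get_all=True)   # derivatives of all inputs
grad_weights = ops.GradOperation(get_by_list=True)  # derivatives of the weights
grad_scaled = ops.GradOperation(sens_param=True)    # caller supplies a sensitivity (scaling) tensor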
First-order Derivative of the Input
To compute the input derivative, first define a network to be differentiated. The following uses a simple network containing a MatMul operator as an example.
The network structure is as follows:
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.matmul = ops.MatMul()
        # weight parameter z, initialized to 1.0
        self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z')

    def construct(self, x, y):
        x = x * self.z          # scale x by the weight z
        out = self.matmul(x, y) # matrix multiplication with y
        return out
Next, define a network that computes the derivative. In the __init__ function, store the network to be differentiated as self.net and create an ops.GradOperation instance. In the construct function, compute the derivative of self.net.
The network structure is as follows:
class GradNetWrtX(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation()  # default: derivative of the first input only

    def construct(self, x, y):
        gradient_function = self.grad_op(self.net)  # build the gradient function of self.net
        return gradient_function(x, y)
Define the input tensors, run the gradient network, and print the output:
x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)
y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)
output = GradNetWrtX(Net())(x, y)
print(output)
[[4.5099998 2.7 3.6000001]
[4.5099998 2.7 3.6000001]]
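Because the default sensitivity is an all-ones tensor with the shape of the network output, this result can be checked by hand with NumPy (a verification sketch, not part of the MindSpore API):

# out = (x * z) @ y, so d(sum(out))/dx = ones_like(out) @ y.T * z
y_np = np.array([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], np.float32)
print(np.ones((2, 3), np.float32) @ y_np.T * 1.0)  # matches the MindSpore output above

Each row of the gradient equals the row sums of y scaled by z, which is why the two rows are identical.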
If the derivatives with respect to both the x and y inputs are required, you only need to set self.grad_op = ops.GradOperation(get_all=True) in GradNetWrtX; the gradient function then returns a tuple containing one gradient per input.
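A minimal sketch of that variant follows (the class name GradNetWrtXY is chosen here for illustration; x and y are the tensors defined above):

class GradNetWrtXY(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtXY, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation(get_all=True)  # return the gradient of every input

    def construct(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y)

grad_x, grad_y = GradNetWrtXY(Net())(x, y)  # one gradient per network input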
First-order Derivative of the Weight
To compute weight derivatives, you need to set get_by_list in ops.GradOperation to True.
The GradNetWrtX structure is as follows:
class GradNetWrtX(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        self.params = ParameterTuple(net.trainable_params())  # collect the trainable weights
        self.grad_op = ops.GradOperation(get_by_list=True)    # differentiate with respect to the weights

    def construct(self, x, y):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x, y)
Run and display the output:
output = GradNetWrtX(Net())(x, y)
print(output)
(Tensor(shape=[1], dtype=Float32, value= [ 2.15359993e+01]),)
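The single value is the gradient with respect to the weight z. Since out = (x * z) @ y, the derivative of the summed output with respect to z is sum(x @ y), which can be confirmed with NumPy (a verification sketch):

x_np = np.array([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], np.float32)
y_np = np.array([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], np.float32)
print(np.sum(x_np @ y_np))  # ~21.536, matching the value above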
If some weight derivatives are not required, set requires_grad to False for those weights when defining the network to be differentiated:
self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z', requires_grad=False)
Gradient Value Scaling
You can use the sens_param parameter to scale the output value of the network, which changes the final gradient. Set sens_param in ops.GradOperation to True and provide a scaling value whose shape matches the shape of the network output. The scaling value self.grad_wrt_output may be in the following format:
self.grad_wrt_output = Tensor([[s1, s2, s3], [s4, s5, s6]])
The GradNetWrtX structure is as follows:
class GradNetWrtX(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation(sens_param=True)  # expect a sensitivity tensor at call time
        self.grad_wrt_output = Tensor([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], dtype=mstype.float32)

    def construct(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y, self.grad_wrt_output)  # pass the scaling value as the last argument
output = GradNetWrtX(Net())(x, y)
print(output)
[[2.211 0.51 1.49 ]
[5.588 2.68 4.07 ]]
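With a sensitivity tensor s supplied instead of the default all-ones tensor, the gradient with respect to x becomes (s @ y.T) * z, which reproduces the output above (a verification sketch):

s_np = np.array([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], np.float32)
y_np = np.array([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], np.float32)
print(s_np @ y_np.T * 1.0)  # matches the scaled gradient above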