Automatic Differentiation
Automatic differentiation computes the derivative of a differentiable function at a given point and is a generalization of the backpropagation algorithm. Its core idea is to decompose a complex mathematical computation into a series of simple basic operations and combine their derivatives. The framework hides most of the differentiation details and processes from users, which greatly lowers the barrier to using it.
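As a minimal illustration using the function studied later in this chapter, automatic differentiation splits \(f(x)=wx+b\) into the basic operations \(t_1=wx\) and \(f=t_1+b\), then combines their local derivatives with the chain rule:

\[
\frac{\partial f}{\partial x}=\frac{\partial f}{\partial t_1}\cdot\frac{\partial t_1}{\partial x}=1\cdot w=w
\]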
MindSpore uses ops.GradOperation to calculate the first-order derivative. The ops.GradOperation attributes are as follows:
- get_all: determines whether to compute gradients with respect to the inputs. If False, only the gradient of the first input is returned; if True, the gradients of all inputs are returned. The default value is False.
- get_by_list: determines whether to compute gradients with respect to the weight parameters. The default value is False.
- sens_param: determines whether to scale the output value of the network to change the final gradient. The default value is False.
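For reference, the following minimal sketch shows how each configuration used later in this chapter is constructed (constructor arguments only; the surrounding networks are defined in the sections below):

import mindspore.ops as ops

grad_first_input = ops.GradOperation()              # defaults: gradient of the first input only
grad_all_inputs = ops.GradOperation(get_all=True)   # gradients of all inputs
grad_weights = ops.GradOperation(get_by_list=True)  # gradients of the weight parameters
grad_scaled = ops.GradOperation(sens_param=True)    # gradient scaled by a caller-supplied value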
This chapter uses ops.GradOperation in MindSpore to find the first-order derivative of the function \(f(x)=wx+b\).
First-order Derivative of the Input
Define formula (1) before deriving with respect to the input:

\[f(x)=wx+b \tag{1}\]

The example code below is an expression of formula (1). Since MindSpore adopts a functional programming style, all computational formulas are expressed as functions.
import numpy as np
import mindspore.nn as nn
import mindspore as ms

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        # Initialize the weight w and bias b of formula (1).
        self.w = ms.Parameter(ms.Tensor(np.array([6.0], np.float32)), name='w')
        self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b')

    def construct(self, x):
        f = self.w * x + self.b
        return f
Define the derivative class GradNet. In the __init__ function, define the self.net and ops.GradOperation networks. In the construct function, compute the derivative of self.net with respect to its input. This corresponds to the following formula (2):

\[f'(x)=w \tag{2}\]
import mindspore as ms
import mindspore.ops as ops

class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation()

    def construct(self, x):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x)
Finally, with the weight \(w=6\) and bias \(b=1\) defined above, take the first-order derivative of formula (1) with respect to the input x. By formula (2), the derivative is \(f'(x)=w=6\), independent of x, which matches the running result below:
x = ms.Tensor([100], dtype=ms.float32)
output = GradNet(Net())(x)
print(output)
[6.]
MindSpore calculates the first-order derivative using ops.GradOperation(get_all=False, get_by_list=False, sens_param=False). If get_all is set to False, the derivative of only the first input is calculated. If get_all is set to True, the derivatives of all inputs are calculated.
First-order Derivative of the Weight
To compute derivatives with respect to the weights, set get_by_list in ops.GradOperation to True.
import mindspore as ms

class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        self.params = ms.ParameterTuple(net.trainable_params())
        self.grad_op = ops.GradOperation(get_by_list=True)  # Set the first-order derivative of the weight parameters.

    def construct(self, x):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x)
Next, compute the derivative with respect to the weight parameters:
# Perform a derivative calculation on the function.
x = ms.Tensor([100], dtype=ms.float32)
fx = GradNet(Net())(x)
# Print the result.
print(f"wgrad: {fx[0]}\nbgrad: {fx[1]}")
wgrad: [100.]
bgrad: [1.]
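This matches the analytic partial derivatives of formula (1) evaluated at \(x=100\):

\[
\frac{\partial f}{\partial w}=x=100,\qquad \frac{\partial f}{\partial b}=1
\]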
If derivatives are not required for some weights, set requires_grad to False when declaring the corresponding weight parameters in the network definition.
import mindspore as ms
from mindspore import ops
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')
        # b is excluded from derivation by setting requires_grad=False.
        self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b', requires_grad=False)

    def construct(self, x):
        out = x * self.w + self.b
        return out

class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        self.params = ms.ParameterTuple(net.trainable_params())
        self.grad_op = ops.GradOperation(get_by_list=True)

    def construct(self, x):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x)
# Construct a derivative network; only w is trainable, so only its gradient is returned.
x = ms.Tensor([5], dtype=ms.float32)
fw = GradNet(Net())(x)
print(fw)
(Tensor(shape=[1], dtype=Float32, value= [ 5.00000000e+00]),)
Gradient Value Scaling
You can use the sens_param parameter to scale the output value of the network, which changes the final gradient. Set sens_param in ops.GradOperation to True and supply the scaling value; its shape must match the shape of the network output. In the example below, the derivative of the output with respect to x is \(w=6\), so scaling by 0.1 yields 0.6.
class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        # Derivative operation with sensitivity scaling enabled.
        self.grad_op = ops.GradOperation(sens_param=True)
        # Scaling value; its shape must match the network output.
        self.grad_wrt_output = ms.Tensor([0.1], dtype=ms.float32)

    def construct(self, x):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, self.grad_wrt_output)
x = ms.Tensor([6], dtype=ms.float32)
output = GradNet(Net())(x)
print(output)
[0.6]
Stopping Gradient Calculation
You can use ops.stop_gradient to stop gradient calculation. The following is an example:
from mindspore.ops import stop_gradient

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')
        self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b')

    def construct(self, x):
        out = x * self.w + self.b
        # Stop gradient propagation: out no longer contributes to gradient calculations.
        out = stop_gradient(out)
        return out

class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        self.params = ms.ParameterTuple(net.trainable_params())
        self.grad_op = ops.GradOperation(get_by_list=True)

    def construct(self, x):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x)
x = ms.Tensor([100], dtype=ms.float32)
output = GradNet(Net())(x)
print(f"wgrad: {output[0]}\nbgrad: {output[1]}")
wgrad: [0.]
bgrad: [0.]
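Because the entire output is passed through stop_gradient above, both weight gradients are zero. As a minimal sketch (not from the original example), stop_gradient can also be applied to only one branch of the computation, so that branch contributes to the forward value but not to the gradients:

class PartialStopNet(nn.Cell):
    def __init__(self):
        super(PartialStopNet, self).__init__()
        self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')
        self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b')

    def construct(self, x):
        out1 = x * self.w + self.b                  # gradients flow through this branch
        out2 = stop_gradient(x * self.w + self.b)   # gradients are blocked on this branch
        return out1 + out2

x = ms.Tensor([100], dtype=ms.float32)
output = GradNet(PartialStopNet())(x)
print(f"wgrad: {output[0]}\nbgrad: {output[1]}")
# Expected output: wgrad [100.], bgrad [1.] -- only the un-stopped branch contributes.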