Function Differences with torch.autograd.backward and torch.autograd.grad

torch.autograd.backward

torch.autograd.backward(
  tensors,
  grad_tensors=None,
  retain_graph=None,
  create_graph=False,
  grad_variables=None
)

For more information, see torch.autograd.backward.

torch.autograd.grad

torch.autograd.grad(
  outputs,
  inputs,
  grad_outputs=None,
  retain_graph=None,
  create_graph=False,
  only_inputs=True,
  allow_unused=False
)

For more information, see torch.autograd.grad.

mindspore.ops.GradOperation

class mindspore.ops.GradOperation(
  get_all=False,
  get_by_list=False,
  sens_param=False
)

For more information, see mindspore.ops.GradOperation.

Differences

PyTorch: torch.autograd.backward computes the sum of gradients of the given tensors with respect to the graph leaves; during backpropagation, gradients are accumulated only into leaf tensors created with requires_grad=True. For non-scalar tensors, grad_tensors must supply the "vector" in the vector-Jacobian product (grad_variables is a deprecated alias). torch.autograd.grad computes and returns the sum of gradients of outputs with respect to inputs. If only_inputs is True, gradients are returned only for the specified inputs instead of being accumulated into all leaves.

MindSpore: mindspore.ops.GradOperation is a higher-order function that takes a network and returns its gradient function, which computes first-order derivatives. When get_all is False, only the derivative with respect to the first input is computed; when get_all is True, derivatives with respect to all inputs are computed. When get_by_list is False, weight derivatives are not computed; when get_by_list is True, derivatives with respect to the weights in the given ParameterTuple are computed. When sens_param is True, the gradient function takes an extra sensitivity value that scales the network output, and thereby the final gradient, similar to grad_tensors/grad_outputs in PyTorch.

Code Example

# In MindSpore:
import numpy as np
import mindspore as ms
import mindspore.nn as nn
from mindspore import ops

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.matmul = ops.MatMul()
        self.z = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='z')
    def construct(self, x, y):
        # out = (x * z) @ y
        x = x * self.z
        out = self.matmul(x, y)
        return out

class GradNetWrtX(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        # Defaults (get_all=False, get_by_list=False, sens_param=False):
        # only the gradient with respect to the first input x is computed.
        self.grad_op = ops.GradOperation()
    def construct(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y)

x = ms.Tensor([[0.5, 0.6, 0.4], [1.2, 1.3, 1.1]], dtype=ms.float32)
y = ms.Tensor([[0.01, 0.3, 1.1], [0.1, 0.2, 1.3], [2.1, 1.2, 3.3]], dtype=ms.float32)
output = GradNetWrtX(Net())(x, y)
print(output)
# Out:
# [[1.4100001 1.5999999 6.6      ]
#  [1.4100001 1.5999999 6.6      ]]
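
The sketch below is an illustration, not part of the original example: it reuses Net, x and y from above to show the three switches; variable names such as grad_all, dw and sens are hypothetical.

net = Net()

# get_all=True: gradients with respect to all inputs, returned as a tuple.
grad_all = ops.GradOperation(get_all=True)(net)
dx, dy = grad_all(x, y)

# get_by_list=True: gradients with respect to the weights in a ParameterTuple.
params = ms.ParameterTuple(net.trainable_params())
grad_weights = ops.GradOperation(get_by_list=True)(net, params)
dw = grad_weights(x, y)

# sens_param=True: the gradient function takes an extra sensitivity tensor
# that scales the network output before backpropagation.
sens = ms.Tensor(np.ones((2, 3), np.float32) * 0.1)
grad_sens = ops.GradOperation(sens_param=True)(net)
dx_scaled = grad_sens(x, y, sens)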

# In torch:
import torch
x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)
z = x * x * y
# backward() accumulates gradients into the .grad attribute of the graph leaves.
z.backward()
print(x.grad, y.grad)
# Out:
# tensor(12.) tensor(4.)

x = torch.tensor(2.).requires_grad_()
y = torch.tensor(3.).requires_grad_()
z = x * x * y
# grad() returns the gradients for the requested inputs instead of accumulating them.
grad_x = torch.autograd.grad(outputs=z, inputs=x)
print(grad_x[0])
# Out:
# tensor(12.)
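
Both PyTorch functions require a gradient seed when the output is not a scalar: grad_tensors for backward and grad_outputs for grad, which play the same role as the sensitivity input enabled by sens_param in MindSpore. A minimal sketch (the values follow from dz/dx = 2x):

x = torch.tensor([2., 3.], requires_grad=True)
z = x * x  # non-scalar output
# grad_outputs supplies the vector in the vector-Jacobian product.
grad_x = torch.autograd.grad(outputs=z, inputs=x, grad_outputs=torch.ones_like(z))
print(grad_x[0])
# Out:
# tensor([4., 6.])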