Function Differences with torch.autograd.backward and torch.autograd.grad
torch.autograd.backward
torch.autograd.backward(
    tensors,
    grad_tensors=None,
    retain_graph=None,
    create_graph=False,
    grad_variables=None
)
For more information, see torch.autograd.backward.
torch.autograd.grad
torch.autograd.grad(
    outputs,
    inputs,
    grad_outputs=None,
    retain_graph=None,
    create_graph=False,
    only_inputs=True,
    allow_unused=False
)
For more information, see torch.autograd.grad.
mindspore.grad
mindspore.grad(
    fn,
    grad_position=0,
    weights=None,
    has_aux=False
)
For more information, see mindspore.grad.
Differences
PyTorch: Use torch.autograd.backward to compute the sum of gradients of the given Tensors with respect to the graph leaves. When gradients are computed by backpropagation, only graph leaves with requires_grad=True receive gradients. Use torch.autograd.grad to compute and return the sum of gradients of outputs with respect to inputs. If only_inputs is True, the function returns only a list of gradients with respect to the specified inputs. A sketch of differentiating with respect to several inputs in one call appears at the end of the code example below.
MindSpore: Computes the first-order derivative of fn. When grad_position is set to an int or a tuple of ints, derivatives with respect to the corresponding inputs are computed. If weights is set, derivatives with respect to the given network parameters are also computed. If has_aux is True, only the first output of fn participates in the differentiation; in this case fn must have at least two outputs. A sketch of these options follows the MindSpore code example below.
Code Example
# In MindSpore:
import numpy as np
import mindspore as ms
import mindspore.nn as nn
from mindspore import ops
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.matmul = ops.MatMul()
        self.z = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='z')

    def construct(self, x, y):
        x = x * self.z
        out = self.matmul(x, y)
        return out

class GradNetWrtX(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net

    def construct(self, x, y):
        # Differentiate the wrapped network with respect to its first input x.
        gradient_function = ms.grad(self.net)
        return gradient_function(x, y)
x = ms.Tensor([[0.5, 0.6, 0.4], [1.2, 1.3, 1.1]], dtype=ms.float32)
y = ms.Tensor([[0.01, 0.3, 1.1], [0.1, 0.2, 1.3], [2.1, 1.2, 3.3]], dtype=ms.float32)
output = GradNetWrtX(Net())(x, y)
print(output)
# Out:
# [[1.4100001 1.5999999 6.6 ]
# [1.4100001 1.5999999 6.6 ]]
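The example above only differentiates with respect to the first input x. The lines below sketch the remaining mindspore.grad options mentioned in the Differences section (grad_position, weights, has_aux). They reuse Net, x and y from above, assume the documented return structure (a tuple with one gradient per parameter when weights is set, and (gradients, auxiliary outputs) when has_aux=True), and omit the output values.

# Gradients with respect to both inputs.
net = Net()
grad_both = ms.grad(net, grad_position=(0, 1))
dx, dy = grad_both(x, y)

# Gradients with respect to the parameter z only: grad_position=None, weights set.
grad_weights = ms.grad(net, grad_position=None, weights=net.trainable_params())
dz = grad_weights(x, y)  # tuple with one gradient per parameter in weights

# has_aux=True: only the first output is differentiated, the second is returned as-is.
def forward_with_aux(x, y):
    out = net(x, y)
    return out.sum(), out
grad_aux = ms.grad(forward_with_aux, has_aux=True)
grads, aux = grad_aux(x, y)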
# In torch:
import torch
x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)
z = x * x * y
# backward() accumulates gradients into the .grad attribute of the leaf tensors.
z.backward()
print(x.grad, y.grad)
# Out:
# tensor(12.) tensor(4.)
x = torch.tensor(2.).requires_grad_()
y = torch.tensor(3.).requires_grad_()
z = x * x * y
# grad() returns the gradients instead of storing them in .grad.
grad_x = torch.autograd.grad(outputs=z, inputs=x)
print(grad_x[0])
# Out:
# tensor(12.)
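For reference, torch.autograd.grad also accepts several inputs in one call; this small sketch reuses the values above (dz/dx = 2xy = 12, dz/dy = x*x = 4).

x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)
z = x * x * y
# Gradients with respect to both inputs, returned as a tuple.
grad_x, grad_y = torch.autograd.grad(outputs=z, inputs=(x, y))
print(grad_x, grad_y)
# Out:
# tensor(12.) tensor(4.)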