
class mindspore.ops.SGD(dampening=0.0, weight_decay=0.0, nesterov=False)

Performs one stochastic gradient descent (SGD) update step. Momentum is optional.

Nesterov momentum is based on the formula from the paper "On the importance of initialization and momentum in deep learning".
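As a reading aid, the update can be sketched as follows. This is the formulation commonly used for this style of SGD and is assumed here to match the one documented for mindspore.nn.SGD. The symbols are shorthand introduced for this sketch: p is parameters, g is gradient, v is accum, u is momentum, and lr is learning_rate.

```latex
\begin{aligned}
v_{t+1} &= u \cdot v_t + (1 - \text{dampening}) \cdot g_t \\
p_{t+1} &=
\begin{cases}
p_t - lr \cdot (g_t + u \cdot v_{t+1}) & \text{if nesterov is True} \\
p_t - lr \cdot v_{t+1} & \text{otherwise}
\end{cases}
\end{aligned}
```

On the first step the velocity is usually initialized directly from the gradient, i.e. v_{t+1} = g_t.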


If parameters are not grouped, the optimizer's weight_decay is applied to every network parameter whose name contains neither 'beta' nor 'gamma'. Users can group parameters to change the weight decay strategy. When parameters are grouped, each group can set its own weight_decay; a group that does not set one uses the optimizer's weight_decay. For more details, refer to mindspore.nn.SGD.
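The default rule for ungrouped parameters can be illustrated with a small pure-Python sketch. The helper `split_for_weight_decay` and the parameter names are hypothetical, and a plain substring match on the name is an assumption about how the check is performed.

```python
def split_for_weight_decay(names):
    """Hypothetical helper: partition parameter names by the default rule."""
    decayed, exempt = [], []
    for name in names:
        # Assumption: a substring match on 'beta' or 'gamma' exempts the parameter.
        if "beta" in name or "gamma" in name:
            exempt.append(name)
        else:
            decayed.append(name)
    return decayed, exempt


# Made-up parameter names for illustration only.
names = ["conv1.weight", "bn1.gamma", "bn1.beta", "fc.bias"]
decayed, exempt = split_for_weight_decay(names)
print(decayed)  # ['conv1.weight', 'fc.bias']
print(exempt)   # ['bn1.gamma', 'bn1.beta']
```

Grouping parameters explicitly, as described for mindspore.nn.SGD, replaces this name-based default with per-group settings.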

Parameters:

  • dampening (float) – The dampening for momentum. Default: 0.0.

  • weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.

  • nesterov (bool) – Enable Nesterov momentum. Default: False.

Inputs:

  • parameters (Tensor) – Parameters to be updated, with float16 or float32 data type.

  • gradient (Tensor) – Gradient, with float16 or float32 data type.

  • learning_rate (Tensor) – Learning rate, a scalar tensor with float16 or float32 data type, e.g. Tensor(0.1, mindspore.float32).

  • accum (Tensor) – Accumulator (velocity) to be updated, with float16 or float32 data type.

  • momentum (Tensor) – Momentum, a scalar tensor with float16 or float32 data type, e.g. Tensor(0.1, mindspore.float32).

  • stat (Tensor) – States to be updated, with the same shape as gradient and with float16 or float32 data type.


Outputs:

Tensor, the updated parameters.

Raises:

  • TypeError – If dampening or weight_decay is not a float.

  • TypeError – If nesterov is not a bool.

  • TypeError – If parameters, gradient, learning_rate, accum, momentum or stat is not a Tensor.

  • TypeError – If dtype of parameters, gradient, learning_rate, accum, momentum or stat is neither float16 nor float32.

Supported Platforms:

Ascend GPU CPU


Examples:

>>> import mindspore
>>> import numpy as np
>>> from mindspore import Tensor, ops
>>> sgd = ops.SGD()
>>> parameters = Tensor(np.array([2, -0.5, 1.7, 4]), mindspore.float32)
>>> gradient = Tensor(np.array([1, -1, 0.5, 2]), mindspore.float32)
>>> learning_rate = Tensor(0.01, mindspore.float32)
>>> accum = Tensor(np.array([0.1, 0.3, -0.2, -0.1]), mindspore.float32)
>>> momentum = Tensor(0.1, mindspore.float32)
>>> stat = Tensor(np.array([1.5, -0.3, 0.2, -0.7]), mindspore.float32)
>>> output = sgd(parameters, gradient, learning_rate, accum, momentum, stat)
>>> print(output.asnumpy())
[1.99 -0.4903 1.695 3.9801]
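
To make the arithmetic behind the example output easier to follow, here is a minimal pure-Python sketch of one update step. It is not the MindSpore implementation: the function `sgd_step` is hypothetical, the handling of stat (a positive entry makes the velocity start from the gradient) is inferred from the example values above, and the placement of weight_decay is an assumption.

```python
def sgd_step(parameters, gradient, learning_rate, accum, momentum, stat,
             dampening=0.0, weight_decay=0.0, nesterov=False):
    """Hypothetical reference for one SGD step on flat Python lists."""
    new_params, new_accum = [], []
    for p, g, v, s in zip(parameters, gradient, accum, stat):
        # Assumption: the L2 penalty is folded into the gradient.
        g = g + weight_decay * p
        if s > 0:
            # Inferred from the example: a positive stat entry marks a
            # first step, so the velocity is set to the gradient.
            v = g
        else:
            v = momentum * v + (1.0 - dampening) * g
        # Nesterov looks ahead along the freshly updated velocity.
        step = g + momentum * v if nesterov else v
        new_params.append(p - learning_rate * step)
        new_accum.append(v)
    return new_params, new_accum


params, accum = sgd_step(
    parameters=[2.0, -0.5, 1.7, 4.0],
    gradient=[1.0, -1.0, 0.5, 2.0],
    learning_rate=0.01,
    accum=[0.1, 0.3, -0.2, -0.1],
    momentum=0.1,
    stat=[1.5, -0.3, 0.2, -0.7],
)
print([round(p, 4) for p in params])
```

With the inputs from the example, the rounded values agree with the printed output: entries with positive stat move by learning_rate times the gradient, while the others use the momentum-blended velocity.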