mindspore.ops.SGD

class mindspore.ops.SGD(*args, **kwargs)[source]

Computes the stochastic gradient descent. Momentum is optional.

Nesterov momentum is based on the formula from paper On the importance of initialization and momentum in deep learning.

Note

For details, please refer to nn.SGD source code.

Parameters
  • dampening (float) – The dampening for momentum. Default: 0.0.

  • weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.

  • nesterov (bool) – Enable Nesterov momentum. Default: False.

Inputs:
  • parameters (Tensor) - Parameters to be updated. With float16 or float32 data type.

  • gradient (Tensor) - Gradient, with float16 or float32 data type.

  • learning_rate (Tensor) - Learning rate, a scalar tensor with float16 or float32 data type. e.g. Tensor(0.1, mindspore.float32)

  • accum (Tensor) - Accum(velocity) to be updated. With float16 or float32 data type.

  • momentum (Tensor) - Momentum, a scalar tensor with float16 or float32 data type. e.g. Tensor(0.1, mindspore.float32).

  • stat (Tensor) - States to be updated with the same shape as gradient, with float16 or float32 data type.

Outputs:

Tensor, parameters to be updated.

Raises
  • TypeError – If dampening or weight_decay is not a float.

  • TypeError – If nesterov is not a bool.

  • TypeError – If parameters, gradient, learning_rate, accum, momentum or stat is not a Tensor.

  • TypeError – If dtype of parameters, gradient, learning_rate, accum, momentum or stat is neither float16 nor float32.

Supported Platforms:

Ascend GPU

Examples

>>> sgd = ops.SGD()
>>> parameters = Tensor(np.array([2, -0.5, 1.7, 4]), mindspore.float32)
>>> gradient = Tensor(np.array([1, -1, 0.5, 2]), mindspore.float32)
>>> learning_rate = Tensor(0.01, mindspore.float32)
>>> accum = Tensor(np.array([0.1, 0.3, -0.2, -0.1]), mindspore.float32)
>>> momentum = Tensor(0.1, mindspore.float32)
>>> stat = Tensor(np.array([1.5, -0.3, 0.2, -0.7]), mindspore.float32)
>>> output = sgd(parameters, gradient, learning_rate, accum, momentum, stat)
>>> print(output)
(Tensor(shape=[4], dtype=Float32,
 value= [ 1.98989999e+00, -4.90300000e-01,  1.69520009e+00,  3.98009992e+00]),)