mindspore.nn.optim_ex.SGD
- class mindspore.nn.optim_ex.SGD(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False, *, maximize=False)[source]
Stochastic Gradient Descent optimizer.
\[v_{t+1} = u \ast v_{t} + gradient \ast (1 - dampening)\]

If nesterov is True:

\[p_{t+1} = p_{t} - lr \ast (gradient + u \ast v_{t+1})\]

If nesterov is False:

\[p_{t+1} = p_{t} - lr \ast v_{t+1}\]

Note that for the first step, \(v_{t+1} = gradient\).

Here p, v and u denote the parameters, accum, and momentum respectively.
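As a concrete illustration of the formulas above, the following is a minimal plain-Python sketch of one update for a single scalar parameter; it mirrors the equations rather than the actual MindSpore implementation, and weight_decay and maximize are omitted.

>>> def sgd_step(p, grad, v, lr, u=0.9, dampening=0.0, nesterov=False, first_step=False):
...     # Momentum buffer: v_{t+1} = u * v_t + grad * (1 - dampening);
...     # on the very first step the buffer is just the gradient.
...     v = grad if first_step else u * v + grad * (1 - dampening)
...     # Parameter update: plain momentum step, or the Nesterov-corrected step.
...     return (p - lr * (grad + u * v), v) if nesterov else (p - lr * v, v)
>>> sgd_step(p=1.0, grad=0.5, v=0.0, lr=0.1, first_step=True)
(0.95, 0.5)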
Warning
This is an experimental optimizer API that is subject to change. This module must be used together with the learning rate scheduler module (the LRScheduler class).
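A minimal sketch of that pairing is given below, assuming a StepLR scheduler is exposed under nn.optim_ex.lr_scheduler; the import path, class name and constructor arguments are assumptions, not taken from this page, and net stands for a network instance such as the LeNet5 used in the Examples below.

>>> # Hedged sketch: the lr_scheduler path and StepLR signature are assumptions;
>>> # consult the LRScheduler class documentation for the exact API.
>>> from mindspore import nn
>>> optimizer = nn.optim_ex.SGD(net.trainable_params(), lr=0.1, momentum=0.9)
>>> scheduler = nn.optim_ex.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)
>>> # After each epoch (or each step, depending on the scheduler), call:
>>> scheduler.step()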
- Parameters
params (Union[list(Parameter), list(dict)]) – list of parameters to optimize or dicts defining parameter groups (see the sketch below).
lr (Union[int, float, Tensor]) – learning rate.
momentum (Union[int, float], optional) – momentum factor. Default: 0.
weight_decay (float, optional) – weight decay (L2 penalty). Default: 0.
dampening (Union[int, float], optional) – dampening for momentum. Default: 0.
nesterov (bool, optional) – enables Nesterov momentum. Default: False.
- Keyword Arguments
maximize (bool, optional) – maximize the params based on the objective, instead of minimizing. Default: False.
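A hedged sketch of the dict-style parameter groups mentioned above. The group keys ('params', 'lr', 'weight_decay') follow the usual PyTorch-style convention and are assumptions, not taken from this page; net stands for a network instance such as the LeNet5 used in the Examples below.

>>> # Split the parameters into two hypothetical groups and give each group its
>>> # own hyperparameters; groups that omit a key are assumed to fall back to
>>> # the values passed to the constructor.
>>> conv_params = [p for p in net.trainable_params() if 'conv' in p.name]
>>> other_params = [p for p in net.trainable_params() if 'conv' not in p.name]
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
...                 {'params': other_params, 'lr': 0.05}]
>>> optimizer = nn.optim_ex.SGD(group_params, lr=0.1, momentum=0.9)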
- Inputs:
gradients (tuple[Tensor]) - The gradients of params.
- Raises
ValueError – If the learning rate is not int, float or Tensor.
ValueError – If the learning rate is less than 0.
ValueError – If the momentum or weight_decay value is less than 0.0.
ValueError – If the momentum, dampening or weight_decay value is not int or float.
ValueError – If nesterov or maximize is not a bool.
ValueError – If nesterov is True while momentum is not positive or dampening is not 0.0.
- Supported Platforms:
Ascend
GPU
CPU
Examples
>>> import mindspore
>>> from mindspore import nn
>>> # Define the network structure of LeNet5. Refer to
>>> # https://gitee.com/mindspore/docs/blob/r2.1/docs/mindspore/code/lenet.py
>>> net = LeNet5()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
>>> optimizer = nn.optim_ex.SGD(net.trainable_params(), lr=0.1)
>>> def forward_fn(data, label):
...     logits = net(data)
...     loss = loss_fn(logits, label)
...     return loss, logits
>>> grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)
>>> def train_step(data, label):
...     (loss, _), grads = grad_fn(data, label)
...     optimizer(grads)
...     return loss
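A hedged sketch of how train_step above is typically driven over data; train_dataset is a placeholder for a mindspore.dataset object that yields (data, label) batches and is not created on this page.

>>> # Iterate the (hypothetical) dataset and apply one optimizer update per batch.
>>> for epoch in range(3):
...     for data, label in train_dataset.create_tuple_iterator():
...         loss = train_step(data, label)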