mindspore.experimental.optim.SGD
- class mindspore.experimental.optim.SGD(params, lr, momentum=0, dampening=0, weight_decay=0.0, nesterov=False, *, maximize=False)[source]
Stochastic Gradient Descent optimizer.
\[v_{t+1} = u \ast v_{t} + gradient \ast (1 - dampening)\]

If nesterov is True:

\[p_{t+1} = p_{t} - lr \ast (gradient + u \ast v_{t+1})\]

If nesterov is False:

\[p_{t+1} = p_{t} - lr \ast v_{t+1}\]

To be noticed, for the first step, \(v_{t+1} = gradient\).

Here, \(p\), \(v\) and \(u\) denote the parameters, accum, and momentum respectively.
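The update rule can be traced in plain Python. The following is a minimal scalar sketch of the formulas above, not the internal implementation; the helper name sgd_step and its arguments are illustrative only:

>>> def sgd_step(p, grad, v, lr, u=0.9, dampening=0.0, nesterov=False, first_step=False):
...     """One SGD update following the formulas above (scalar sketch)."""
...     if first_step:
...         v = grad                        # for the first step, v_{t+1} = gradient
...     else:
...         v = u * v + grad * (1 - dampening)
...     if nesterov:
...         p = p - lr * (grad + u * v)
...     else:
...         p = p - lr * v
...     return p, v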
Warning
This is an experimental optimizer API that is subject to change. This module must be used together with the learning rate scheduler modules in the LRScheduler class.
- Parameters
params (Union[list(Parameter), list(dict)]) – list of parameters to optimize or dicts defining parameter groups; a minimal parameter-group sketch is shown after the keyword arguments below.
lr (Union[int, float, Tensor]) – learning rate.
momentum (Union[int, float], optional) – momentum factor. Default: 0.
weight_decay (float, optional) – weight decay (L2 penalty). Default: 0.0.
dampening (Union[int, float], optional) – dampening for momentum. Default: 0.
nesterov (bool, optional) – enables Nesterov momentum. Default: False.
- Keyword Arguments
maximize (bool, optional) – maximize the params based on the objective, instead of minimizing. Default: False.
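When params is given as a list of dicts, each dict defines a parameter group with its own hyperparameters. A minimal sketch, assuming the PyTorch-style group keys ("params", "lr", "weight_decay") accepted by mindspore.experimental.optim; the split into conv and non-conv parameters is illustrative only, and net refers to a network such as the one in the Examples below:

>>> from mindspore.experimental import optim
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> other_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'lr': 0.01, 'weight_decay': 0.0001},
...                 {'params': other_params}]   # this group falls back to the constructor defaults
>>> optimizer = optim.SGD(group_params, lr=0.1, momentum=0.9)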
- Inputs:
gradients (tuple[Tensor]) - The gradients of params.
- Raises
ValueError – If the learning rate is not int, float or Tensor.
ValueError – If the learning rate is less than 0.
ValueError – If the momentum or weight_decay value is less than 0.0.
ValueError – If the momentum, dampening or weight_decay value is not int or float.
ValueError – If nesterov or maximize is not a bool.
ValueError – If nesterov is True while momentum is not positive or dampening is not 0.0.
- Supported Platforms:
Ascend GPU CPU
Examples
>>> import mindspore
>>> from mindspore import nn
>>> from mindspore.experimental import optim
>>> # Define the network structure of LeNet5. Refer to
>>> # https://gitee.com/mindspore/docs/blob/r2.3.0rc2/docs/mindspore/code/lenet.py
>>> net = LeNet5()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
>>> optimizer = optim.SGD(net.trainable_params(), lr=0.1)
>>> def forward_fn(data, label):
...     logits = net(data)
...     loss = loss_fn(logits, label)
...     return loss, logits
>>> grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)
>>> def train_step(data, label):
...     (loss, _), grads = grad_fn(data, label)
...     optimizer(grads)
...     return loss
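As noted in the warning, this optimizer is intended to be paired with a scheduler from the LRScheduler family. A minimal sketch of an epoch loop continuing from the Examples above, assuming optim.lr_scheduler.StepLR and a hypothetical dataset iterable of (data, label) batches; the step_size and gamma values are illustrative only:

>>> scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)
>>> for epoch in range(4):
...     for data, label in dataset:    # `dataset` is a hypothetical iterable of batches
...         loss = train_step(data, label)
...     scheduler.step()               # decay the learning rate once per epoch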