mindspore.nn.LARS
- class mindspore.nn.LARS(*args, **kwargs)
Implements the LARS algorithm with the LARSUpdate operator.
LARS (Layer-wise Adaptive Rate Scaling) is an optimization algorithm for large-batch training. Refer to the paper LARGE BATCH TRAINING OF CONVOLUTIONAL NETWORKS.
The updating formulas are as follows,
\[\begin{split}\begin{array}{ll} \\
    \lambda = \frac{\theta \text{ * } || \omega ||}{|| g_{t} || \text{ + } \delta \text{ * } || \omega ||} \\
    \lambda =
    \begin{cases}
        \min(\frac{\lambda}{\alpha}, 1) & \text{ if } clip = True \\
        \lambda & \text{ otherwise }
    \end{cases} \\
    g_{t+1} = \lambda * (g_{t} + \delta * \omega)
\end{array}\end{split}\]
where \(\theta\) represents coefficient, \(\omega\) represents the parameters, \(g\) represents the gradients, \(t\) represents the updating step, \(\delta\) represents weight_decay, \(\alpha\) represents learning_rate, and \(clip\) represents use_clip.
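As a rough illustration, the update can be traced in plain NumPy. This is a minimal sketch with arbitrary tensor values, not the LARSUpdate operator itself; the epsilon term is added to the denominator per the epsilon parameter described below.

>>> import numpy as np
>>> theta = 0.001   # coefficient (trust coefficient)
>>> delta = 1e-4    # weight_decay
>>> alpha = 0.1     # learning_rate
>>> eps = 1e-05     # epsilon, added to the denominator for stability
>>> clip = True     # use_clip
>>> w = np.array([0.5, -0.3, 0.8])      # omega: one parameter tensor
>>> g = np.array([0.02, 0.01, -0.03])   # g_t: its raw gradient
>>> # lambda = theta * ||w|| / (||g|| + delta * ||w|| + eps)
>>> lam = theta * np.linalg.norm(w) / (np.linalg.norm(g) + delta * np.linalg.norm(w) + eps)
>>> if clip:
...     lam = min(lam / alpha, 1.0)     # clip against the global learning rate
>>> # g_{t+1} = lambda * (g_t + delta * w): the rescaled gradient handed
>>> # back to the wrapped optimizer.
>>> g_next = lam * (g + delta * w)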
- Parameters
optimizer (Optimizer) – The MindSpore optimizer to wrap; LARS rescales its gradients before the update is applied.
epsilon (float) – Term added to the denominator to improve numerical stability. Default: 1e-05.
coefficient (float) – Trust coefficient for calculating the local learning rate. Default: 0.001.
use_clip (bool) – Whether to use clip operation for calculating the local learning rate. Default: False.
lars_filter (Function) – A function to determine whether to apply the LARS algorithm to a given parameter. Default: lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name.
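For instance, a custom lars_filter might also exclude BatchNorm parameters. The sketch below is an assumption about typical MindSpore parameter naming ('beta'/'gamma' for normalization layers), and opt is an already-constructed inner optimizer:

>>> # Hypothetical filter: apply LARS only to weight tensors, skipping
>>> # biases and normalization parameters (adjust names for your network).
>>> def my_lars_filter(x):
...     return ('bias' not in x.name and
...             'beta' not in x.name and
...             'gamma' not in x.name)
>>> opt_lars = nn.LARS(opt, lars_filter=my_lars_filter)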
- Inputs:
gradients (tuple[Tensor]) - The gradients of params in the optimizer, with the same shape as the params in the optimizer.
- Outputs:
Union[Tensor[bool], tuple[Parameter]], depending on the output of the wrapped optimizer.
- Supported Platforms:
Ascend
CPU
Examples
>>> net = Net()
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> opt = nn.Momentum(net.trainable_params(), 0.1, 0.9)
>>> opt_lars = nn.LARS(opt, epsilon=1e-08, coefficient=0.02)
>>> model = Model(net, loss_fn=loss, optimizer=opt_lars, metrics=None)
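The example above assumes a user-defined Net(). A self-contained variant might look like the following sketch; the tiny Dense network is an illustrative assumption, not part of this API:

>>> import mindspore.nn as nn
>>> from mindspore.train import Model
>>> # Hypothetical stand-in for Net(); any nn.Cell returning logits would do.
>>> class Net(nn.Cell):
...     def __init__(self):
...         super().__init__()
...         self.dense = nn.Dense(10, 3)
...     def construct(self, x):
...         return self.dense(x)
>>> net = Net()
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
>>> opt = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> # Wrap the inner optimizer; LARS rescales each layer's gradients
>>> # before Momentum applies its update.
>>> opt_lars = nn.LARS(opt, epsilon=1e-08, coefficient=0.02)
>>> model = Model(net, loss_fn=loss, optimizer=opt_lars, metrics=None)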