mindspore.ops.AdamWeightDecay
- class mindspore.ops.AdamWeightDecay(use_locking=False)[source]
Updates gradients by the Adaptive Moment Estimation algorithm with weight decay (AdamWeightDecay).
The Adam algorithm is proposed in Adam: A Method for Stochastic Optimization. The AdamWeightDecay variant was proposed in Decoupled Weight Decay Regularization.
The updating formulas are as follows,
represents the 1st moment vector, represents the 2nd moment vector, represents gradient, represent beta1 and beta2, represents learning_rate, represents var, represents weight_decay, represents epsilon.- Parameters
use_locking (bool) – Whether to enable a lock to protect variable tensors from being updated. If
True
, updates of the var, m, and v tensors will be protected by a lock. IfFalse
, the result is unpredictable. Default:False
.
- Inputs:
var (Union[Parameter, Tensor]) - Weights to be updated. The shape is
where means any number of additional dimensions. The data type can be float16 or float32.m (Union[Parameter, Tensor]) - The 1st moment vector in the updating formula, it should have the the shape as var. The data type can be float16 or float32.
v (Union[Parameter, Tensor]) - The 2nd moment vector in the updating formula, it should have the same shape as m.
lr (float) -
in the updating formula. The paper suggested value is , the data type should be float32.beta1 (float) - The exponential decay rate for the 1st moment estimations, the data type should be float32. The paper suggested value is
beta2 (float) - The exponential decay rate for the 2nd moment estimations, the data type should be float32. The paper suggested value is
epsilon (float) - Term added to the denominator to improve numerical stability, the data type should be float32.
decay (float) - The weight decay value, must be a scalar tensor with float32 data type. Default:
0.0
.gradient (Tensor) - Gradient, has the same shape as var.
- Outputs:
Tuple of 3 Tensor, the updated parameters.
var (Tensor) - The same shape and data type as var.
m (Tensor) - The same shape and data type as m.
v (Tensor) - The same shape and data type as v.
- Raises
TypeError – If use_locking is not a bool.
TypeError – If lr, beta1, beta2, epsilon or decay is not a float32.
TypeError – If var, m or v is neither float16 nor float32.
TypeError – If gradient is not a Tensor.
ValueError – If epsilon <= 0.
ValueError – If beta1, beta2 is not in range (0.0,1.0).
ValueError – If decay < 0.
- Supported Platforms:
Ascend
GPU
CPU
Examples
>>> import numpy as np >>> import mindspore.nn as nn >>> from mindspore import Tensor, Parameter, ops >>> class Net(nn.Cell): ... def __init__(self): ... super(Net, self).__init__() ... self.adam_weight_decay = ops.AdamWeightDecay() ... self.var = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="var") ... self.m = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="m") ... self.v = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="v") ... def construct(self, lr, beta1, beta2, epsilon, decay, grad): ... out = self.adam_weight_decay(self.var, self.m, self.v, lr, beta1, beta2, ... epsilon, decay, grad) ... return out >>> net = Net() >>> gradient = Tensor(np.ones([2, 2]).astype(np.float32)) >>> output = net(0.001, 0.9, 0.999, 1e-8, 0.0, gradient) >>> print(net.var.asnumpy()) [[0.999 0.999] [0.999 0.999]]