mindspore.ops.Adam
- class mindspore.ops.Adam(use_locking=False, use_nesterov=False)[source]
Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.
The Adam algorithm is proposed in Adam: A Method for Stochastic Optimization.
For more details, please refer to mindspore.nn.Adam.

The updating formulas are as follows,
\[\begin{split}\begin{array}{ll} \\
    m = \beta_1 * m + (1 - \beta_1) * g \\
    v = \beta_2 * v + (1 - \beta_2) * g * g \\
    l = \alpha * \frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t} \\
    w = w - l * \frac{m}{\sqrt{v} + \epsilon}
\end{array}\end{split}\]

\(m\) represents the 1st moment vector, \(v\) represents the 2nd moment vector, \(g\) represents the gradient, \(l\) represents the scaling factor computed from the learning rate, \(\beta_1, \beta_2\) represent beta1 and beta2, \(t\) represents the updating step while \(\beta_1^t\) and \(\beta_2^t\) represent beta1_power and beta2_power, \(\alpha\) represents learning_rate, \(w\) represents var, and \(\epsilon\) represents epsilon.
Inputs of var, m, v and gradient comply with the implicit type conversion rules to make the data types consistent. If they have different data types, the lower priority data type will be converted to the higher priority data type.
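The update formulas above can be sketched in plain NumPy. This is an illustrative reference implementation, not the operator itself; in particular, the NAG branch for use_nesterov is an assumption here, mirroring the common formulation \(w = w - l * (\beta_1 m + (1-\beta_1) g) / (\sqrt{v} + \epsilon)\) used by similar fused-Adam kernels.

```python
import numpy as np

def adam_step(w, m, v, g, lr, beta1, beta2, eps,
              beta1_power, beta2_power, use_nesterov=False):
    """One Adam update following the formulas above.

    The use_nesterov=True branch is an assumed NAG formulation,
    not taken from this page.
    """
    m = beta1 * m + (1 - beta1) * g             # 1st moment vector
    v = beta2 * v + (1 - beta2) * g * g         # 2nd moment vector
    l = lr * np.sqrt(1 - beta2_power) / (1 - beta1_power)  # scaling factor
    if use_nesterov:
        w = w - l * (beta1 * m + (1 - beta1) * g) / (np.sqrt(v) + eps)
    else:
        w = w - l * m / (np.sqrt(v) + eps)
    return w, m, v

# One step from an all-ones state with a unit gradient, using the
# same hyper-parameters as the example at the bottom of this page.
w, m, v = adam_step(np.ones((2, 2)), np.ones((2, 2)), np.ones((2, 2)),
                    np.ones((2, 2)), lr=0.001, beta1=0.9, beta2=0.999,
                    eps=1e-8, beta1_power=0.9, beta2_power=0.999)
```

With these inputs the moments stay at 1.0 and every weight moves to about 0.9996838, matching the printed result in the Examples section.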
- Parameters
use_locking (bool) – Whether to enable a lock to protect variable tensors from being updated. If True, updates of the var, m, and v tensors will be protected by a lock. If False, the result is unpredictable. Default: False.
use_nesterov (bool) – Whether to use the Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. If True, update the gradients using NAG. If False, update the gradients without using NAG. Default: False.
- Inputs:
var (Parameter) - Weights to be updated. The shape is \((N, *)\) where \(*\) means any number of additional dimensions. The data type can be float16 or float32.
m (Parameter) - The 1st moment vector in the updating formula, the shape should be the same as var.
v (Parameter) - The 2nd moment vector in the updating formula, the shape should be the same as var.
beta1_power (float) - \(beta_1^t(\beta_1^{t})\) in the updating formula.
beta2_power (float) - \(beta_2^t(\beta_2^{t})\) in the updating formula.
lr (float) - \(\alpha\) in the updating formula, the learning rate from which the scaling factor \(l\) is computed. The paper suggested value is \(0.001\).
beta1 (float) - The exponential decay rate for the 1st moment estimations. The paper suggested value is \(0.9\).
beta2 (float) - The exponential decay rate for the 2nd moment estimations. The paper suggested value is \(0.999\).
epsilon (float) - Term added to the denominator to improve numerical stability. The paper suggested value is \(10^{-8}\).
gradient (Tensor) - Gradient, has the same shape and data type as var.
- Outputs:
Tuple of 3 Tensors, the updated parameters.
var (Tensor) - The same shape and data type as the input var.
m (Tensor) - The same shape and data type as the input m.
v (Tensor) - The same shape and data type as the input v.
- Supported Platforms:
Ascend
GPU
CPU
Examples
>>> import mindspore
>>> import numpy as np
>>> from mindspore import Tensor, nn, ops
>>> from mindspore import Parameter
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.apply_adam = ops.Adam()
...         self.var = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="var")
...         self.m = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="m")
...         self.v = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="v")
...     def construct(self, beta1_power, beta2_power, lr, beta1, beta2, epsilon, grad):
...         out = self.apply_adam(self.var, self.m, self.v, beta1_power, beta2_power, lr, beta1, beta2,
...                               epsilon, grad)
...         return out
...
>>> net = Net()
>>> gradient = Tensor(np.ones([2, 2]).astype(np.float32))
>>> output = net(0.9, 0.999, 0.001, 0.9, 0.999, 1e-8, gradient)
>>> print(net.var.asnumpy())
[[0.9996838 0.9996838]
 [0.9996838 0.9996838]]
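The printed value can be cross-checked by evaluating the updating formulas from this page by hand. This standalone arithmetic check uses only the Python standard library and is an illustration, not part of the MindSpore API:

```python
import math

# State and hyper-parameters from the example above:
# all-ones var/m/v, unit gradient.
var = m = v = g = 1.0
beta1_power, beta2_power = 0.9, 0.999
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

m = beta1 * m + (1 - beta1) * g           # stays 1.0
v = beta2 * v + (1 - beta2) * g * g       # stays 1.0
l = lr * math.sqrt(1 - beta2_power) / (1 - beta1_power)  # scaling factor
var = var - l * m / (math.sqrt(v) + eps)

print(round(var, 7))  # 0.9996838, matching the example output
```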