mindspore.ops.Adam
- class mindspore.ops.Adam(use_locking=False, use_nesterov=False)
Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.
The Adam algorithm is proposed in Adam: A Method for Stochastic Optimization.
For more details, please refer to mindspore.nn.Adam. The updating formulas are as follows:

$$
\begin{array}{ll}
m = \beta_1 \cdot m + (1 - \beta_1) \cdot g \\
v = \beta_2 \cdot v + (1 - \beta_2) \cdot g \cdot g \\
l = \alpha \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \\
w = w - l \cdot \frac{m}{\sqrt{v} + \epsilon}
\end{array}
$$

where $m$ represents the 1st moment vector, $v$ represents the 2nd moment vector, $g$ represents the gradient, $l$ represents the scaling factor, $\beta_1$ and $\beta_2$ represent beta1 and beta2, $t$ represents the updating step while $\beta_1^t$ and $\beta_2^t$ represent beta1_power and beta2_power, $\alpha$ represents learning_rate (the lr input), $w$ represents var, and $\epsilon$ represents epsilon.
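For intuition, here is a one-step scalar version of these formulas in plain NumPy: a minimal sketch that mirrors the symbols above, not the fused kernel that ops.Adam runs on device (the function name adam_step is illustrative).

>>> import numpy as np
>>> def adam_step(w, m, v, g, alpha, beta1, beta2, beta1_power, beta2_power, eps):
...     # beta1_power and beta2_power are beta1**t and beta2**t; the fused
...     # operator receives them as explicit inputs instead of tracking t itself.
...     m = beta1 * m + (1 - beta1) * g        # update 1st moment vector
...     v = beta2 * v + (1 - beta2) * g * g    # update 2nd moment vector
...     l = alpha * np.sqrt(1 - beta2_power) / (1 - beta1_power)  # scaling factor
...     w = w - l * m / (np.sqrt(v) + eps)     # apply the update to the weights
...     return w, m, v

The cross-check after the Examples section below applies these same four steps to reproduce the printed result.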
- Parameters
use_locking (bool) – Whether to enable a lock to protect variable tensors from being updated. If true, updates of the var, m, and v tensors will be protected by a lock; if false, the result is unpredictable. Default: False.
use_nesterov (bool) – Whether to use the Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. If true, the gradients are updated using NAG; if false, they are updated without NAG. Default: False.
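Both flags are fixed when the operator is constructed, so each configuration is a separate instance; a minimal sketch (the instance names are illustrative):

>>> import mindspore.ops as ops
>>> adam = ops.Adam()                        # defaults: use_locking=False, use_nesterov=False
>>> adam_nag = ops.Adam(use_nesterov=True)   # NAG-style gradient updates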
- Inputs:
var (Tensor) - Weights to be updated. The shape is $(N, *)$ where $*$ means any number of additional dimensions. The data type can be float16 or float32.
m (Tensor) - The 1st moment vector in the updating formula; the shape and data type should be the same as var.
v (Tensor) - The 2nd moment vector in the updating formula (mean square gradients); the shape and data type should be the same as var.
beta1_power (float) - $\beta_1^t$ in the updating formula; the data type should be the same as var.
beta2_power (float) - $\beta_2^t$ in the updating formula; the data type should be the same as var.
lr (float) - $\alpha$ in the updating formula, from which the scaling factor $l$ is computed. The paper suggested value is $10^{-3}$; the data type should be the same as var.
beta1 (float) - The exponential decay rate for the 1st moment estimations; the data type should be the same as var. The paper suggested value is $0.9$.
beta2 (float) - The exponential decay rate for the 2nd moment estimations; the data type should be the same as var. The paper suggested value is $0.999$.
epsilon (float) - Term added to the denominator to improve numerical stability.
gradient (Tensor) - Gradient, has the same shape and data type as var.
- Outputs:
Tuple of 3 Tensors, the updated parameters.
var (Tensor) - The same shape and data type as the input var.
m (Tensor) - The same shape and data type as the input m.
v (Tensor) - The same shape and data type as the input v.
- Supported Platforms:
Ascend GPU CPU
Examples
>>> import numpy as np
>>> import mindspore.nn as nn
>>> import mindspore.ops as ops
>>> from mindspore import Tensor, Parameter
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.apply_adam = ops.Adam()
...         self.var = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="var")
...         self.m = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="m")
...         self.v = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="v")
...     def construct(self, beta1_power, beta2_power, lr, beta1, beta2, epsilon, grad):
...         out = self.apply_adam(self.var, self.m, self.v, beta1_power, beta2_power, lr, beta1, beta2,
...                               epsilon, grad)
...         return out
...
>>> net = Net()
>>> gradient = Tensor(np.ones([2, 2]).astype(np.float32))
>>> output = net(0.9, 0.999, 0.001, 0.9, 0.999, 1e-8, gradient)
>>> print(net.var.asnumpy())
[[0.9996838 0.9996838]
 [0.9996838 0.9996838]]
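As a sanity check, the printed entries follow directly from the updating formulas; a plain-NumPy sketch with the same scalar arguments (not MindSpore code):

>>> import numpy as np
>>> w, m, v, g = 1.0, 1.0, 1.0, 1.0
>>> beta1_power, beta2_power, lr, beta1, beta2, eps = 0.9, 0.999, 0.001, 0.9, 0.999, 1e-8
>>> m = beta1 * m + (1 - beta1) * g        # 1st moment: stays 1.0 here
>>> v = beta2 * v + (1 - beta2) * g * g    # 2nd moment: stays 1.0 here
>>> l = lr * np.sqrt(1 - beta2_power) / (1 - beta1_power)  # scaling factor
>>> print(round(w - l * m / (np.sqrt(v) + eps), 7))
0.9996838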