mindspore.ops.FusedSparseAdam

class mindspore.ops.FusedSparseAdam(use_locking=False, use_nesterov=False)

Merges the duplicate value of the gradient and then updates parameters by the Adaptive Moment Estimation (Adam) algorithm. This operator is used when the gradient is sparse.
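The merging step can be illustrated with a small NumPy sketch. It assumes that "merging" means summing the gradient rows that share an index, which the text above does not state explicitly. The helper name `merge_duplicate_gradients` is hypothetical and is not part of MindSpore.

```python
import numpy as np

def merge_duplicate_gradients(gradient, indices):
    """Sum gradient rows that share the same index (assumed merge rule)."""
    # unique_indices is sorted; inverse maps each original row to its slot.
    unique_indices, inverse = np.unique(indices, return_inverse=True)
    inverse = inverse.reshape(-1)
    merged = np.zeros((unique_indices.shape[0],) + gradient.shape[1:],
                      dtype=gradient.dtype)
    # Unbuffered add, so repeated slots accumulate instead of overwriting.
    np.add.at(merged, inverse, gradient)
    return merged, unique_indices

gradient = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
indices = np.array([2, 0, 2], dtype=np.int32)
merged, unique_indices = merge_duplicate_gradients(gradient, indices)
print(unique_indices)
print(merged)
```

After merging, each index appears once, so a per-row update touches every selected row of var exactly once.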

The Adam algorithm is proposed in Adam: A Method for Stochastic Optimization.

The updating formulas are as follows,

$$
\begin{array}{l}
m = \beta_{1} * m + (1 - \beta_{1}) * g \\
v = \beta_{2} * v + (1 - \beta_{2}) * g * g \\
l = \alpha * \frac{\sqrt{1 - \beta_{2}^{t}}}{1 - \beta_{1}^{t}} \\
w = w - l * \frac{m}{\sqrt{v} + \epsilon}
\end{array}
$$

$m$ represents the 1st moment vector, $v$ represents the 2nd moment vector, $g$ represents gradient, $l$ represents scaling factor lr, $\beta_{1}, \beta_{2}$ represent beta1 and beta2, $t$ represents updating step while $\beta_{1}^{t}$ and $\beta_{2}^{t}$ represent beta1_power and beta2_power, $\alpha$ represents learning_rate, $w$ represents var, $\epsilon$ represents epsilon.
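The four update formulas can be written as a dense NumPy reference, shown here as a minimal sketch rather than the operator's actual implementation. The function name `adam_reference_step` is hypothetical, the `learning_rate` argument plays the role of $\alpha$, and rows that receive no gradient are assumed to see a zero gradient. The input values are taken from the Examples section of this page.

```python
import numpy as np

def adam_reference_step(var, m, v, beta1_power, beta2_power,
                        learning_rate, beta1, beta2, epsilon, grad):
    """One dense Adam step following the updating formulas (sketch only)."""
    m_new = beta1 * m + (1.0 - beta1) * grad            # 1st moment
    v_new = beta2 * v + (1.0 - beta2) * grad * grad     # 2nd moment
    # Scaling factor l computed from alpha and the two beta powers.
    scale = learning_rate * np.sqrt(1.0 - beta2_power) / (1.0 - beta1_power)
    var_new = var - scale * m_new / (np.sqrt(v_new) + epsilon)
    return var_new, m_new, v_new

var = np.ones((3, 1, 2))
m = np.ones((3, 1, 2))
v = np.ones((3, 1, 2))
grad = np.zeros((3, 1, 2))
grad[[0, 1]] = 0.1  # only rows 0 and 1 receive a gradient (assumption: others see zero)
new_var, new_m, new_v = adam_reference_step(
    var, m, v, 0.9, 0.999, 0.001, 0.9, 0.999, 1e-8, grad)
print(new_var)
```

Under these assumptions the sketch works in float64, so its results may differ from the operator's float32 output in the last digits.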

All inputs except indices follow the implicit type conversion rules so that their data types are consistent. If the inputs have different data types, the lower-priority data type is converted to the relatively highest-priority data type. A RuntimeError is raised when data type conversion of a Parameter is required.

Parameters

use_locking (bool) – Whether to enable a lock to protect variable tensors from being updated. If true, updates of the var, m, and v tensors will be protected by a lock. If false, the result is unpredictable. Default: False.
use_nesterov (bool) – Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. If true, update the gradients using NAG. If false, update the gradients without using NAG. Default: False.

Inputs:

var (Parameter) - Parameters to be updated, with float32 data type. The shape is $(N, *)$ where $*$ means any number of additional dimensions.
m (Parameter) - The 1st moment vector in the updating formula, has the same shape and data type as var.
v (Parameter) - The 2nd moment vector in the updating formula (mean square gradients), has the same shape and data type as var.
beta1_power (Tensor) - $\beta_{1}^{t}$ in the updating formula with float32 data type. The shape is $(1,)$ .
beta2_power (Tensor) - $\beta_{2}^{t}$ in the updating formula with float32 data type. The shape is $(1,)$ .
lr (Tensor) - $l$ in the updating formula. With float32 data type. The shape is $(1,)$ .
beta1 (Tensor) - The exponential decay rate for the 1st moment estimations with float32 data type. The shape is $(1,)$ .
beta2 (Tensor) - The exponential decay rate for the 2nd moment estimations with float32 data type. The shape is $(1,)$ .
epsilon (Tensor) - Term added to the denominator to improve numerical stability with float32 data type. The shape is $(1,)$ .
gradient (Tensor) - Gradient, has the same data type as var, and gradient.shape[1:] = var.shape[1:] if var has more than one dimension.
indices (Tensor) - Gradient indices with int32 data type and indices.shape[0] = gradient.shape[0].
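The shape and dtype constraints on gradient and indices can be checked up front with a small helper, sketched here in NumPy. The function name `check_sparse_inputs` and the choice of exception types are assumptions for illustration and do not describe the checks the operator itself performs. The condition on var is read as "var has more than one dimension".

```python
import numpy as np

def check_sparse_inputs(var, gradient, indices):
    """Validate gradient/indices against var (hypothetical helper)."""
    if gradient.dtype != var.dtype:
        raise TypeError("gradient must have the same data type as var")
    if indices.dtype != np.int32:
        raise TypeError("indices must have int32 data type")
    # Trailing dimensions must match when var has more than one dimension.
    if var.ndim > 1 and gradient.shape[1:] != var.shape[1:]:
        raise ValueError("gradient.shape[1:] must equal var.shape[1:]")
    # One index per gradient row.
    if indices.shape[0] != gradient.shape[0]:
        raise ValueError("indices.shape[0] must equal gradient.shape[0]")
    return True

var = np.ones((3, 1, 2), dtype=np.float32)
gradient = np.full((2, 1, 2), 0.1, dtype=np.float32)
indices = np.array([0, 1], dtype=np.int32)
ok = check_sparse_inputs(var, gradient, indices)
```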

Outputs:

Tuple of 3 Tensors. This operator updates the input parameters in place, so the outputs carry no useful information.

var (Tensor) - A Tensor with shape $(1,)$ .
m (Tensor) - A Tensor with shape $(1,)$ .
v (Tensor) - A Tensor with shape $(1,)$ .

Raises

TypeError – If use_locking or use_nesterov is not a bool.
TypeError – If dtype of var, m, v, beta1_power, beta2_power, lr, beta1, beta2, epsilon or gradient is not float32.
TypeError – If dtype of indices is not int32.

Supported Platforms: Ascend CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, Parameter, nn, ops
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.sparse_apply_adam = ops.FusedSparseAdam()
...         self.var = Parameter(Tensor(np.ones([3, 1, 2]).astype(np.float32)), name="var")
...         self.m = Parameter(Tensor(np.ones([3, 1, 2]).astype(np.float32)), name="m")
...         self.v = Parameter(Tensor(np.ones([3, 1, 2]).astype(np.float32)), name="v")
...     def construct(self, beta1_power, beta2_power, lr, beta1, beta2, epsilon, grad, indices):
...         out = self.sparse_apply_adam(self.var, self.m, self.v, beta1_power, beta2_power, lr, beta1, beta2,
...                                      epsilon, grad, indices)
...         return out
...
>>> net = Net()
>>> beta1_power = Tensor(0.9, mindspore.float32)
>>> beta2_power = Tensor(0.999, mindspore.float32)
>>> lr = Tensor(0.001, mindspore.float32)
>>> beta1 = Tensor(0.9, mindspore.float32)
>>> beta2 = Tensor(0.999, mindspore.float32)
>>> epsilon = Tensor(1e-8, mindspore.float32)
>>> gradient = Tensor(np.array([[[0.1, 0.1]], [[0.1, 0.1]]]), mindspore.float32)
>>> indices = Tensor([0, 1], mindspore.int32)
>>> output = net(beta1_power, beta2_power, lr, beta1, beta2, epsilon, gradient, indices)
>>> print(net.var.asnumpy())
[[[0.9997121  0.9997121 ]]
 [[0.9997121  0.9997121 ]]
 [[0.99971527 0.99971527]]]