Function Differences with tf.keras.optimizers.Adagrad
tf.keras.optimizers.Adagrad
tf.keras.optimizers.Adagrad(
    learning_rate=0.001,
    initial_accumulator_value=0.1,
    epsilon=1e-07,
    name='Adagrad',
    **kwargs
) -> Tensor
For more information, see tf.keras.optimizers.Adagrad.
mindspore.nn.Adagrad
class mindspore.nn.Adagrad(
    params,
    accum=0.1,
    learning_rate=0.001,
    update_slots=True,
    loss_scale=1.0,
    weight_decay=0.0
)(grads) -> Tensor
For more information, see mindspore.nn.Adagrad.
Differences
TensorFlow: Adagrad is an optimizer with a parameter-specific learning rate that implements the Adagrad algorithm. The learning rate is adapted to how frequently each parameter is updated during training: the more updates a parameter receives, the smaller those updates become.
MindSpore: This API implements essentially the same functionality as the TensorFlow one. Some parameter names differ, and MindSpore additionally provides the update_slots, loss_scale and weight_decay parameters, which TensorFlow does not have.
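To make that behaviour concrete, the following minimal NumPy sketch shows a simplified Adagrad step (an illustration under an assumed update form, not code from either library): the squared gradients are accumulated, and each step is divided by the square root of that sum, so the effective step size keeps shrinking as a parameter keeps being updated.
# Simplified Adagrad step (assumed form; the exact placement of eps differs between implementations)
import numpy as np
lr, eps = 0.1, 1e-07
w = np.array([1.0], dtype=np.float32)            # the parameter being trained
accum = np.full_like(w, 0.1)                     # initial_accumulator_value / accum
for step in range(3):
    grad = w                                     # gradient of loss = w**2 / 2
    accum += grad ** 2                           # running sum of squared gradients
    update = lr * grad / np.sqrt(accum + eps)    # step shrinks as accum grows
    w = w - update
    print(step, update, w)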
| Categories | Subcategories | TensorFlow | MindSpore | Differences |
| --- | --- | --- | --- | --- |
| Parameters | Parameter 1 | learning_rate | learning_rate | - |
| | Parameter 2 | initial_accumulator_value | accum | Same function, different parameter names |
| | Parameter 3 | epsilon | - | A small floating-point value used in TensorFlow to maintain numerical stability; MindSpore does not have this parameter |
| | Parameter 4 | name | - | Not involved |
| | Parameter 5 | **kwargs | - | Not involved |
| | Parameter 6 | - | params | A list of parameters or a list of dictionaries; TensorFlow does not have this parameter |
| | Parameter 7 | - | update_slots | If True, the accumulator is updated; TensorFlow does not have this parameter |
| | Parameter 8 | - | loss_scale | Gradient scaling factor, default: 1.0; TensorFlow does not have this parameter |
| | Parameter 9 | - | weight_decay | Weight decay (L2 penalty), default: 0.0; TensorFlow does not have this parameter |
| Input | Single input | - | grads | Gradients of the params in the optimizer; TensorFlow does not have this input |
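params, update_slots, loss_scale and weight_decay have no TensorFlow counterpart. The following minimal sketch (a hypothetical network and grouping; the group keys 'params', 'lr' and 'weight_decay' are the standard MindSpore optimizer group-parameter keys) shows how params can be passed as a list of dictionaries together with the other MindSpore-only arguments:
# Minimal sketch of the MindSpore-only arguments (hypothetical network and grouping)
import mindspore.nn as nn
net = nn.Dense(4, 2)                                  # any Cell with trainable parameters
weight_params = [p for p in net.trainable_params() if 'weight' in p.name]
bias_params = [p for p in net.trainable_params() if 'bias' in p.name]
group_params = [
    {'params': weight_params, 'weight_decay': 0.01},  # per-group weight decay
    {'params': bias_params, 'lr': 0.01},              # per-group learning rate
]
optim = nn.Adagrad(group_params, learning_rate=0.1, accum=0.1,
                   update_slots=True, loss_scale=1.0, weight_decay=0.0)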
Code Example
The two APIs achieve the same functionality, as the examples below show.
# TensorFlow
import tensorflow as tf
opt = tf.keras.optimizers.Adagrad(initial_accumulator_value=0.1, learning_rate=0.1)
var = tf.Variable(1.0)
val0 = var.value()
loss = lambda: (var ** 2)/2.0
step_count = opt.minimize(loss, [var]).numpy()
val1 = var.value()
print([val1.numpy()])
# [0.9046537]
step_count = opt.minimize(loss, [var]).numpy()
val2 = var.value()
print([val2.numpy()])
# [0.8393387]
# MindSpore
import numpy as np
import mindspore.nn as nn
import mindspore as ms
from mindspore.dataset import NumpySlicesDataset
from mindspore.train import Model
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.w = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='w')

    def construct(self, x):
        f = self.w * x
        return f

class MyLoss(nn.LossBase):
    def __init__(self, reduction='none'):
        super(MyLoss, self).__init__()

    def construct(self, y, y_pred):
        return (y - y_pred) ** 2 / 2.0
net = Net()
loss = MyLoss()
optim = nn.Adagrad(params=net.trainable_params(), accum=0.1, learning_rate=0.1)
model = Model(net, loss_fn=loss, optimizer=optim)
data_x = np.array([1.0], dtype=np.float32)
data_y = np.array([0.0], dtype=np.float32)
data = NumpySlicesDataset((data_x, data_y), ["x", "y"])
input_x = ms.Tensor(np.array([1.0], np.float32))
y0 = net(input_x)
model.train(1, data)
y1 = net(input_x)
print(y1)
# [0.9046537]
model.train(1, data)
y2 = net(input_x)
print(y2)
# [0.8393387]
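As a cross-check, both printed trajectories match the simplified update sketched earlier: with loss = w**2 / 2 the gradient equals w, so the first step subtracts 0.1 * 1.0 / sqrt(0.1 + 1.0**2) ≈ 0.0953 (giving 0.9046537), and the second subtracts 0.1 * 0.9046537 / sqrt(1.1 + 0.9046537**2) ≈ 0.0653 (giving 0.8393387).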