Function Differences with torch.optim.Adagrad
torch.optim.Adagrad
class torch.optim.Adagrad(
    params,
    lr=0.01,
    lr_decay=0,
    weight_decay=0,
    initial_accumulator_value=0,
    eps=1e-10
)
For more information, see torch.optim.Adagrad.
mindspore.nn.Adagrad
class mindspore.nn.Adagrad(
    params,
    accum=0.1,
    learning_rate=0.001,
    update_slots=True,
    loss_scale=1.0,
    weight_decay=0.0
)(grads)
For more information, see mindspore.nn.Adagrad.
Differences
PyTorch: The parameters to be optimized are collected into an iterable and passed to the optimizer as a whole. The step method performs a single optimization step and, when given a closure, returns the loss.
MindSpore: Both a single learning rate shared by all parameters and different values for different parameter groups are supported.
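The PyTorch behavior described above can be illustrated with a minimal sketch. It assumes PyTorch is installed; the scalar parameter `w`, the quadratic loss, and the `closure` helper are illustrative choices, not part of the original page.

```python
import torch

# One trainable scalar (hypothetical example parameter).
w = torch.nn.Parameter(torch.tensor([1.0]))

# PyTorch expects an iterable of parameters (or of dicts describing groups).
optimizer = torch.optim.Adagrad([w], lr=0.1, initial_accumulator_value=0.1)

def closure():
    # Recompute the loss and its gradient; step() returns this value.
    optimizer.zero_grad()
    loss = (w ** 2).sum() / 2.0
    loss.backward()
    return loss

# A single optimization step; the returned loss is the one evaluated
# by the closure before the parameter update.
loss = optimizer.step(closure)
print(loss.item(), w.item())
```

Without a closure, `optimizer.step()` returns `None`, so the loss has to be computed separately in that case.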
| Categories | Subcategories | TensorFlow | MindSpore | Differences |
| --- | --- | --- | --- | --- |
| Parameters | Parameter 1 | learning_rate | learning_rate | - |
| | Parameter 2 | initial_accumulator_value | accum | Same function, different parameter names |
| | Parameter 3 | epsilon | - | In TensorFlow, a small floating-point value used to maintain numerical stability. MindSpore does not have this parameter |
| | Parameter 4 | name | - | Not involved |
| | Parameter 5 | **kwargs | - | Not involved |
| | Parameter 6 | - | params | A list of parameters or a list of dictionaries, not available in TensorFlow |
| | Parameter 7 | - | update_slots | If the value is True, the accumulator is updated. TensorFlow does not have this parameter |
| | Parameter 8 | - | loss_scale | Gradient scaling factor, default value: 1.0. TensorFlow does not have this parameter |
| | Parameter 9 | - | weight_decay | Weight decay (L2 penalty), default value: 0.0. TensorFlow does not have this parameter |
| Input | Single input | - | grads | The gradient of |
Code Example
The two APIs implement essentially the same functionality.
# TensorFlow
import tensorflow as tf
opt = tf.keras.optimizers.Adagrad(initial_accumulator_value=0.1, learning_rate=0.1)
var = tf.Variable(1.0)
val0 = var.value()
loss = lambda: (var ** 2)/2.0
step_count = opt.minimize(loss, [var]).numpy()
val1 = var.value()
print([val1.numpy()])
# [0.9046537]
step_count = opt.minimize(loss, [var]).numpy()
val2 = var.value()
print([val2.numpy()])
# [0.8393387]
# MindSpore
import numpy as np
import mindspore.nn as nn
import mindspore as ms
from mindspore.dataset import NumpySlicesDataset
class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        # Single trainable weight, initialized to 1.0
        self.w = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='w')

    def construct(self, x):
        f = self.w * x
        return f

class MyLoss(nn.LossBase):
    def __init__(self, reduction='none'):
        super(MyLoss, self).__init__()

    def construct(self, y, y_pred):
        # Half squared error between the two inputs
        return (y - y_pred) ** 2 / 2.0
net = Net()
loss = MyLoss()
optim = nn.Adagrad(params=net.trainable_params(), accum=0.1, learning_rate=0.1)
model = ms.Model(net, loss_fn=loss, optimizer=optim)
data_x = np.array([1.0], dtype=np.float32)
data_y = np.array([0.0], dtype=np.float32)
data = NumpySlicesDataset((data_x, data_y), ["x", "y"])
input_x = ms.Tensor(np.array([1.0], np.float32))
y0 = net(input_x)
model.train(1, data)
y1 = net(input_x)
print(y1)
# [0.9046537]
model.train(1, data)
y2 = net(input_x)
print(y2)
# [0.8393387]
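The values printed by both examples can be checked by hand with a minimal pure-Python sketch of the plain Adagrad update: accumulate the squared gradient, then divide the step by the square root of the accumulator. The helper name `adagrad_steps` is hypothetical, and the sketch leaves out any stabilizing epsilon on the assumption that it is too small to change the digits shown here.

```python
import math

def adagrad_steps(w, lr, accum, num_steps):
    """Apply plain Adagrad to loss = w**2 / 2 and return w after each step."""
    history = []
    for _ in range(num_steps):
        grad = w                           # d/dw (w**2 / 2) = w
        accum += grad * grad               # accumulate squared gradients
        w -= lr * grad / math.sqrt(accum)  # scaled gradient step
        history.append(w)
    return history

# Same settings as the examples above: initial accumulator 0.1, learning rate 0.1.
values = adagrad_steps(w=1.0, lr=0.1, accum=0.1, num_steps=2)
print([round(v, 5) for v in values])
# [0.90465, 0.83934]
```

The first step is 1 - 0.1 / sqrt(0.1 + 1) ≈ 0.9046537, which agrees with the value printed by both frameworks.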