Differences with torch.optim.Adam
torch.optim.Adam
```python
class torch.optim.Adam(
    params,
    lr=0.001,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=0,
    amsgrad=False
)
```
For more information, see torch.optim.Adam.
mindspore.nn.Adam
```python
class mindspore.nn.Adam(
    params,
    learning_rate=1e-3,
    beta1=0.9,
    beta2=0.999,
    eps=1e-8,
    use_locking=False,
    use_nesterov=False,
    weight_decay=0.0,
    loss_scale=1.0,
    use_amsgrad=False,
    **kwargs
)
```
For more information, see mindspore.nn.Adam.
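As an illustration only (not an excerpt from either library's documentation), the sketch below constructs both optimizers with their default arguments from the signatures above; `model` and `net` are placeholder single-layer networks.

```python
# Minimal sketch: default construction in both frameworks (placeholder networks).
import torch
from mindspore import nn

model = torch.nn.Linear(2, 3)                      # placeholder PyTorch network
torch_opt = torch.optim.Adam(model.parameters())   # lr=0.001, betas=(0.9, 0.999), eps=1e-08

net = nn.Dense(2, 3)                               # placeholder MindSpore network
ms_opt = nn.Adam(net.trainable_params())           # learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8
```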
Differences
mindspore.nn.Adam covers the functionality of torch.optim.Adam, and the two behave the same with default parameters. The extra parameters in mindspore.nn.Adam compared to PyTorch are used to control additional features. See the notes on the website for details.
| Categories | Subcategories | PyTorch | MindSpore | Difference |
|---|---|---|---|---|
| Parameters | Parameter 1 | params | params | Consistent |
| | Parameter 2 | lr | learning_rate | Same function, different parameter names |
| | Parameter 3 | eps | eps | Consistent |
| | Parameter 4 | weight_decay | weight_decay | Consistent |
| | Parameter 5 | amsgrad | use_amsgrad | Same function, different parameter names |
| | Parameter 6 | betas | beta1, beta2 | Same function, different parameter names |
| | Parameter 7 | - | use_locking | MindSpore-only: whether to apply a lock to protect parameter updates; PyTorch has no such parameter |
| | Parameter 8 | - | use_nesterov | MindSpore-only: whether to use the Nesterov Accelerated Gradient (NAG) algorithm when updating gradients; PyTorch has no such parameter |
| | Parameter 9 | - | loss_scale | MindSpore-only: gradient scaling factor; PyTorch has no such parameter |
| | Parameter 10 | - | kwargs | The parameters "use_lazy" and "use_offload" passed into kwargs select the lazy Adam and offload Adam variants; PyTorch has no such parameters |
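As a sketch of the mapping above (an illustration, not an excerpt from either library's documentation), the snippet below rewrites a torch.optim.Adam call with non-default arguments as an equivalent mindspore.nn.Adam call; the networks and hyperparameter values are arbitrary placeholders.

```python
# Sketch: mapping non-default torch.optim.Adam arguments onto mindspore.nn.Adam.
import torch
from mindspore import nn

model = torch.nn.Linear(2, 3)  # placeholder network
torch_opt = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,
    betas=(0.9, 0.99),
    eps=1e-8,
    weight_decay=1e-2,
    amsgrad=True,
)

net = nn.Dense(2, 3)  # placeholder network
ms_opt = nn.Adam(
    net.trainable_params(),
    learning_rate=5e-4,        # lr -> learning_rate
    beta1=0.9, beta2=0.99,     # betas=(b1, b2) -> beta1, beta2
    eps=1e-8,                  # same name
    weight_decay=1e-2,         # same name
    use_amsgrad=True,          # amsgrad -> use_amsgrad
    # MindSpore-only parameters keep their defaults here:
    # use_locking=False, use_nesterov=False, loss_scale=1.0
)
```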
Code Example
```python
# MindSpore
import mindspore
from mindspore import nn

net = nn.Dense(2, 3)
optimizer = nn.Adam(net.trainable_params())
criterion = nn.MAELoss(reduction="mean")

def forward_fn(data, label):
    logits = net(data)
    loss = criterion(logits, label)
    return loss, logits

# Compute the loss and the gradients with respect to the optimizer's parameters.
grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)

def train_step(data, label):
    (loss, _), grads = grad_fn(data, label)
    optimizer(grads)  # apply the Adam update
    return loss
```
```python
# PyTorch
import torch

model = torch.nn.Linear(2, 3)
criterion = torch.nn.L1Loss(reduction='mean')
optimizer = torch.optim.Adam(model.parameters())

def train_step(data, label):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, label)
    loss.backward()
    optimizer.step()  # apply the Adam update
    return loss
```
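As a hedged usage sketch, the snippet below runs one training step in each framework on a random batch. It assumes the corresponding `train_step` definitions above are in scope (the two examples define a function with the same name, so in practice they would live in separate scripts), and the batch shapes (4, 2) and (4, 3) are arbitrary choices for illustration.

```python
# Sketch: one optimization step per framework, reusing the definitions above.
import numpy as np
import torch
from mindspore import Tensor

# MindSpore script: uses the MindSpore train_step defined above.
ms_data = Tensor(np.random.randn(4, 2).astype(np.float32))
ms_label = Tensor(np.random.randn(4, 3).astype(np.float32))
ms_loss = train_step(ms_data, ms_label)

# PyTorch script: uses the PyTorch train_step defined above.
pt_data = torch.randn(4, 2)
pt_label = torch.randn(4, 3)
pt_loss = train_step(pt_data, pt_label)
```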