mindspore.experimental
The experimental modules.
Experimental Optimizer
API Name | Description | Supported Platforms
mindspore.experimental.optim.Optimizer | Base class for all optimizers. |
mindspore.experimental.optim.Adadelta | Implements Adadelta algorithm. |
mindspore.experimental.optim.Adagrad | Implements Adagrad algorithm. |
mindspore.experimental.optim.Adam | Implements Adam algorithm. |
mindspore.experimental.optim.Adamax | Implements Adamax algorithm (a variant of Adam based on the infinity norm). |
mindspore.experimental.optim.AdamW | Implements Adam Weight Decay (AdamW) algorithm. |
mindspore.experimental.optim.ASGD | Implements Averaged Stochastic Gradient Descent algorithm. |
mindspore.experimental.optim.NAdam | Implements NAdam algorithm. |
mindspore.experimental.optim.RAdam | Implements RAdam algorithm. |
mindspore.experimental.optim.RMSprop | Implements RMSprop algorithm. |
mindspore.experimental.optim.Rprop | Implements Rprop algorithm. |
mindspore.experimental.optim.SGD | Stochastic Gradient Descent optimizer. |
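As a quick orientation, the sketch below shows the common usage pattern shared by the optimizers above: build the optimizer from a network's trainable parameters, compute gradients with mindspore.value_and_grad, and pass the gradients to the optimizer instance. The toy Dense network, MAELoss, random data, and the choice of SGD with lr=0.01 are illustrative assumptions, not part of the table above.

import mindspore
from mindspore import nn, ops
from mindspore.experimental import optim

# Toy network and loss, used only to keep the sketch self-contained.
net = nn.Dense(4, 2)
loss_fn = nn.MAELoss()
optimizer = optim.SGD(net.trainable_params(), lr=0.01)

def forward_fn(x, y):
    return loss_fn(net(x), y)

# Differentiate the loss with respect to the optimizer's parameters.
grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters)

x = ops.randn(8, 4)
y = ops.randn(8, 2)
loss, grads = grad_fn(x, y)
optimizer(grads)  # apply the gradients to update the parameters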
LRScheduler Class
The dynamic learning rate schedules in this module are all subclasses of LRScheduler. This module should be used together with the optimizers in mindspore.experimental.optim: pass the optimizer instance to an LRScheduler when constructing it. During training, the LRScheduler subclass dynamically adjusts the learning rate each time its step method is called.
import mindspore
from mindspore import nn
from mindspore.experimental import optim

# Define the network structure of LeNet5. Refer to
# https://gitee.com/mindspore/docs/blob/r2.3.0rc2/docs/mindspore/code/lenet.py
net = LeNet5()
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
optimizer = optim.Adam(net.trainable_params(), lr=0.05)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

def forward_fn(data, label):
    logits = net(data)
    loss = loss_fn(logits, label)
    return loss, logits

grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)

def train_step(data, label):
    (loss, _), grads = grad_fn(data, label)
    optimizer(grads)
    return loss

for epoch in range(6):
    # Create the dataset taking MNIST as an example. Refer to
    # https://gitee.com/mindspore/docs/blob/r2.3.0rc2/docs/mindspore/code/mnist.py
    for data, label in create_dataset(need_download=False):
        train_step(data, label)
    scheduler.step()
API Name | Description | Supported Platforms
mindspore.experimental.optim.lr_scheduler.LRScheduler | Basic class of learning rate schedule. |
mindspore.experimental.optim.lr_scheduler.ConstantLR | Decays the learning rate of each parameter group by a small constant factor until the number of epochs reaches a pre-defined milestone: total_iters. |
mindspore.experimental.optim.lr_scheduler.CosineAnnealingLR | Sets the learning rate of each parameter group using a cosine annealing lr schedule. |
mindspore.experimental.optim.lr_scheduler.CosineAnnealingWarmRestarts | Sets the learning rate of each parameter group using a cosine annealing warm restarts schedule. |
mindspore.experimental.optim.lr_scheduler.CyclicLR | Sets the learning rate of each parameter group according to the cyclical learning rate policy (CLR). |
mindspore.experimental.optim.lr_scheduler.ExponentialLR | For each epoch, the learning rate decays exponentially, multiplied by gamma. |
mindspore.experimental.optim.lr_scheduler.LambdaLR | Sets the learning rate of each parameter group to the initial lr times a given function. |
mindspore.experimental.optim.lr_scheduler.LinearLR | Decays the learning rate of each parameter group by linearly changing a small multiplicative factor until the number of epochs reaches a pre-defined milestone: total_iters. |
mindspore.experimental.optim.lr_scheduler.MultiplicativeLR | Multiplies the learning rate of each parameter group by the factor given in the specified function. |
mindspore.experimental.optim.lr_scheduler.MultiStepLR | Multiplies the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones. |
mindspore.experimental.optim.lr_scheduler.PolynomialLR | For each epoch, the learning rate is adjusted by polynomial fitting. |
mindspore.experimental.optim.lr_scheduler.ReduceLROnPlateau | Reduces the learning rate when a metric has stopped improving. |
mindspore.experimental.optim.lr_scheduler.SequentialLR | Receives a list of schedulers that are expected to be called sequentially during the optimization process, and milestone points that give the exact intervals indicating which scheduler is supposed to be called at a given epoch. |
mindspore.experimental.optim.lr_scheduler.StepLR | Decays the learning rate of each parameter group by gamma every step_size epochs. |
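As an example of combining schedulers from the table, the sketch below chains a StepLR phase with an ExponentialLR phase via SequentialLR. The toy Dense network, the particular gamma/step_size/milestone values, and the schedulers/milestones keyword names are assumptions made for illustration only; check the individual API pages for the exact signatures.

from mindspore import nn
from mindspore.experimental import optim

net = nn.Dense(4, 2)  # placeholder network, for illustration only
optimizer = optim.Adam(net.trainable_params(), lr=0.05)

# Phase 1: decay the lr by 0.5 every 2 epochs.
step_lr = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)
# Phase 2 (from epoch 4 on): decay the lr by 0.9 per epoch.
exp_lr = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
scheduler = optim.lr_scheduler.SequentialLR(optimizer, schedulers=[step_lr, exp_lr], milestones=[4])

for epoch in range(10):
    # ... run the training steps for this epoch, as in the example above ...
    scheduler.step()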