mindformers.core.CosineWithRestartsAndWarmUpLR

class mindformers.core.CosineWithRestartsAndWarmUpLR(learning_rate: float, warmup_steps: int = None, total_steps: int = None, num_cycles: float = 1., lr_end: float = 0., warmup_lr_init: float = 0., warmup_ratio: float = None, decay_steps: int = None, **kwargs)[source]

Cosine with Restarts and Warm Up Learning Rate.

The CosineWithRestartsAndWarmUpLR schedule sets the learning rate for each parameter group using a cosine annealing with restarts and warm-up, where \(\eta_{max}\) is set to the initial learning rate, and \(T_{cur}\) represents the number of steps since the last restart:

\[\begin{aligned}
\eta_t & = \eta_{\text{min}} + \frac{1}{2}(\eta_{\text{max}} - \eta_{\text{min}})\left(1 + \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{i}; \\
\eta_{t+1} & = \eta_{\text{max}}, & T_{cur} = (2k+1)T_{i}.
\end{aligned}\]

Training begins at warmup_lr_init and ramps up to the initial learning rate over the warm-up steps. After each restart, the learning rate starts anew from the maximum value and gradually decays to the configured minimum, rather than decreasing monotonically to the end of training. This strategy helps avoid getting trapped in local minima and can accelerate convergence during training.
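
For intuition, the following framework-free sketch traces the same warm-up plus cosine-with-restarts curve. The helper name and the exact restart arithmetic are illustrative assumptions, not the class's actual implementation:

import math

def cosine_with_restarts_warmup(step, learning_rate, warmup_steps, total_steps,
                                num_cycles=1.0, lr_end=0.0, warmup_lr_init=0.0):
    """Illustrative stand-in for CosineWithRestartsAndWarmUpLR (assumed behavior)."""
    # Linear ramp from warmup_lr_init to learning_rate during warm-up.
    if step < warmup_steps:
        return warmup_lr_init + (learning_rate - warmup_lr_init) * step / max(1, warmup_steps)
    # Fraction of the post-warm-up phase completed so far.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return lr_end
    # Each of the num_cycles waves restarts the cosine phase from its peak.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return max(lr_end, learning_rate * cosine)

With learning_rate=0.005, warmup_steps=10 and total_steps=20, this sketch reproduces the values printed in the Examples section below: step 1 yields 0.0005 and step 15 yields 0.0025.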

This method was proposed in SGDR: Stochastic Gradient Descent with Warm Restarts, extending the concept of cosine annealing to allow for multiple restarts.

Parameters
  • learning_rate (float) – Initial value of the learning rate.

  • warmup_steps (int) – The number of warm-up steps. Default: None.

  • total_steps (int) – The total number of training steps. Default: None.

  • num_cycles (float) – The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). Default: 1.0.

  • lr_end (float) – Final value of the learning rate. Default: 0.

  • warmup_lr_init (float) – Initial learning rate during the warm-up steps. Default: 0.

  • warmup_ratio (float) – Ratio of total training steps used for warm-up; an alternative to specifying warmup_steps directly (see the sketch after this list). Default: None.

  • decay_steps (int) – The number of decay steps. Default: None.
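
When warmup_ratio is supplied, the warm-up length is expressed as a fraction of total_steps rather than as an absolute step count. The call below is a hypothetical usage sketch under that assumption; the exact rounding inside the class may differ:

>>> from mindformers.core import CosineWithRestartsAndWarmUpLR
>>> # Hypothetical: warm up for 10% of the 20 total steps (about 2 steps),
>>> # then run two cosine cycles with a restart in between.
>>> lr_schedule = CosineWithRestartsAndWarmUpLR(learning_rate=0.005,
...                                             warmup_ratio=0.1,
...                                             total_steps=20,
...                                             num_cycles=2.)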

Inputs:
  • global_step (int) - The current global training step.

Outputs:

The learning rate for the given global step.

Examples

>>> import mindspore as ms
>>> from mindformers.core import CosineWithRestartsAndWarmUpLR
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> total_steps = 20
>>> warmup_steps = 10
>>> learning_rate = 0.005
>>>
>>> cosine_warmup_restart = CosineWithRestartsAndWarmUpLR(learning_rate=learning_rate,
...                                                       warmup_steps=warmup_steps,
...                                                       total_steps=total_steps)
>>> print(cosine_warmup_restart(1))
0.0005
>>> print(cosine_warmup_restart(15))
0.0024999997
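
In training code, the schedule instance is typically handed to an optimizer as a dynamic learning rate, which queries it with the current global step. A minimal sketch, assuming CosineWithRestartsAndWarmUpLR is accepted wherever MindSpore optimizers take a learning-rate schedule (the network here is a placeholder):

>>> from mindspore import nn
>>> net = nn.Dense(4, 2)
>>> optimizer = nn.Adam(net.trainable_params(), learning_rate=cosine_warmup_restart)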