mindformers.core.CosineAnnealingWarmRestarts
- class mindformers.core.CosineAnnealingWarmRestarts(base_lr: float, t_0: int, t_mult: int = 1, eta_min: float = 0., **kwargs)
Set the learning rate of each parameter group using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial learning rate, \(T_{cur}\) is the number of epochs since the last restart, and \(T_{i}\) is the number of epochs between two warm restarts in SGDR:

\[\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right)\]

When \(T_{cur}=T_{i}\), set \(\eta_t = \eta_{min}\). When \(T_{cur}=0\) after a restart, set \(\eta_t=\eta_{max}\).
This schedule was proposed in SGDR: Stochastic Gradient Descent with Warm Restarts.
- Parameters
base_lr (float) - The initial (maximum) learning rate \(\eta_{max}\).
t_0 (int) - Number of steps in the first restart cycle.
t_mult (int, optional) - Factor by which the cycle length \(T_{i}\) is multiplied after each restart. Default: 1.
eta_min (float, optional) - Minimum learning rate \(\eta_{min}\). Default: 0.
- Inputs:
global_step (int) - The global step.
- Outputs:
Learning rate.
Examples
>>> import mindspore as ms
>>> from mindformers.core import CosineAnnealingWarmRestarts
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> base_lr = 0.005
>>> t_0 = 10
>>> t_mult = 2
>>> eta_min = 0.0000001
>>>
>>> cosine_annealing_restart = CosineAnnealingWarmRestarts(base_lr=base_lr,
...                                                        t_0=t_0,
...                                                        t_mult=t_mult,
...                                                        eta_min=eta_min)
>>> print(cosine_annealing_restart(1))
0.0048776437
>>> print(cosine_annealing_restart(15))
0.0042677815
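As a cross-check of the printed values, the same numbers can be reproduced with a minimal pure-Python sketch of the formula above. The helper sgdr_lr and its closed-form cycle lookup are illustrative assumptions for this sketch, not part of the mindformers API:

>>> import math
>>>
>>> def sgdr_lr(step, base_lr, t_0, t_mult=1, eta_min=0.0):
...     """Illustrative pure-Python version of the SGDR formula above."""
...     if t_mult == 1:
...         # Every cycle has the same length t_0.
...         t_cur, t_i = step % t_0, t_0
...     else:
...         # Cycle n starts at t_0 * (t_mult**n - 1) / (t_mult - 1) and lasts t_0 * t_mult**n.
...         n = int(math.log(step * (t_mult - 1) / t_0 + 1, t_mult))
...         t_cur = step - t_0 * (t_mult ** n - 1) // (t_mult - 1)
...         t_i = t_0 * t_mult ** n
...     return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
...
>>> round(sgdr_lr(1, 0.005, 10, 2, 0.0000001), 7)
0.0048776
>>> round(sgdr_lr(15, 0.005, 10, 2, 0.0000001), 7)
0.0042678

With t_mult=2 the cycles have lengths 10, 20, 40, ..., so step 15 falls in the second cycle with \(T_{i}=20\) and \(T_{cur}=5\).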