mindformers.core.CosineWithWarmUpLR
- class mindformers.core.CosineWithWarmUpLR(learning_rate: float, warmup_steps: int = 0, total_steps: int = None, num_cycles: float = 0.5, lr_end: float = 0., warmup_lr_init: float = 0., warmup_ratio: float = None, decay_steps: int = None, decay_ratio: float = None, **kwargs)
Cosine with Warm Up Learning Rate.
The CosineWithWarmUpLR learning rate scheduler applies a cosine annealing schedule with warm-up steps to set the learning rate for each parameter group. Initially, the learning rate increases linearly during the warm-up phase, after which it follows a cosine function to decay.
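During the warm-up phase, the learning rate increases linearly from a small initial value to the base learning rate as follows:
$$\eta_t = \eta_{\text{warmup}} + t \times \frac{\eta_{\text{base}} - \eta_{\text{warmup}}}{\text{warmup\_steps}}$$
where $t$ is the current step, $\eta_{\text{warmup}}$ is the initial learning rate (warmup_lr_init), and $\eta_{\text{base}}$ is the learning rate after the warm-up phase (learning_rate).
Once the warm-up phase is completed, the learning rate follows a cosine decay schedule:
$$\eta_t = \eta_{\text{end}} + \frac{1}{2}\left(\eta_{\text{base}} - \eta_{\text{end}}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$$
where $\eta_{\text{end}}$ is the final learning rate (lr_end), $T_{cur}$ is the number of steps since the end of the warm-up phase, and $T_{max}$ is the total number of steps until the next restart.
- Parameters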
learning_rate (float) – Initial value of learning rate.
warmup_steps (int, optional) – The number of warm-up steps. Default: 0.
total_steps (int, optional) – The total number of steps. Default: None.
num_cycles (float, optional) – The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). Default: 0.5.
lr_end (float, optional) – Final value of learning rate. Default: 0.0.
warmup_lr_init (float, optional) – Initial learning rate in warm-up steps. Default: 0.0.
warmup_ratio (float, optional) – Ratio of total training steps used for warm-up. Default: None.
decay_steps (int, optional) – The number of decay steps. Default: None.
decay_ratio (float, optional) – Ratio of total training steps used for decay. Default: None.
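The parameter list does not spell out how warmup_ratio and decay_ratio relate to warmup_steps and decay_steps. One plausible reading, assumed here and not confirmed by this page, is that a ratio, when given, is multiplied by total_steps to obtain a step count. The helper resolve_steps below is hypothetical and is not part of MindFormers; it only illustrates that reading.

```python
def resolve_steps(steps, ratio, total_steps):
    """Hypothetical helper: derive a step count from a ratio when one is given.

    Assumption (not confirmed by the documentation): a ratio takes
    precedence over the explicit step count and requires total_steps.
    """
    if ratio is None:
        # No ratio supplied, so the explicit step count is used unchanged.
        return steps
    if total_steps is None:
        raise ValueError("a ratio requires total_steps to be set")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    # Truncate to a whole number of steps.
    return int(total_steps * ratio)


# 10% of 1000 total steps used for warm-up.
warmup_steps = resolve_steps(steps=0, ratio=0.1, total_steps=1000)
print(warmup_steps)
```

Check the library source before relying on this precedence, since the actual validation and rounding rules may differ.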
- Inputs:
global_step (Tensor) - The global step.
- Outputs:
Learning rate.
Examples
>>> import mindspore as ms
>>> from mindspore import Tensor
>>> from mindformers.core import CosineWithWarmUpLR
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> total_steps = 20
>>> warmup_steps = 10
>>> learning_rate = 0.005
>>>
>>> cosine_warmup = CosineWithWarmUpLR(learning_rate=learning_rate,
...                                    warmup_steps=warmup_steps,
...                                    total_steps=total_steps)
>>> print(cosine_warmup(Tensor(1)))
0.0005
>>> print(cosine_warmup(Tensor(15)))
0.0024999997
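To see where the printed values come from, the schedule can be sketched in plain Python without MindSpore. This is a minimal illustration under stated assumptions, not the MindFormers implementation: the function name cosine_with_warmup is hypothetical, total_steps is treated as the end of the decay phase, and num_cycles is assumed to enter the cosine as 2 * num_cycles * progress, so that the default 0.5 gives a half-cosine.

```python
import math


def cosine_with_warmup(step, learning_rate, total_steps, warmup_steps=0,
                       num_cycles=0.5, lr_end=0.0, warmup_lr_init=0.0):
    """Plain-Python sketch of linear warm-up followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from warmup_lr_init towards learning_rate.
        return warmup_lr_init + (learning_rate - warmup_lr_init) * step / warmup_steps
    # Fraction of the decay phase completed, clamped to at most 1.
    decay_span = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_span)
    # Assumed generalisation: num_cycles=0.5 yields a single half-cosine.
    cosine = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
    return lr_end + (learning_rate - lr_end) * max(0.0, cosine)


# Same settings as the example: 10 warm-up steps out of 20 total steps.
for step in (1, 15):
    print(step, cosine_with_warmup(step, learning_rate=0.005,
                                   total_steps=20, warmup_steps=10))
```

With these settings, step 1 lies one tenth of the way through warm-up and step 15 lies halfway through the decay phase, so the sketch gives roughly 0.0005 and 0.0025, in line with the example output up to floating-point rounding.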