mindformers.core.ConstantWarmUpLR
- class mindformers.core.ConstantWarmUpLR(learning_rate: float, warmup_steps: int = None, warmup_lr_init: float = 0., warmup_ratio: float = None, total_steps: int = None, **kwargs)
Constant Warm Up Learning Rate.
This learning rate strategy maintains a constant learning rate during the warm-up phase. It is particularly suitable for scenarios where a stable, lower learning rate is needed at the beginning of training to avoid issues such as gradient explosion, before transitioning to the main learning rate schedule.
During the warm-up phase, the learning rate is kept at a fixed value, denoted as \(\eta_{\text{warmup}}\). The formula for the learning rate during the warm-up phase is:
\[\eta_t = \eta_{\text{warmup}}\]
Here, \(\eta_{\text{warmup}}\) is the fixed learning rate applied during the warm-up steps, and \(t\) represents the current step.
After the warm-up phase concludes, the learning rate transitions to the main learning rate, denoted as \(\eta_{\text{main}}\). The formula for the learning rate after the transition is:
\[\eta_t = \eta_{\text{main}}\]
- Parameters
learning_rate (float) – Initial value of learning rate.
warmup_steps (int, optional) – The number of warm up steps. Default: None.
warmup_lr_init (float, optional) – Initial learning rate in warm up steps. Default: 0.0.
warmup_ratio (float, optional) – Ratio of total training steps used for warmup. Default: None.
total_steps (int, optional) – The total number of training steps. Default: None.
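The relationship between warmup_steps, warmup_ratio and total_steps can be illustrated with a small sketch. The helper name resolve_warmup_steps, the precedence of an explicit warmup_steps over warmup_ratio, the truncation with int(), and the fallback of 0 are all assumptions made for illustration; the library may resolve these values differently.

```python
def resolve_warmup_steps(warmup_steps=None, warmup_ratio=None, total_steps=None):
    """Hypothetical helper: derive a warm-up step count from the constructor arguments.

    Assumptions (not confirmed by the documentation):
    - an explicit warmup_steps takes precedence over warmup_ratio;
    - warmup_ratio is a fraction in [0, 1] of total_steps;
    - the product is truncated to an integer;
    - with neither argument given, there is no warm-up (0 steps).
    """
    if warmup_steps is not None:
        return warmup_steps
    if warmup_ratio is None:
        return 0
    if not 0.0 <= warmup_ratio <= 1.0:
        raise ValueError("warmup_ratio must be between 0 and 1")
    if total_steps is None:
        raise ValueError("total_steps is required when warmup_ratio is used")
    return int(total_steps * warmup_ratio)


# A quarter of 20 total steps is spent on warm-up under these assumptions.
print(resolve_warmup_steps(warmup_ratio=0.25, total_steps=20))
```

Under these assumptions, passing warmup_ratio without total_steps is treated as an error, since the ratio alone does not determine a step count.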
- Inputs:
global_step (int) - The global step.
- Outputs:
Learning rate.
Examples
>>> import mindspore as ms
>>> from mindformers.core import ConstantWarmUpLR
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> total_steps = 20
>>> warmup_steps = 10
>>> learning_rate = 0.005
>>>
>>> constant_warmup = ConstantWarmUpLR(learning_rate=learning_rate,
...                                    warmup_steps=warmup_steps,
...                                    total_steps=total_steps)
>>> print(constant_warmup(1))
0.0005
>>> print(constant_warmup(15))
0.005
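The printed values in the example (0.0005 at step 1, 0.005 at step 15, with warmup_lr_init left at 0 and warmup_steps = 10) are consistent with a learning rate that ramps linearly from warmup_lr_init to learning_rate over the warm-up steps and then stays at learning_rate. The plain-Python sketch below reproduces those two numbers under that assumption. The function constant_warmup_lr is a hypothetical stand-in written for illustration and is not the mindformers implementation.

```python
def constant_warmup_lr(step, learning_rate, warmup_steps, warmup_lr_init=0.0):
    """Hypothetical stand-in for the schedule, inferred from the example outputs.

    Assumption: during warm-up the rate is interpolated linearly between
    warmup_lr_init and learning_rate; afterwards it equals learning_rate.
    """
    if step < warmup_steps:
        fraction = step / warmup_steps  # progress through the warm-up phase
        return warmup_lr_init + (learning_rate - warmup_lr_init) * fraction
    return learning_rate


# Same settings as the documented example: learning_rate=0.005, warmup_steps=10.
for step in (1, 15):
    print(step, f"{constant_warmup_lr(step, 0.005, 10):.4f}")
```

Because the branch is taken only when step < warmup_steps, a warmup_steps of 0 skips the division and returns learning_rate directly.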