mindformers.core.ConstantWarmUpLR
- class mindformers.core.ConstantWarmUpLR(learning_rate: float, warmup_steps: int = None, warmup_lr_init: float = 0., warmup_ratio: float = None, total_steps: int = None, **kwargs)[source]
Constant Warm Up Learning Rate.
This learning rate strategy applies a warm-up phase at the beginning of training and then keeps the learning rate constant at the main value for the remainder of training. It is particularly suitable for scenarios where a stable, lower learning rate is needed at the beginning of training to avoid issues such as gradient explosion, before settling at the main learning rate.
During the warm-up phase, the learning rate is ramped up from the initial warm-up value, denoted as \(\eta_{\text{warmup\_init}}\) (warmup_lr_init), in proportion to the warm-up progress. With the default warmup_lr_init = 0, the formula for the learning rate during the warm-up phase is:

\[\eta_t = \eta_{\text{main}} \times \frac{t}{\text{warmup\_steps}}\]

Here, \(\eta_{\text{main}}\) is the main learning rate (learning_rate), \(t\) represents the current step, and warmup_steps is the length of the warm-up phase. For example, with learning_rate = 0.005 and warmup_steps = 10, step 1 gives 0.005 × 1/10 = 0.0005, matching the example below.
After the warm-up phase concludes, the learning rate is held constant at the main value. The formula for the learning rate after the transition is:

\[\eta_t = \eta_{\text{main}}\]

- Parameters
learning_rate (float) – Initial value of learning rate.
warmup_steps (int) – The number of warm up steps. Default: None.
warmup_lr_init (float) – Initial learning rate in warm up steps. Default: 0.
warmup_ratio (float) – Ratio of total training steps used for warmup (see the sketch after this parameter list). Default: None.
total_steps (int) – The total number of training steps. Default: None.
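A minimal sketch of specifying the warm-up length via warmup_ratio instead of warmup_steps, assuming (as the parameter descriptions suggest) that the warm-up covers roughly warmup_ratio × total_steps steps; the values below are purely illustrative:

>>> from mindformers.core import ConstantWarmUpLR
>>> # Assumption: warmup_ratio=0.1 of total_steps=100 corresponds to about 10 warm-up steps.
>>> ratio_warmup = ConstantWarmUpLR(learning_rate=0.005, warmup_ratio=0.1, total_steps=100)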
- Inputs:
global_step (int) - The global step.
- Outputs:
Learning rate.
Examples
>>> import mindspore as ms
>>> from mindformers.core import ConstantWarmUpLR
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> total_steps = 20
>>> warmup_steps = 10
>>> learning_rate = 0.005
>>>
>>> constant_warmup = ConstantWarmUpLR(learning_rate=learning_rate,
...                                    warmup_steps=warmup_steps,
...                                    total_steps=total_steps)
>>> print(constant_warmup(1))
0.0005
>>> print(constant_warmup(15))
0.005
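As a follow-up usage sketch, the schedule instance can be passed to a MindSpore optimizer as a dynamic learning rate, assuming ConstantWarmUpLR follows the standard mindspore.nn LearningRateSchedule interface; the nn.Dense network and nn.AdamWeightDecay optimizer below are illustrative choices, not part of this API:

>>> from mindspore import nn
>>> from mindformers.core import ConstantWarmUpLR
>>>
>>> # Illustrative network; any Cell with trainable parameters works the same way.
>>> net = nn.Dense(16, 8)
>>> schedule = ConstantWarmUpLR(learning_rate=0.005, warmup_steps=10, total_steps=20)
>>> # The optimizer evaluates the schedule at the current global step on each update.
>>> optimizer = nn.AdamWeightDecay(net.trainable_params(), learning_rate=schedule)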