mindformers.core.CosineWithWarmUpLR

class mindformers.core.CosineWithWarmUpLR(learning_rate: float, warmup_steps: int = 0, total_steps: int = None, num_cycles: float = 0.5, lr_end: float = 0., warmup_lr_init: float = 0., warmup_ratio: float = None, decay_steps: int = None, decay_ratio: float = None, **kwargs)

Cosine with Warm Up Learning Rate.

The CosineWithWarmUpLR learning rate scheduler applies a cosine annealing schedule with warm-up steps to set the learning rate for each parameter group. Initially, the learning rate increases linearly during the warm-up phase, after which it follows a cosine function to decay.

During the warm-up phase, the learning rate increases from a small initial value to the base learning rate as follows:

$$η_{t} = η_{warmup} + t \times \frac{η_{base} - η_{warmup}}{\text{warmup\_steps}}$$

where $η_{warmup}$ is the initial learning rate, and $η_{base}$ is the learning rate after the warm-up phase.

Once the warm-up phase is completed, the learning rate follows a cosine decay schedule:

$$η_{t} = η_{end} + \frac{1}{2} (η_{base} - η_{end}) \left(1 + \cos\left(\frac{t_{cur}}{t_{max}} π\right)\right)$$

where $t_{cur}$ is the number of epochs since the end of the warm-up phase, and $t_{max}$ is the total number of epochs until the next restart.
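The two formulas can be checked by hand with a few lines of plain Python. This is a minimal sketch of the equations as written on this page, not the library's implementation: `cosine_with_warmup` is a hypothetical helper, it counts $t_{cur}$ and $t_{max}$ in steps (matching the `global_step` input), it omits `num_cycles` because the formula has no such term, and holding the rate at `lr_end` once `total_steps` is passed is an assumption.

```python
import math


def cosine_with_warmup(step, base_lr, warmup_steps, total_steps,
                       lr_end=0.0, warmup_lr_init=0.0):
    """Evaluate the warm-up and cosine-decay formulas for one step (illustrative only)."""
    if step < warmup_steps:
        # Linear warm-up from warmup_lr_init towards base_lr.
        return warmup_lr_init + step * (base_lr - warmup_lr_init) / warmup_steps
    # t_max is the length of the decay phase; t_cur counts steps since warm-up ended.
    t_max = max(1, total_steps - warmup_steps)
    # Assumption: past total_steps the rate stays at lr_end.
    t_cur = min(step - warmup_steps, t_max)
    return lr_end + 0.5 * (base_lr - lr_end) * (1.0 + math.cos(math.pi * t_cur / t_max))


for step in (1, 10, 15, 20):
    print(step, cosine_with_warmup(step, 0.005, 10, 20))
```

With a base rate of 0.005, 10 warm-up steps and 20 total steps, the sketch gives roughly 0.0005 at step 1 and 0.0025 at step 15, reaching the base rate at step 10 and `lr_end` at step 20.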

Parameters

learning_rate (float) – Base learning rate ($η_{base}$), reached at the end of the warm-up phase.
warmup_steps (int, optional) – The number of warm-up steps. Default: 0.
total_steps (int, optional) – The number of total steps. Default: None.
num_cycles (float, optional) – The number of waves in the cosine schedule (the default is to decrease from the maximum value to 0 following a half-cosine). Default: 0.5.
lr_end (float, optional) – Final value of learning rate. Default: 0.0.
warmup_lr_init (float, optional) – Initial learning rate in warm-up steps. Default: 0.0.
warmup_ratio (float, optional) – Ratio of total training steps used for warmup. Default: None.
decay_steps (int, optional) – The number of decay steps. Default: None.
decay_ratio (float, optional) – Ratio of total training steps used for decay. Default: None.
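One reading of warmup_ratio and decay_ratio is that each is multiplied by the total number of steps to obtain a step count. The sketch below illustrates that reading only. `steps_from_ratio` is a hypothetical helper, not part of mindformers, and both the truncation of fractional steps and the behaviour when a ratio and an explicit step count are given together are assumptions that this page does not confirm.

```python
def steps_from_ratio(total_steps, ratio):
    """Hypothetical helper: turn a fraction of training into a step count."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    # Assumption: fractional steps are truncated; the library may round differently.
    return int(total_steps * ratio)


print(steps_from_ratio(20, 0.5))  # 10
```

Under this reading, warmup_ratio=0.5 with total_steps=20 would describe the same warm-up length as warmup_steps=10.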

Inputs:

global_step (Tensor) - The global step.

Outputs:

Learning rate.

Examples

>>> import mindspore as ms
>>> from mindspore import Tensor
>>> from mindformers.core import CosineWithWarmUpLR
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> total_steps = 20
>>> warmup_steps = 10
>>> learning_rate = 0.005
>>>
>>> cosine_warmup = CosineWithWarmUpLR(learning_rate=learning_rate,
...                                    warmup_steps=warmup_steps,
...                                    total_steps=total_steps)
>>> print(cosine_warmup(Tensor(1)))
0.0005
>>> print(cosine_warmup(Tensor(15)))
0.0024999997