mindformers.core.LearningRateWiseLayer
- class mindformers.core.LearningRateWiseLayer(base_lr, lr_scale)[source]
Learning Rate Wise Layer.
This approach allows each layer to adapt its learning rate according to its specific needs, leading to more efficient and effective training. The learning rate for each layer is determined by a base learning rate modulated by a scaling factor specific to that layer.
Initially, the learning rate for each layer is set based on a linear scaling strategy:

$$\eta_{t}^{l} = \eta_{\text{base}} \times \alpha_{l}$$

where $\eta_{t}^{l}$ is the learning rate for layer $l$ at time $t$, $\eta_{\text{base}}$ is the base learning rate, and $\alpha_{l}$ is the scaling factor for layer $l$. As training progresses, the learning rate for each layer is adjusted according to the following cosine annealing schedule:

$$\eta_{t}^{l} = \eta_{\text{min}}^{l} + \frac{1}{2}\left(\eta_{\text{base}} \times \alpha_{l} - \eta_{\text{min}}^{l}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$$

where $T_{cur}$ is the number of epochs completed since the learning rate was last reset, $T_{max}$ is the total number of epochs before the next reset, and $\eta_{\text{min}}^{l}$ represents the minimum learning rate at the end of training.
- Parameters
base_lr (mindspore.nn.learning_rate_schedule.LearningRateSchedule) – The base learning rate schedule.
lr_scale (float) – The value for learning rate scaling.
- Inputs:
global_step (Tensor) - The global step.
- Outputs:
Learning rate.
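For intuition, the two formulas above can be reproduced numerically. The following is a minimal sketch, not part of the API: it uses plain Python floats instead of MindSpore tensors, and the names base_lr, lr_scale, t_cur, t_max, and eta_min simply mirror the symbols in the equations.

import math

def layer_lr(base_lr, lr_scale, t_cur, t_max, eta_min=0.0):
    """Illustrative re-computation of the layer-wise schedule above.

    base_lr  -- base learning rate (eta_base)
    lr_scale -- per-layer scaling factor (alpha_l)
    t_cur    -- epochs completed since the last learning-rate reset (T_cur)
    t_max    -- total epochs before the next reset (T_max)
    eta_min  -- minimum learning rate at the end of training (eta_min^l)
    """
    # Initial linear scaling: eta_t^l = eta_base * alpha_l
    eta_init = base_lr * lr_scale
    # Cosine annealing from eta_init down to eta_min
    return eta_min + 0.5 * (eta_init - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))

print(layer_lr(0.005, 0.5, t_cur=0, t_max=10))   # 0.0025 (base_lr * lr_scale)
print(layer_lr(0.005, 0.5, t_cur=10, t_max=10))  # 0.0 (decayed to eta_min)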
Examples
>>> import mindspore as ms
>>> from mindspore import Tensor
>>> from mindformers.core import LinearWithWarmUpLR
>>> from mindformers.core import LearningRateWiseLayer
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> total_steps = 20
>>> warmup_steps = 10
>>> learning_rate = 0.005
>>>
>>> linear_warmup = LinearWithWarmUpLR(learning_rate=learning_rate,
...                                    warmup_steps=warmup_steps,
...                                    total_steps=total_steps)
>>> learning_rate_wise_layer = LearningRateWiseLayer(linear_warmup, 0.5)
>>> print(learning_rate_wise_layer(Tensor(1)))
0.00025
>>> print(learning_rate_wise_layer(Tensor(15)))
0.00125
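In practice, the point of the wrapper is to derive several per-layer schedules from one base schedule and attach them to different parameter groups. The following is a minimal sketch, assuming the standard MindSpore grouped-parameter convention (a per-group 'lr' entry accepting a LearningRateSchedule); the tiny nn.Dense model, the group split, and the 0.5 scale are all illustrative choices, not values prescribed by the API.

import mindspore as ms
from mindspore import nn
from mindformers.core import LinearWithWarmUpLR, LearningRateWiseLayer

net = nn.Dense(16, 4)  # stand-in for a real model

base_lr = LinearWithWarmUpLR(learning_rate=0.005, warmup_steps=10, total_steps=20)

# One wrapper per parameter group: weights train at the full base rate,
# biases at half of it.
weight_params = [p for p in net.trainable_params() if 'weight' in p.name]
bias_params = [p for p in net.trainable_params() if 'bias' in p.name]
group_params = [
    {'params': weight_params, 'lr': LearningRateWiseLayer(base_lr, 1.0)},
    {'params': bias_params, 'lr': LearningRateWiseLayer(base_lr, 0.5)},
]
# learning_rate is the fallback for groups without their own 'lr' entry.
optimizer = nn.AdamWeightDecay(group_params, learning_rate=base_lr)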