Function Differences with tf.compat.v1.train.cosine_decay
tf.compat.v1.train.cosine_decay
tf.compat.v1.train.cosine_decay(
    learning_rate,
    global_step,
    decay_steps,
    alpha=0.0,
    name=None
) -> Tensor
For more information, see tf.compat.v1.train.cosine_decay.
mindspore.nn.CosineDecayLR
class mindspore.nn.CosineDecayLR(
    min_lr,
    max_lr,
    decay_steps
)(global_step) -> Tensor
For more information, see mindspore.nn.CosineDecayLR.
Differences
TensorFlow: The learning rate is calculated based on the cosine decay function.
MindSpore: This API achieves basically the same function as TensorFlow's. TensorFlow outputs the decayed learning rate directly, while MindSpore outputs a learning rate decayed between min_lr and max_lr. Fixing MindSpore's max_lr to 1 makes the output a decay rate; multiplying that output by the same learning_rate used in TensorFlow then yields the same result (see the formula sketch after the table below).
Categories | Subcategories | TensorFlow | MindSpore | Differences
---|---|---|---|---
Parameters | Parameter 1 | learning_rate | - | Initial learning rate. MindSpore does not have this parameter
 | Parameter 2 | global_step | global_step | -
 | Parameter 3 | decay_steps | decay_steps | -
 | Parameter 4 | alpha | min_lr | Same function, different parameter names
 | Parameter 5 | name | - | Not involved
 | Parameter 6 | - | max_lr | The maximum value of the learning rate. TensorFlow does not have this parameter
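As a reference for the equivalence described above, the two decay formulas can be compared in plain Python. This is a minimal sketch based on the formulas given in each API's documentation, not a use of the frameworks themselves; the function names here are illustrative only.

# Plain-Python sketch of both documented decay formulas (illustrative names)
import math

def tf_cosine_decay(learning_rate, global_step, decay_steps, alpha=0.0):
    # tf.compat.v1.train.cosine_decay formula:
    # learning_rate * ((1 - alpha) * 0.5 * (1 + cos(pi * step / decay_steps)) + alpha)
    step = min(global_step, decay_steps)
    cosine_decay = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
    return learning_rate * ((1 - alpha) * cosine_decay + alpha)

def ms_cosine_decay_lr(min_lr, max_lr, decay_steps, global_step):
    # mindspore.nn.CosineDecayLR formula:
    # min_lr + 0.5 * (max_lr - min_lr) * (1 + cos(pi * global_step / decay_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * global_step / decay_steps))

# With max_lr = 1 and min_lr = alpha, MindSpore's output multiplied by
# learning_rate equals TensorFlow's output.
print(tf_cosine_decay(0.01, 2, 4, alpha=0.0))       # 0.005
print(ms_cosine_decay_lr(0.0, 1.0, 4, 2) * 0.01)    # 0.005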
Code Example
Fix max_lr of MindSpore to 1 and multiply its output by the same learning_rate as TensorFlow; the two APIs then achieve the same function.
# TensorFlow
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
learning_rate = 0.01
global_steps = 2
decay_steps = 4
output = tf.compat.v1.train.cosine_decay(learning_rate, global_steps, decay_steps)
ss = tf.compat.v1.Session()
print(ss.run(output))
# 0.0049999994
# MindSpore
import mindspore
from mindspore import Tensor, nn
min_lr = 0.01
max_lr = 1.0
decay_steps = 4
global_steps = Tensor(2, mindspore.int32)
cosine_decay_lr = nn.CosineDecayLR(min_lr, max_lr, decay_steps)
output = cosine_decay_lr(global_steps)
adapted_output = output * 0.01  # multiply by the same learning_rate as TensorFlow
print(adapted_output)
# 0.00505
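In the example above, min_lr=0.01 plays the role of TensorFlow's alpha, which defaults to 0.0; this is why the two printed values differ slightly (0.0049999994 vs. 0.00505). Passing alpha=0.01 on the TensorFlow side should make the two outputs agree exactly; a minimal check of this assumption:

# TensorFlow, with alpha set to match MindSpore's min_lr above
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
output = tf.compat.v1.train.cosine_decay(0.01, 2, 4, alpha=0.01)
ss = tf.compat.v1.Session()
print(ss.run(output))
# 0.00505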