Function Differences with tf.compat.v1.train.cosine_decay
tf.compat.v1.train.cosine_decay
tf.compat.v1.train.cosine_decay(
    learning_rate,
    global_step,
    decay_steps,
    alpha=0.0,
    name=None
) -> Tensor
For more information, see tf.compat.v1.train.cosine_decay.
mindspore.nn.CosineDecayLR
class mindspore.nn.CosineDecayLR(
    min_lr,
    max_lr,
    decay_steps
)(global_step) -> Tensor
For more information, see mindspore.nn.CosineDecayLR.
Differences
TensorFlow: The learning rate is calculated based on the cosine decay function.
MindSpore: This API achieves basically the same function as TensorFlow's. TensorFlow outputs the decayed learning rate directly, while MindSpore outputs a learning rate decayed between min_lr and max_lr. Fixing MindSpore's max_lr to 1 makes the output a decay rate; multiplying that output by the same learning_rate used in TensorFlow then yields the same result (see the formula sketch after the table below).
Categories | Subcategories | TensorFlow | MindSpore | Differences
---|---|---|---|---
Parameters | Parameter 1 | learning_rate | - | Initial learning rate. MindSpore does not have this parameter
 | Parameter 2 | global_step | global_step | -
 | Parameter 3 | decay_steps | decay_steps | -
 | Parameter 4 | alpha | min_lr | Same function, different parameter names
 | Parameter 5 | name | - | Not involved
 | Parameter 6 | - | max_lr | The maximum value of the learning rate. TensorFlow does not have this parameter
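As a reference for the equivalence described above, the two decay formulas can be compared in plain Python. This is a minimal sketch based on the formulas given in each API's documentation, not a use of the frameworks themselves; the function names here are illustrative only.

# Plain-Python sketch of both documented decay formulas (illustrative names)
import math

def tf_cosine_decay(learning_rate, global_step, decay_steps, alpha=0.0):
    # tf.compat.v1.train.cosine_decay formula:
    # learning_rate * ((1 - alpha) * 0.5 * (1 + cos(pi * step / decay_steps)) + alpha)
    step = min(global_step, decay_steps)
    cosine_decay = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
    return learning_rate * ((1 - alpha) * cosine_decay + alpha)

def ms_cosine_decay_lr(min_lr, max_lr, decay_steps, global_step):
    # mindspore.nn.CosineDecayLR formula:
    # min_lr + 0.5 * (max_lr - min_lr) * (1 + cos(pi * global_step / decay_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * global_step / decay_steps))

# With max_lr = 1 and min_lr = alpha, MindSpore's output multiplied by
# learning_rate equals TensorFlow's output.
print(tf_cosine_decay(0.01, 2, 4, alpha=0.0))       # 0.005
print(ms_cosine_decay_lr(0.0, 1.0, 4, 2) * 0.01)    # 0.005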
Code Example
Fix max_lr of MindSpore to 1 and multiply its output by the same learning_rate as TensorFlow; the two APIs then achieve the same function.
# TensorFlow
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
learning_rate = 0.01
global_steps = 2
decay_steps = 4
output = tf.compat.v1.train.cosine_decay(learning_rate, global_steps, decay_steps)
ss = tf.compat.v1.Session()
print(ss.run(output))
# 0.0049999994
# MindSpore
import mindspore
from mindspore import Tensor, nn
min_lr = 0.01
max_lr = 1.0
decay_steps = 4
global_steps = Tensor(2, mindspore.int32)
cosine_decay_lr = nn.CosineDecayLR(min_lr, max_lr, decay_steps)
output = cosine_decay_lr(global_steps)
adapted_output = output * 0.01  # multiply by the same learning_rate as TensorFlow
print(adapted_output)
# 0.00505
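In the example above, min_lr=0.01 plays the role of TensorFlow's alpha, which defaults to 0.0; this is why the two printed values differ slightly (0.0049999994 vs. 0.00505). Passing alpha=0.01 on the TensorFlow side should make the two outputs agree exactly; a minimal check of this assumption:

# TensorFlow, with alpha set to match MindSpore's min_lr above
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
output = tf.compat.v1.train.cosine_decay(0.01, 2, 4, alpha=0.01)
ss = tf.compat.v1.Session()
print(ss.run(output))
# 0.00505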