Function Differences with tf.keras.layers.LSTM
tf.keras.layers.LSTM
class tf.keras.layers.LSTM(
units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal',
bias_initializer='zeros', unit_forget_bias=True,
kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
bias_constraint=None, dropout=0.0, recurrent_dropout=0.0,
return_sequences=False, return_state=False, go_backwards=False, stateful=False,
time_major=False, unroll=False, **kwargs)(inputs, mask, training, initial_state) -> Tensor
For more information, see tf.keras.layers.LSTM.
mindspore.nn.LSTM
class mindspore.nn.LSTM(
input_size,
hidden_size,
num_layers=1,
has_bias=True,
batch_first=False,
dropout=0,
bidirectional=False)(x, hx, seq_length) -> Tensor
For more information, see mindspore.nn.LSTM.
Differences
TensorFlow: Computes the output sequence and final state from the input sequence; the parameters return_sequences and return_state control whether they are returned.
MindSpore: Computes the output sequence and final state from the input sequence and a given initial state, and supports multi-layer and bidirectional LSTM networks. However, unlike TensorFlow, it cannot customize parts of the computation such as activation, regularization, or constraint functions. Because the TensorFlow API can only implement a single-layer, unidirectional LSTM, the final state tensors returned by the two APIs have different shapes.
| Categories | Subcategories | TensorFlow | MindSpore | Differences |
|---|---|---|---|---|
| Parameters | Parameter 1 | units | hidden_size | Same function, different parameter names |
| | Parameter 2 | activation | - | Activation function to use. Default: tanh. MindSpore does not have this parameter but uses the same activation function by default. |
| | Parameter 3 | recurrent_activation | - | Activation function used for the recurrent step. Default: sigmoid. MindSpore does not have this parameter but uses the same activation function by default. |
| | Parameter 4 | use_bias | has_bias | Same function, different parameter names |
| | Parameter 5 | kernel_initializer | - | Initializer for the kernel weight matrix, used for the linear transformation of the inputs. Default: glorot_uniform. MindSpore does not have this parameter. |
| | Parameter 6 | recurrent_initializer | - | Initializer for the recurrent_kernel weight matrix, used for the linear transformation of the recurrent state. Default: orthogonal. MindSpore does not have this parameter. |
| | Parameter 7 | bias_initializer | - | Initializer for the bias vector. Default: zeros. MindSpore does not have this parameter. |
| | Parameter 8 | unit_forget_bias | - | Whether to add 1 to the bias of the forget gate at initialization. Default: True. MindSpore does not have this parameter. |
| | Parameter 9 | kernel_regularizer | - | Regularizer function applied to the kernel weight matrix. Default: None. MindSpore does not have this parameter. |
| | Parameter 10 | recurrent_regularizer | - | Regularizer function applied to the recurrent_kernel weight matrix. Default: None. MindSpore does not have this parameter. |
| | Parameter 11 | bias_regularizer | - | Regularizer function applied to the bias vector. Default: None. MindSpore does not have this parameter. |
| | Parameter 12 | activity_regularizer | - | Regularizer function applied to the output of the layer (its activation). Default: None. MindSpore does not have this parameter. |
| | Parameter 13 | kernel_constraint | - | Constraint function applied to the kernel weight matrix. Default: None. MindSpore does not have this parameter. |
| | Parameter 14 | recurrent_constraint | - | Constraint function applied to the recurrent_kernel weight matrix. Default: None. MindSpore does not have this parameter. |
| | Parameter 15 | bias_constraint | - | Constraint function applied to the bias vector. Default: None. MindSpore does not have this parameter. |
| | Parameter 16 | dropout | dropout | - |
| | Parameter 17 | recurrent_dropout | - | Dropout probability applied to the recurrent state. Default: 0.0. MindSpore does not have this parameter and only provides dropout. |
| | Parameter 18 | return_sequences | - | Whether to return the full output sequence or only the last output. Default: False. MindSpore does not have this parameter and always returns the full sequence (equivalent to True). |
| | Parameter 19 | return_state | - | Whether to return the last state. Default: False. MindSpore does not have this parameter and always returns the last state (equivalent to True). |
| | Parameter 20 | go_backwards | - | Whether to process the input sequence backwards and return the reversed sequence. Default: False. MindSpore does not have this parameter. |
| | Parameter 21 | stateful | - | Whether to use the last state of each sample at index i in a batch as the initial state of the sample at index i in the following batch. Default: False. MindSpore does not have this parameter. |
| | Parameter 22 | time_major | - | Shape format of the input and output tensors. If True, the inputs and outputs have shape [timesteps, batch, feature]; if False, [batch, timesteps, feature]. Default: False. MindSpore does not have this parameter, but both layouts are supported via batch_first (see the sketch after this table). |
| | Parameter 23 | unroll | - | If True, the network is unrolled; otherwise a symbolic loop is used. Default: False. MindSpore does not have this parameter. |
| | Parameter 24 | **kwargs | - | Not involved |
| | Parameter 25 | inputs | x | Same function, different parameter names |
| | Parameter 26 | mask | - | Binary tensor of shape [batch, timesteps] indicating whether a given time step should be masked (optional, default None). A True entry means the corresponding time step is used; a False entry means it is ignored. MindSpore does not have this parameter. |
| | Parameter 27 | training | - | Python bool indicating whether the layer should run in training or inference mode. It is passed to the cell when the cell is called and is only relevant when dropout or recurrent_dropout is used (optional, default None). MindSpore does not have this parameter. |
| | Parameter 28 | initial_state | hx | List of initial state tensors passed to the cell on its first call (optional, default None, which creates zero-filled initial state tensors). In MindSpore, hx plays the same role of supplying the initial state tensors. |
| | Parameter 29 | - | input_size | Sets the size of the input feature. TensorFlow does not have this parameter; it determines the input size automatically. |
| | Parameter 30 | - | num_layers | Sets the number of stacked layers. Default: 1. TensorFlow does not have this parameter. |
| | Parameter 31 | - | batch_first | Whether the first dimension of the input is batch_size. Default: False. TensorFlow does not have this parameter. |
| | Parameter 32 | - | bidirectional | Whether to use a bidirectional LSTM. Default: False. TensorFlow does not have this parameter. |
| | Parameter 33 | - | seq_length | Specifies the sequence length of the input batch. TensorFlow does not have this parameter. |
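The time_major / batch_first correspondence noted in the table can be checked with a minimal sketch (the shapes below are illustrative assumptions, not part of the original example): with batch_first=False, which is the MindSpore default, mindspore.nn.LSTM consumes time-major input, matching time_major=True in TensorFlow.
# MindSpore (time-major layout sketch)
import mindspore
from mindspore import Tensor
import numpy as np
net = mindspore.nn.LSTM(10, 16, 1, batch_first=False)
x = Tensor(np.ones([5, 3, 10]).astype(np.float32))   # [timesteps, batch, feature]
h0 = Tensor(np.zeros([1, 3, 16]).astype(np.float32))
c0 = Tensor(np.zeros([1, 3, 16]).astype(np.float32))
output, (hn, cn) = net(x, (h0, c0))
print(output.shape)
# (5, 3, 16)  timesteps first, matching the time-major layout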
Code Example
The TensorFlow API defaults to zero-filled initial state tensors, so the MindSpore initial state tensors can be set to zero tensors as well. In addition, the TensorFlow API can only implement a single-layer, unidirectional LSTM, and its output states have shape [batch_size, hidden_size], whereas the MindSpore output states have shape [num_directions * num_layers, batch_size, hidden_size]. By keeping the default bidirectional=False (so that num_directions is 1) and the default num_layers=1, the first dimension of the MindSpore output state shape becomes 1; removing it with mindspore.ops.Squeeze then gives the same result as the TensorFlow API and achieves the same function. A sketch of the multi-layer, bidirectional case follows the single-layer example.
# TensorFlow
import tensorflow as tf
import numpy as np
inputs = np.ones([3, 5, 10])
lstm = tf.keras.layers.LSTM(16, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
print(whole_seq_output.shape)
# (3, 5, 16)
print(final_memory_state.shape)
# (3, 16)
print(final_carry_state.shape)
# (3, 16)
# MindSpore
import mindspore
from mindspore import Tensor
import numpy as np
net = mindspore.nn.LSTM(10, 16, 1, has_bias=True, batch_first=True, bidirectional=False)
x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
h0 = Tensor(np.zeros([1 * 1, 3, 16]).astype(np.float32))
c0 = Tensor(np.zeros([1 * 1, 3, 16]).astype(np.float32))
output, (hn, cn) = net(x, (h0, c0))
print(output.shape)
# (3, 5, 16)
squeeze = mindspore.ops.Squeeze(0)
hn_ = squeeze(hn)
print(hn_.shape)
# (3, 16)
cn_ = squeeze(cn)
print(cn_.shape)
# (3, 16)
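To illustrate the general shape of the MindSpore final state, [num_directions * num_layers, batch_size, hidden_size], here is a sketch with an assumed 2-layer bidirectional configuration; this goes beyond what tf.keras.layers.LSTM can express, so no Squeeze is applied.
# MindSpore (2-layer bidirectional sketch; configuration assumed for illustration)
import mindspore
from mindspore import Tensor
import numpy as np
net = mindspore.nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=True)
x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
# initial states have shape [num_directions * num_layers, batch_size, hidden_size] = [4, 3, 16]
h0 = Tensor(np.zeros([2 * 2, 3, 16]).astype(np.float32))
c0 = Tensor(np.zeros([2 * 2, 3, 16]).astype(np.float32))
output, (hn, cn) = net(x, (h0, c0))
print(output.shape)
# (3, 5, 32)  last dimension is num_directions * hidden_size
print(hn.shape)
# (4, 3, 16)
print(cn.shape)
# (4, 3, 16)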