Function Differences with tf.keras.layers.LSTM
tf.keras.layers.LSTM
class tf.keras.layers.LSTM(
units, activation='tanh', recurrent_activation='sigmoid',
use_bias=True, kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal',
bias_initializer='zeros', unit_forget_bias=True,
kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None,
activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None,
bias_constraint=None, dropout=0.0, recurrent_dropout=0.0,
return_sequences=False, return_state=False, go_backwards=False, stateful=False,
time_major=False, unroll=False, **kwargs)(inputs, mask, training, initial_state) -> Tensor
For more information, see tf.keras.layers.LSTM.
mindspore.nn.LSTM
class mindspore.nn.LSTM(
input_size,
hidden_size,
num_layers=1,
has_bias=True,
batch_first=False,
dropout=0,
bidirectional=False)(x, hx, seq_length) -> Tensor
For more information, see mindspore.nn.LSTM.
Differences
TensorFlow: Computes the output sequence and final state from the input sequence; the parameters return_sequences and return_state control whether they are returned.
MindSpore: Computes the output sequence and final state from the input sequence and a given initial state, and supports multi-layer and bidirectional LSTM networks. However, unlike TensorFlow, it cannot customize parts of the computation such as activation, regularization, or constraint functions. Because the TensorFlow API can only implement a single-layer, unidirectional LSTM, the final state tensors returned by the two APIs have different shapes.
| Categories | Subcategories | TensorFlow | MindSpore | Differences |
|---|---|---|---|---|
| Parameters | Parameter 1 | units | hidden_size | Same function, different parameter names |
| | Parameter 2 | activation | - | Activation function to use. Default: tanh. MindSpore does not have this parameter but uses the same activation function by default. |
| | Parameter 3 | recurrent_activation | - | Activation function used for the recurrent step. Default: sigmoid. MindSpore does not have this parameter but uses the same activation function by default. |
| | Parameter 4 | use_bias | has_bias | Same function, different parameter names |
| | Parameter 5 | kernel_initializer | - | Initializer for the kernel weight matrix, used for the linear transformation of the inputs. Default: glorot_uniform. MindSpore does not have this parameter. |
| | Parameter 6 | recurrent_initializer | - | Initializer for the recurrent_kernel weight matrix, used for the linear transformation of the recurrent state. Default: orthogonal. MindSpore does not have this parameter. |
| | Parameter 7 | bias_initializer | - | Initializer for the bias vector. Default: zeros. MindSpore does not have this parameter. |
| | Parameter 8 | unit_forget_bias | - | Whether to add 1 to the bias of the forget gate at initialization. Default: True. MindSpore does not have this parameter. |
| | Parameter 9 | kernel_regularizer | - | Regularizer function applied to the kernel weight matrix. Default: None. MindSpore does not have this parameter. |
| | Parameter 10 | recurrent_regularizer | - | Regularizer function applied to the recurrent_kernel weight matrix. Default: None. MindSpore does not have this parameter. |
| | Parameter 11 | bias_regularizer | - | Regularizer function applied to the bias vector. Default: None. MindSpore does not have this parameter. |
| | Parameter 12 | activity_regularizer | - | Regularizer function applied to the output of the layer (its activation). Default: None. MindSpore does not have this parameter. |
| | Parameter 13 | kernel_constraint | - | Constraint function applied to the kernel weight matrix. Default: None. MindSpore does not have this parameter. |
| | Parameter 14 | recurrent_constraint | - | Constraint function applied to the recurrent_kernel weight matrix. Default: None. MindSpore does not have this parameter. |
| | Parameter 15 | bias_constraint | - | Constraint function applied to the bias vector. Default: None. MindSpore does not have this parameter. |
| | Parameter 16 | dropout | dropout | - |
| | Parameter 17 | recurrent_dropout | - | Dropout probability applied to the recurrent state. Default: 0.0. MindSpore does not have this parameter and only provides dropout. |
| | Parameter 18 | return_sequences | - | Whether to return the full output sequence or only the last output. Default: False. MindSpore does not have this parameter and always returns the full sequence (equivalent to True). |
| | Parameter 19 | return_state | - | Whether to return the last state. Default: False. MindSpore does not have this parameter and always returns the last state (equivalent to True). |
| | Parameter 20 | go_backwards | - | Whether to process the input sequence backwards and return the reversed sequence. Default: False. MindSpore does not have this parameter. |
| | Parameter 21 | stateful | - | Whether to use the last state of each sample at index i in a batch as the initial state of the sample at index i in the following batch. Default: False. MindSpore does not have this parameter. |
| | Parameter 22 | time_major | - | Shape format of the input and output tensors. If True, the inputs and outputs have shape [timesteps, batch, feature]; if False, [batch, timesteps, feature]. Default: False. MindSpore does not have this parameter, but both layouts are supported via batch_first (see the sketch after this table). |
| | Parameter 23 | unroll | - | If True, the network is unrolled; otherwise a symbolic loop is used. Default: False. MindSpore does not have this parameter. |
| | Parameter 24 | **kwargs | - | Not involved |
| | Parameter 25 | inputs | x | Same function, different parameter names |
| | Parameter 26 | mask | - | Binary tensor of shape [batch, timesteps] indicating whether a given time step should be masked (optional, default None). A True entry means the corresponding time step is used; a False entry means it is ignored. MindSpore does not have this parameter. |
| | Parameter 27 | training | - | Python bool indicating whether the layer should run in training or inference mode. It is passed to the cell when the cell is called and is only relevant when dropout or recurrent_dropout is used (optional, default None). MindSpore does not have this parameter. |
| | Parameter 28 | initial_state | hx | List of initial state tensors passed to the cell on its first call (optional, default None, which creates zero-filled initial state tensors). In MindSpore, hx plays the same role of supplying the initial state tensors. |
| | Parameter 29 | - | input_size | Sets the size of the input feature. TensorFlow does not have this parameter; it determines the input size automatically. |
| | Parameter 30 | - | num_layers | Sets the number of stacked layers. Default: 1. TensorFlow does not have this parameter. |
| | Parameter 31 | - | batch_first | Whether the first dimension of the input is batch_size. Default: False. TensorFlow does not have this parameter. |
| | Parameter 32 | - | bidirectional | Whether to use a bidirectional LSTM. Default: False. TensorFlow does not have this parameter. |
| | Parameter 33 | - | seq_length | Specifies the sequence length of the input batch. TensorFlow does not have this parameter. |
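The time_major / batch_first correspondence noted in the table can be checked with a minimal sketch (the shapes below are illustrative assumptions, not part of the original example): with batch_first=False, which is the MindSpore default, mindspore.nn.LSTM consumes time-major input, matching time_major=True in TensorFlow.
# MindSpore (time-major layout sketch)
import mindspore
from mindspore import Tensor
import numpy as np
net = mindspore.nn.LSTM(10, 16, 1, batch_first=False)
x = Tensor(np.ones([5, 3, 10]).astype(np.float32))   # [timesteps, batch, feature]
h0 = Tensor(np.zeros([1, 3, 16]).astype(np.float32))
c0 = Tensor(np.zeros([1, 3, 16]).astype(np.float32))
output, (hn, cn) = net(x, (h0, c0))
print(output.shape)
# (5, 3, 16)  timesteps first, matching the time-major layout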
Code Example
The TensorFlow API defaults to zero-filled initial state tensors, so the MindSpore initial state tensors can be set to zero tensors as well. In addition, the TensorFlow API can only implement a single-layer, unidirectional LSTM, and its output states have shape [batch_size, hidden_size], whereas the MindSpore output states have shape [num_directions * num_layers, batch_size, hidden_size]. By keeping the default bidirectional=False (so that num_directions is 1) and the default num_layers=1, the first dimension of the MindSpore output state shape becomes 1; removing it with mindspore.ops.Squeeze then gives the same result as the TensorFlow API and achieves the same function. A sketch of the multi-layer, bidirectional case follows the single-layer example.
# TensorFlow
import tensorflow as tf
import numpy as np
inputs = np.ones([3, 5, 10])
lstm = tf.keras.layers.LSTM(16, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
print(whole_seq_output.shape)
# (3, 5, 16)
print(final_memory_state.shape)
# (3, 16)
print(final_carry_state.shape)
# (3, 16)
# MindSpore
import mindspore
from mindspore import Tensor
import numpy as np
net = mindspore.nn.LSTM(10, 16, 1, has_bias=True, batch_first=True, bidirectional=False)
x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
h0 = Tensor(np.zeros([1 * 1, 3, 16]).astype(np.float32))
c0 = Tensor(np.zeros([1 * 1, 3, 16]).astype(np.float32))
output, (hn, cn) = net(x, (h0, c0))
print(output.shape)
# (3, 5, 16)
squeeze = mindspore.ops.Squeeze(0)
hn_ = squeeze(hn)
print(hn_.shape)
# (3, 16)
cn_ = squeeze(cn)
print(cn_.shape)
# (3, 16)
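To illustrate the general shape of the MindSpore final state, [num_directions * num_layers, batch_size, hidden_size], here is a sketch with an assumed 2-layer bidirectional configuration; this goes beyond what tf.keras.layers.LSTM can express, so no Squeeze is applied.
# MindSpore (2-layer bidirectional sketch; configuration assumed for illustration)
import mindspore
from mindspore import Tensor
import numpy as np
net = mindspore.nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=True)
x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
# initial states have shape [num_directions * num_layers, batch_size, hidden_size] = [4, 3, 16]
h0 = Tensor(np.zeros([2 * 2, 3, 16]).astype(np.float32))
c0 = Tensor(np.zeros([2 * 2, 3, 16]).astype(np.float32))
output, (hn, cn) = net(x, (h0, c0))
print(output.shape)
# (3, 5, 32)  last dimension is num_directions * hidden_size
print(hn.shape)
# (4, 3, 16)
print(cn.shape)
# (4, 3, 16)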