mindspore.nn.LSTM
- class mindspore.nn.LSTM(input_size, hidden_size, num_layers=1, has_bias=True, batch_first=False, dropout=0, bidirectional=False)[source]
Stacked LSTM (Long Short-Term Memory) layers.
Applies an LSTM layer to the input.
There are two pipelines connecting two consecutive cells in an LSTM model: one is the cell state pipeline and the other is the hidden state pipeline. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), a hidden state \(h_{t-1}\) and a cell state \(c_{t-1}\) of the layer at time \(t-1\), the cell state and hidden state at time \(t\) are computed using a gating mechanism. The input gate \(i_t\) is designed to protect the cell from perturbation by irrelevant inputs. The forget gate \(f_t\) affords protection of the cell by forgetting some information of the past, which is stored in \(h_{t-1}\). The output gate \(o_t\) protects other units from perturbation by currently irrelevant memory contents. The candidate cell state \(\tilde{c}_t\) is calculated from the current input, and the input gate is then applied to it. Finally, the current cell state \(c_t\) and hidden state \(h_t\) are computed from the calculated gates and cell states. The complete formulation is as follows:
\[\begin{split}\begin{array}{ll} \\
    i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\
    f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\
    \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\
    o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\
    c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\
    h_t = o_t * \tanh(c_t) \\
\end{array}\end{split}\]
Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are the learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform the input \(x\) to the input gate \(i\). Details can be found in the papers LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.
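To make the recurrence concrete, below is a minimal NumPy sketch of a single cell step implementing these formulas. It folds each gate's two bias vectors (e.g. \(b_{ix}\) and \(b_{ih}\)) into one vector b, and the [i, f, \(\tilde{c}\), o] gate packing and the lstm_cell_step name are illustrative assumptions, not MindSpore's internal weight layout.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W_x, W_h, b):
    # One time step of the recurrence above.
    # W_x: (4*hidden_size, input_size), W_h: (4*hidden_size, hidden_size),
    # b: (4*hidden_size,); the four gates are packed in the order [i, f, c~, o].
    gates = W_x @ x_t + W_h @ h_prev + b
    i, f, g, o = np.split(gates, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)                # candidate cell state \tilde{c}_t
    c_t = f * c_prev + i * g      # c_t = f_t * c_{t-1} + i_t * \tilde{c}_t
    h_t = o * np.tanh(c_t)        # h_t = o_t * tanh(c_t)
    return h_t, c_t

# Toy shapes matching input_size=10, hidden_size=16.
rng = np.random.default_rng(0)
input_size, hidden_size = 10, 16
x_t = rng.standard_normal(input_size).astype(np.float32)
h0 = np.zeros(hidden_size, dtype=np.float32)
c0 = np.zeros(hidden_size, dtype=np.float32)
W_x = rng.standard_normal((4 * hidden_size, input_size)).astype(np.float32)
W_h = rng.standard_normal((4 * hidden_size, hidden_size)).astype(np.float32)
b = np.zeros(4 * hidden_size, dtype=np.float32)
h1, c1 = lstm_cell_step(x_t, h0, c0, W_x, W_h, b)
print(h1.shape, c1.shape)  # (16,) (16,)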
- Parameters
input_size (int) – Number of features of input.
hidden_size (int) – Number of features of hidden layer.
num_layers (int) – Number of layers of stacked LSTM. Default: 1.
has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.
batch_first (bool) – Specifies whether the first dimension of input is batch_size. Default: False.
dropout (float, int) – If not 0, appends a Dropout layer on the outputs of each LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0].
bidirectional (bool) – Specifies whether it is a bidirectional LSTM. Default: False.
- Inputs:
input (Tensor) - Tensor of shape (seq_len, batch_size, input_size), or (batch_size, seq_len, input_size) when batch_first is True.
hx (tuple) - A tuple of two Tensors (h_0, c_0), both of data type mindspore.float32 or mindspore.float16, with shape (num_directions * num_layers, batch_size, hidden_size). The data type of hx must be the same as that of input.
- Outputs:
Tuple, a tuple containing (output, (h_n, c_n)).
output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size), or (batch_size, seq_len, num_directions * hidden_size) when batch_first is True.
hx_n (tuple) - A tuple of two Tensors (h_n, c_n), both of shape (num_directions * num_layers, batch_size, hidden_size).
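As a hedged illustration of how num_layers and bidirectional determine these shapes (a sketch mirroring the Examples below; the zero-initialized states are only for demonstration):

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.LSTM(10, 16, num_layers=2, batch_first=True, bidirectional=True)
>>> x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.zeros([2 * 2, 3, 16]).astype(np.float32))  # num_directions * num_layers = 4
>>> c0 = Tensor(np.zeros([2 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = net(x, (h0, c0))
>>> print(output.shape)
(3, 5, 32)
>>> print(hn.shape)
(4, 3, 16)

With bidirectional=True, the last axis of output is num_directions * hidden_size = 2 * 16 = 32.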
- Raises
TypeError – If input_size, hidden_size or num_layers is not an int.
TypeError – If has_bias, batch_first or bidirectional is not a bool.
TypeError – If dropout is neither a float nor an int.
ValueError – If dropout is not in range [0.0, 1.0].
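For instance, an out-of-range dropout triggers the last check above (a minimal sketch):

>>> import mindspore.nn as nn
>>> try:
...     net = nn.LSTM(10, 16, dropout=1.5)  # dropout outside [0.0, 1.0]
... except ValueError as e:
...     print(type(e).__name__)
ValueError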
- Supported Platforms:
Ascend
GPU
Examples
>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor
>>> net = nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> input = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> c0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = net(input, (h0, c0))
>>> print(output.shape)
(3, 5, 16)