mindspore.nn.LSTM

class mindspore.nn.LSTM(*args, **kwargs)

Stacked LSTM (Long Short-Term Memory) layers.

Applies an LSTM layer to the input.

There are two pipelines connecting two consecutive cells in an LSTM model: one is the cell state pipeline and the other is the hidden state pipeline. Denote two consecutive time nodes as t-1 and t. Given an input x_t at time t, a hidden state h_{t-1} and a cell state c_{t-1} of the layer at time t-1, the cell state and hidden state at time t are computed using a gating mechanism. The input gate i_t is designed to protect the cell from perturbation by irrelevant inputs. The forget gate f_t affords protection of the cell by forgetting some information in the past, which is stored in h_{t-1}. The output gate o_t protects other units from perturbation by currently irrelevant memory contents. The candidate cell state \tilde{c}_t is calculated from the current input, and the input gate is applied to it. Finally, the current cell state c_t and hidden state h_t are computed from the calculated gates and cell states. The complete formulation is as follows.

i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{t-1} + b_{ih})
f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{t-1} + b_{fh})
\tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{t-1} + b_{ch})
o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{t-1} + b_{oh})
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

Here σ is the sigmoid function, and ⊙ is the Hadamard product. W, b are learnable weights between the output and the input in the formulas. For instance, W_{ix}, b_{ix} are the weight and bias used to transform the input x to i. Details can be found in the papers LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.
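To make the gating equations above concrete, the following is a minimal NumPy sketch of a single LSTM cell step. The weight and bias dictionaries (keys ix, ih, fx, ... mirroring the formulas) are hypothetical pre-initialized arrays chosen for illustration; they are not part of the mindspore.nn.LSTM interface.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    # W and b are dicts of pre-initialized weights and biases keyed by gate name
    # (hypothetical names mirroring the formulas, not the MindSpore parameter layout).
    i_t = sigmoid(W["ix"] @ x_t + b["ix"] + W["ih"] @ h_prev + b["ih"])      # input gate
    f_t = sigmoid(W["fx"] @ x_t + b["fx"] + W["fh"] @ h_prev + b["fh"])      # forget gate
    c_tilde = np.tanh(W["cx"] @ x_t + b["cx"] + W["ch"] @ h_prev + b["ch"])  # candidate cell state
    o_t = sigmoid(W["ox"] @ x_t + b["ox"] + W["oh"] @ h_prev + b["oh"])      # output gate
    c_t = f_t * c_prev + i_t * c_tilde                                       # Hadamard products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Example shapes: input_size=10, hidden_size=16
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((16, 10 if k.endswith("x") else 16))
     for k in ("ix", "ih", "fx", "fh", "cx", "ch", "ox", "oh")}
b = {k: np.zeros(16) for k in W}
h_t, c_t = lstm_cell_step(rng.standard_normal(10), np.zeros(16), np.zeros(16), W, b)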

Parameters
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • num_layers (int) – Number of stacked LSTM layers. Default: 1.

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input x is batch_size. Default: False.

  • dropout (float, int) – If not 0.0, appends a Dropout layer on the outputs of each LSTM layer except the last layer. The dropout rate must be in the range [0.0, 1.0). Default: 0.0.

  • bidirectional (bool) – Specifies whether it is a bidirectional LSTM; num_directions is 2 if bidirectional is True, otherwise 1. Default: False. (A construction sketch combining these parameters follows this list.)
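As a hedged construction sketch combining the parameters above (the argument values are chosen only for illustration):

>>> import mindspore.nn as nn
>>> net = nn.LSTM(input_size=10, hidden_size=16, num_layers=2, has_bias=True,
...               batch_first=True, dropout=0.1, bidirectional=False)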

Inputs:
  • x (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (seq_len, batch_size, input_size), or (batch_size, seq_len, input_size) when batch_first is True.

  • hx (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size). The data type of hx must be the same as x.

  • seq_length (Tensor) - The length of each sequence in an input batch. Tensor of shape (batch_size). Default: None. This input indicates the real sequence length before padding, so that padded elements are not used to compute the hidden state and do not affect the final output. It is recommended to use this input when x has padding elements (see the sketch after this list).
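Below is a minimal sketch of passing seq_length together with a padded batch; the shapes and the int32 dtype are assumptions for illustration, and the call order follows the Inputs list above:

>>> import numpy as np
>>> from mindspore import Tensor
>>> import mindspore.nn as nn
>>> net = nn.LSTM(10, 16, 1, batch_first=True)
>>> x = Tensor(np.zeros([2, 5, 10]).astype(np.float32))       # batch of 2, padded to seq_len=5
>>> h0 = Tensor(np.zeros([1, 2, 16]).astype(np.float32))
>>> c0 = Tensor(np.zeros([1, 2, 16]).astype(np.float32))
>>> seq_length = Tensor(np.array([5, 3]).astype(np.int32))    # real lengths before padding
>>> output, (hn, cn) = net(x, (h0, c0), seq_length)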

Outputs:

Tuple, a tuple containing (output, (h_n, c_n)).

  • output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size), or (batch_size, seq_len, num_directions * hidden_size) when batch_first is True.

  • hx_n (tuple) - A tuple of two Tensors (h_n, c_n), both of shape (num_directions * num_layers, batch_size, hidden_size).

Raises
  • TypeError – If input_size, hidden_size or num_layers is not an int.

  • TypeError – If has_bias, batch_first or bidirectional is not a bool.

  • TypeError – If dropout is not a float.

  • ValueError – If dropout is not in range [0.0, 1.0).

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> import mindspore.nn as nn
>>> net = nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> # x: (batch_size=3, seq_len=5, input_size=10) since batch_first=True
>>> x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> # h0, c0: (num_directions * num_layers, batch_size, hidden_size) = (1 * 2, 3, 16)
>>> h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> c0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = net(x, (h0, c0))
>>> print(output.shape)
(3, 5, 16)
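For comparison, below is a sketch with bidirectional=True (not part of the original example): num_directions becomes 2, so h_0 and c_0 take shape (2 * num_layers, batch_size, hidden_size) and the output feature dimension doubles.

>>> bi_net = nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=True)
>>> h0 = Tensor(np.ones([2 * 2, 3, 16]).astype(np.float32))
>>> c0 = Tensor(np.ones([2 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = bi_net(x, (h0, c0))
>>> print(output.shape)
(3, 5, 32)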