mindspore.nn.LSTMCell
- class mindspore.nn.LSTMCell(input_size, hidden_size, has_bias=True, batch_first=False, dropout=0, bidirectional=False)[source]
LSTM (Long Short-Term Memory) layer.
Apply LSTM layer to the input.
There are two pipelines connecting two consecutive cells in a LSTM model; one is cell state pipeline and the other is hidden state pipeline. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), an hidden state \(h_{t-1}\) and an cell state \(c_{t-1}\) of the layer at time \({t-1}\), the cell state and hidden state at time \(t\) is computed using an gating mechanism. Input gate \(i_t\) is designed to protect the cell from perturbation by irrelevant inputs. Forget gate \(f_t\) affords protection of the cell by forgetting some information in the past, which is stored in \(h_{t-1}\). Output gate \(o_t\) protects other units from perturbation by currently irrelevant memory contents. Candidate cell state \(\tilde{c}_t\) is calculated with the current input, on which the input gate will be applied. Finally, current cell state \(c_{t}\) and hidden state \(h_{t}\) are computed with the calculated gates and cell states. The complete formulation is as follows.
\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\ f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\ \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\ o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\ c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\ h_t = o_t * \tanh(c_t) \\ \end{array}\end{split}\]Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform from input \(x\) to \(i\). Details can be found in paper LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.
Note
LSTMCell is a single-layer RNN, you can achieve multi-layer RNN by stacking LSTMCell.
- Parameters
input_size (int) – Number of features of input.
hidden_size (int) – Number of features of hidden layer.
has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.
batch_first (bool) – Specifies whether the first dimension of input x is batch_size. Default: False.
dropout (float, int) – If not 0, append Dropout layer on the outputs of each LSTM layer except the last layer. Default 0. The range of dropout is [0.0, 1.0].
bidirectional (bool) – Specifies whether this is a bidirectional LSTM. If set True, number of directions will be 2 otherwise number of directions is 1. Default: False.
- Inputs:
x (Tensor) - Tensor of shape (seq_len, batch_size, input_size).
h - data type mindspore.float32 or mindspore.float16 and shape (num_directions, batch_size, hidden_size).
c - data type mindspore.float32 or mindspore.float16 and shape (num_directions, batch_size, hidden_size). Data type of h’ and ‘c’ must be the same of `x.
w - data type mindspore.float32 or mindspore.float16 and shape (weight_size, 1, 1). The value of weight_size depends on input_size, hidden_size and bidirectional
- Outputs:
output, h_n, c_n, ‘reserve’, ‘state’.
output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size).
h - A Tensor with shape (num_directions, batch_size, hidden_size).
c - A Tensor with shape (num_directions, batch_size, hidden_size).
reserve - reserved
state - reserved
- Raises
TypeError – If input_size or hidden_size or num_layers is not an int.
TypeError – If has_bias or batch_first or bidirectional is not a bool.
TypeError – If dropout is neither a float nor an int.
ValueError – If dropout is not in range [0.0, 1.0].
- Supported Platforms:
GPU
CPU
Examples
>>> net = nn.LSTMCell(10, 12, has_bias=True, batch_first=True, bidirectional=False) >>> x = Tensor(np.ones([3, 5, 10]).astype(np.float32)) >>> h = Tensor(np.ones([1, 3, 12]).astype(np.float32)) >>> c = Tensor(np.ones([1, 3, 12]).astype(np.float32)) >>> w = Tensor(np.ones([1152, 1, 1]).astype(np.float32)) >>> output, h, c, _, _ = net(x, h, c, w) >>> print(output.shape) (3, 5, 12)