
mindspore.nn.LSTM

class mindspore.nn.LSTM(*args, **kwargs)

Stacked LSTM (Long Short-Term Memory) layers.

Applies the LSTM layer to the input.

There are two pipelines connecting two consecutive cells in an LSTM model: one is the cell-state pipeline and the other is the hidden-state pipeline. Denote two consecutive time steps as t-1 and t. Given an input x_t at time t, a hidden state h_{t-1} and a cell state c_{t-1} of the layer at time t-1, the cell state and hidden state at time t are computed using a gating mechanism. The input gate i_t is designed to protect the cell from perturbation by irrelevant inputs. The forget gate f_t affords protection of the cell by forgetting some of the past information stored in h_{t-1}. The output gate o_t protects other units from perturbation by currently irrelevant memory contents. The candidate cell state \tilde{c}_t is calculated from the current input, and the input gate is then applied to it. Finally, the current cell state c_t and hidden state h_t are computed from the calculated gates and the previous cell state. The complete formulation is as follows:

\begin{array}{ll}
i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\
f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\
\tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\
o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\
c_t = f_t \odot c_{(t-1)} + i_t \odot \tilde{c}_t \\
h_t = o_t \odot \tanh(c_t)
\end{array}

Here σ is the sigmoid function and ⊙ is the Hadamard product. W and b are the learnable weights and biases between the output and the input in the formula. For instance, W_{ix} and b_{ix} are the weight and bias used to transform the input x into the input gate i. Details can be found in the papers LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.
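As a sanity check, the gate equations above can be sketched in NumPy for a single time step (a minimal illustration of the formula, not MindSpore's implementation; the weight layout is an assumption chosen to mirror the notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.

    W maps each gate name to a pair (W_gx, W_gh); b maps it to (b_gx, b_gh).
    """
    i_t = sigmoid(W["i"][0] @ x_t + b["i"][0] + W["i"][1] @ h_prev + b["i"][1])
    f_t = sigmoid(W["f"][0] @ x_t + b["f"][0] + W["f"][1] @ h_prev + b["f"][1])
    c_tilde = np.tanh(W["c"][0] @ x_t + b["c"][0] + W["c"][1] @ h_prev + b["c"][1])
    o_t = sigmoid(W["o"][0] @ x_t + b["o"][0] + W["o"][1] @ h_prev + b["o"][1])
    c_t = f_t * c_prev + i_t * c_tilde       # Hadamard products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Tiny smoke test: input_size=4, hidden_size=3
rng = np.random.default_rng(0)
W = {g: (rng.standard_normal((3, 4)), rng.standard_normal((3, 3))) for g in "ifco"}
b = {g: (np.zeros(3), np.zeros(3)) for g in "ifco"}
h, c = lstm_cell_step(rng.standard_normal(4), np.zeros(3), np.zeros(3), W, b)
```

Because h_t = o_t ⊙ tanh(c_t), every entry of the hidden state is bounded in (-1, 1), which is a quick way to verify the implementation.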

LSTM unrolls the recurrence of the whole recurrent neural network over the time steps of the sequence: given the input sequence and an initial state, it produces the matrix formed by concatenating the hidden states of all time steps, together with the hidden state of the last time step. The hidden state of the last time step serves as the encoding feature of the input sentence and is output to the next layer.

h_{0:n}, (h_n, c_n) = \text{LSTM}(x_{0:n}, (h_0, c_0))
Parameters
  • input_size (int) – Number of features of the input.

  • hidden_size (int) – Number of features of the hidden layer.

  • num_layers (int) – Number of stacked LSTM layers. Default: 1.

  • has_bias (bool) – Whether the cell has bias b_{ih} and b_{hh}. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input x is batch_size. Default: False.

  • dropout (float, int) – If not 0, appends a Dropout layer to the outputs of each LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0).

  • bidirectional (bool) – Specifies whether it is a bidirectional LSTM; num_directions=2 if bidirectional=True, otherwise 1. Default: False.

  • dtype (mindspore.dtype) – Dtype of the Parameters. Default: mstype.float32.

Inputs:
  • x (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (seq_len, batch_size, input_size) or (batch_size, seq_len, input_size).

  • hx (tuple) - A tuple of two Tensors (h_0, c_0), both of data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size).

  • seq_length (Tensor) - The length of each sequence in an input batch. Tensor of shape (batch_size). Default: None. This input indicates the real sequence length before padding, to prevent padded elements from being used to compute the hidden state and affecting the final output. It is recommended to use this input when x contains padding elements.
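The effect of seq_length can be illustrated with a simple mask (a NumPy sketch of the idea, not MindSpore's internal implementation): time steps beyond each sequence's real length are zeroed so padded positions do not leak into the result.

```python
import numpy as np

# Hypothetical per-step hidden states: (seq_len, batch_size, hidden_size)
hidden = np.ones((4, 2, 3))
seq_length = np.array([4, 2])  # second sequence is padded after step 2

# Build a (seq_len, batch_size) mask of valid time steps
steps = np.arange(4)[:, None]            # (seq_len, 1)
mask = steps < seq_length[None, :]       # (seq_len, batch_size)
masked = hidden * mask[:, :, None]
# masked[3, 1] is all zeros: a padded step of the short sequence
```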

Outputs:

Tuple, a tuple containing (output, (h_n, c_n)).

  • output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size).

  • hx_n (tuple) - A tuple of two Tensors (h_n, c_n), both of shape (num_directions * num_layers, batch_size, hidden_size).
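The shape rules above can be summarized in a small helper (a hypothetical convenience function for illustration, not part of the mindspore API):

```python
def lstm_output_shapes(seq_len, batch_size, hidden_size,
                       num_layers=1, bidirectional=False, batch_first=False):
    """Expected shapes of (output, h_n/c_n) for a stacked LSTM."""
    num_directions = 2 if bidirectional else 1
    if batch_first:
        output = (batch_size, seq_len, num_directions * hidden_size)
    else:
        output = (seq_len, batch_size, num_directions * hidden_size)
    state = (num_directions * num_layers, batch_size, hidden_size)
    return output, state

# Matches the Examples section: batch_first=True, 2 layers, unidirectional
out, st = lstm_output_shapes(5, 3, 16, num_layers=2, batch_first=True)
# out == (3, 5, 16), st == (2, 3, 16)
```

Note that output concatenates the directions along the feature axis, while h_n and c_n stack them along the first axis.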

Raises
  • TypeError – If input_size, hidden_size or num_layers is not an int.

  • TypeError – If has_bias, batch_first or bidirectional is not a bool.

  • TypeError – If dropout is not a float.

  • ValueError – If dropout is not in range [0.0, 1.0).

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore as ms
>>> import numpy as np
>>> net = ms.nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> x = ms.Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = ms.Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> c0 = ms.Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = net(x, (h0, c0))
>>> print(output.shape)
(3, 5, 16)