mindspore.nn.LSTM

class mindspore.nn.LSTM(*args, **kwargs)

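Long Short-Term Memory (LSTM) network, which computes the output sequence and the final state from the input sequence and a given initial state.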

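In the LSTM model, two pipelines connect two consecutive cells: the cell state pipeline and the hidden state pipeline. Denote two consecutive time steps as $t - 1$ and $t$. At time $t$ the input is $x_{t}$; at time $t - 1$ the hidden state is $h_{t - 1}$ and the cell state is $c_{t - 1}$. The cell state and hidden state at time $t$ are computed with a gating mechanism. The input gate $i_{t}$ controls how much of the candidate value is written into the cell state. The forget gate $f_{t}$ decides whether the information stored in the previous cell state $c_{t - 1}$ is passed through fully or only partially. The output gate $o_{t}$ decides which information is output. The candidate cell state ${\tilde{c}}_{t}$ is computed from the current input $x_{t}$ and the previous hidden state $h_{t - 1}$. Finally, the forget, input, and output gates are used to compute the current cell state $c_{t}$ and hidden state $h_{t}$. The complete formulas are: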

\begin{array}{ll}
i_{t} = \sigma(W_{ix} x_{t} + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\
f_{t} = \sigma(W_{fx} x_{t} + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\
\tilde{c}_{t} = \tanh(W_{cx} x_{t} + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\
o_{t} = \sigma(W_{ox} x_{t} + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\
c_{t} = f_{t} * c_{(t-1)} + i_{t} * \tilde{c}_{t} \\
h_{t} = o_{t} * \tanh(c_{t})
\end{array}

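where $\sigma$ is the sigmoid activation function and $*$ is the element-wise (Hadamard) product. $W$ and $b$ are the learnable weights and biases between the output and the input of each formula. For example, $W_{ix}$ and $b_{ix}$ are the weight and bias used to transform the input $x$ into $i$.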
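The following is a minimal pure-Python sketch of one time step of the formulas above, written only to make the gate arithmetic concrete; it is not how mindspore.nn.LSTM is implemented. The names `lstm_cell_step`, `matvec`, and the `params` dictionary layout (one hypothetical `(W_x, b_x, W_h, b_h)` tuple per gate) are assumptions; vectors are plain Python lists and `*` is taken as the element-wise product.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical layout: gate name ("i", "f", "c", "o") -> (W_x, b_x, W_h, b_h)
GateParams = Dict[str, Tuple[Matrix, Vector, Matrix, Vector]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product W v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
                   params: GateParams) -> Tuple[Vector, Vector]:
    """Compute (h_t, c_t) from x_t, h_{t-1}, c_{t-1} using the gate formulas."""

    def pre_activation(gate: str) -> Vector:
        # W_{gx} x_t + b_{gx} + W_{gh} h_{t-1} + b_{gh}
        w_x, b_x, w_h, b_h = params[gate]
        wx, wh = matvec(w_x, x_t), matvec(w_h, h_prev)
        return [a + b + c + d for a, b, c, d in zip(wx, b_x, wh, b_h)]

    i_t = [sigmoid(z) for z in pre_activation("i")]        # input gate
    f_t = [sigmoid(z) for z in pre_activation("f")]        # forget gate
    c_tilde = [math.tanh(z) for z in pre_activation("c")]  # candidate cell state
    o_t = [sigmoid(z) for z in pre_activation("o")]        # output gate

    # c_t = f_t * c_{t-1} + i_t * c~_t and h_t = o_t * tanh(c_t), element-wise
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to $\sigma(0) = 0.5$ and $\tilde{c}_t = \tanh(0) = 0$, so the step reduces to $c_t = 0.5\,c_{t-1}$ and $h_t = 0.5\tanh(c_t)$, which is a quick hand check of the formulas.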

For details, see the papers LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.

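LSTM hides the loop of the whole recurrent neural network over the time steps of the sequence: given the input sequence and the initial state, it returns a matrix formed by concatenating the hidden states of all time steps, together with the hidden state and cell state of the last time step. We use the hidden state of the last time step as the encoded feature of the input sentence and feed it into the next layer. The formula is: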

h_{0:n}, (h_{n}, c_{n}) = \mathrm{LSTM}(x_{0:n}, (h_{0}, c_{0}))
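The loop that LSTM hides can be sketched as follows for a single layer and a single direction. `unroll` is a hypothetical helper, and `cell` stands for any function computing one step (for example, one application of the gate formulas above). The output collects one hidden state per input step, and its last entry is the final hidden state $h_n$.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A cell maps (x_t, h_{t-1}, c_{t-1}) to (h_t, c_t), e.g. one LSTM step.
Cell = Callable[[Vector, Vector, Vector], Tuple[Vector, Vector]]


def unroll(cell: Cell, xs: Sequence[Vector], h0: Vector,
           c0: Vector) -> Tuple[List[Vector], Tuple[Vector, Vector]]:
    """Single layer, single direction: return (outputs, (h_n, c_n))."""
    h, c = h0, c0
    outputs: List[Vector] = []
    for x_t in xs:               # the loop over time steps that LSTM hides
        h, c = cell(x_t, h, c)
        outputs.append(h)        # one hidden state per time step
    return outputs, (h, c)       # the last entry of outputs equals h_n
```

For example, with the toy cell `lambda x, h, c: ([h[0] + x[0]], [c[0] + 1.0])` (a stand-in, not an LSTM) and inputs `[[1.0], [2.0], [3.0]]` starting from zero states, `unroll` returns outputs `[[1.0], [3.0], [6.0]]` and final states `([6.0], [3.0])`.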

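Parameters:

input_size (int) - Number of features of the input x (the last dimension of x).
hidden_size (int) - Number of features of the hidden state.
num_layers (int) - Number of stacked LSTM layers. Default: 1.
has_bias (bool) - Whether the cell has bias terms (the $b$ terms in the formulas above, such as $b_{ih}$ and $b_{fh}$). Default: True.
batch_first (bool) - Whether the first dimension of the input x is batch_size. Default: False.
dropout (float, int) - Dropout probability applied to the input of each layer except the first layer. Default: 0. The range of dropout is [0.0, 1.0).
bidirectional (bool) - Whether the LSTM is bidirectional. If bidirectional is True, num_directions is 2; otherwise num_directions is 1. Default: False.
dtype (mindspore.dtype) - Data type of the Parameters. Default: mstype.float32.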
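As a rough guide to model size, the sketch below counts the learnable values implied by the formulas and parameters above. It assumes each of the four gates has its own $W_{\cdot x}$, $W_{\cdot h}$, $b_{\cdot x}$ and $b_{\cdot h}$ exactly as written, that has_bias=False drops all $b$ terms, and that every layer after the first receives num_directions * hidden_size features (the width of the output described below). The function name `lstm_param_count` is hypothetical, and this sketch does not describe how mindspore.nn.LSTM actually stores its Parameters.

```python
def lstm_param_count(input_size: int, hidden_size: int, num_layers: int = 1,
                     has_bias: bool = True, bidirectional: bool = False) -> int:
    """Count the W and b values implied by the gate formulas (a sketch)."""
    num_directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first read the concatenated outputs of all directions.
        layer_in = input_size if layer == 0 else num_directions * hidden_size
        # Four gates (i, f, c, o): W_{.x} is hidden x layer_in, W_{.h} is hidden x hidden.
        per_direction = 4 * hidden_size * (layer_in + hidden_size)
        if has_bias:
            per_direction += 4 * 2 * hidden_size  # b_{.x} and b_{.h} for each gate
        total += num_directions * per_direction
    return total
```

For the configuration in the example at the end of this page (input_size=10, hidden_size=16, num_layers=2, unidirectional, with bias), this gives 1792 for the first layer plus 2176 for the second, 3968 in total.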

Inputs:

x (Tensor) - Tensor of shape $(seq\_len, batch\_size, input\_size)$, or $(batch\_size, seq\_len, input\_size)$ when batch_first is True.
hx (tuple) - A tuple of two Tensors (h_0, c_0) with data type mindspore.float32 or mindspore.float16 and shape $(num\_directions * num\_layers, batch\_size, hidden\_size)$.
seq_length (Tensor) - The length of each sequence in the input batch, a Tensor of shape $(batch\_size)$. Default: None. It specifies the real length of each sequence so that padded elements are not used when computing the hidden states, which would otherwise affect the final output. Passing seq_length is recommended when the input contains padding.
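To illustrate why seq_length matters, the sketch below runs a toy scalar recurrence over a zero-padded batch and stops updating each sequence's state once its real length is reached. The recurrence `toy_step` and the function `run_with_lengths` are hypothetical stand-ins for an LSTM cell; how mindspore.nn.LSTM handles padded positions internally (for example, what it writes into output at those positions) is not specified here and is not modeled.

```python
from typing import Callable, List, Sequence

Step = Callable[[float, float], float]


def run_with_lengths(step: Step, batch: Sequence[Sequence[float]],
                     h0: float, seq_length: Sequence[int]) -> List[float]:
    """Return the final state of each sequence, ignoring padded steps."""
    finals: List[float] = []
    for xs, length in zip(batch, seq_length):
        h = h0
        for x in xs[:length]:  # positions >= length are padding and are skipped
            h = step(h, x)
        finals.append(h)
    return finals


def toy_step(h: float, x: float) -> float:
    # Hypothetical recurrence: a zero (padding) input still changes h.
    return 0.5 * h + x + 1.0


batch = [[1.0, 2.0, 0.0],   # real length 2, one padded position
         [3.0, 0.0, 0.0]]   # real length 1, two padded positions
print(run_with_lengths(toy_step, batch, 0.0, [2, 1]))  # [4.0, 4.0]
print(run_with_lengths(toy_step, batch, 0.0, [3, 3]))  # [3.0, 2.5]
```

The second call treats the padding as real data, and both final states drift away from the values computed over the true lengths.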

Outputs:

Tuple, a tuple containing (output, (h_n, c_n)).

output (Tensor) - Tensor of shape $(seq\_len, batch\_size, num\_directions * hidden\_size)$, or $(batch\_size, seq\_len, num\_directions * hidden\_size)$ when batch_first is True.
hx_n (tuple) - A tuple of two Tensors (h_n, c_n), both of shape $(num\_directions * num\_layers, batch\_size, hidden\_size)$.
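The shape rules for the inputs and outputs can be collected into a small helper. `lstm_shapes` is hypothetical; it only restates the shapes listed above, with the batch_first layout of output inferred from the example at the end of this page (batch_first=True with x of shape (3, 5, 10) prints an output shape of (3, 5, 16)).

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def lstm_shapes(seq_len: int, batch_size: int, input_size: int, hidden_size: int,
                num_layers: int = 1, batch_first: bool = False,
                bidirectional: bool = False) -> Dict[str, Shape]:
    """Expected shapes of x, hx = (h_0, c_0), output and hx_n = (h_n, c_n)."""
    num_directions = 2 if bidirectional else 1
    out_features = num_directions * hidden_size
    if batch_first:
        x_shape = (batch_size, seq_len, input_size)
        out_shape = (batch_size, seq_len, out_features)
    else:
        x_shape = (seq_len, batch_size, input_size)
        out_shape = (seq_len, batch_size, out_features)
    # Each of h and c has this shape, independent of batch_first.
    state_shape = (num_directions * num_layers, batch_size, hidden_size)
    return {"x": x_shape, "hx": state_shape, "output": out_shape, "hx_n": state_shape}
```

With the example's settings (seq_len=5, batch_size=3, input_size=10, hidden_size=16, num_layers=2, batch_first=True), the helper gives state shapes of (2, 3, 16) and an output shape of (3, 5, 16); with bidirectional=True the output's last dimension becomes 32 and the state's first dimension 4.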

Raises:

TypeError - If input_size, hidden_size or num_layers is not an int.
TypeError - If has_bias, batch_first or bidirectional is not a bool.
TypeError - If dropout is neither a float nor an int.
ValueError - If dropout is not in the range [0.0, 1.0).
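The Raises rules above can be restated as a plain-Python check. `check_lstm_args` is hypothetical and is not MindSpore's internal validator; the check order, the messages, and the choice to reject bool for the int arguments (in Python, bool is a subclass of int) are assumptions.

```python
from typing import Any


def check_lstm_args(input_size: Any, hidden_size: Any, num_layers: Any,
                    has_bias: Any, batch_first: Any, dropout: Any,
                    bidirectional: Any) -> None:
    """Raise TypeError/ValueError following the documented rules."""

    def is_int(v: Any) -> bool:
        # Assumption: bool values are not accepted where an int is required.
        return isinstance(v, int) and not isinstance(v, bool)

    for name, value in (("input_size", input_size), ("hidden_size", hidden_size),
                        ("num_layers", num_layers)):
        if not is_int(value):
            raise TypeError(f"{name} must be int, got {type(value).__name__}")
    for name, value in (("has_bias", has_bias), ("batch_first", batch_first),
                        ("bidirectional", bidirectional)):
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be bool, got {type(value).__name__}")
    if not (isinstance(dropout, float) or is_int(dropout)):
        raise TypeError(f"dropout must be float or int, got {type(dropout).__name__}")
    if not 0.0 <= dropout < 1.0:
        raise ValueError(f"dropout must be in [0.0, 1.0), got {dropout}")
```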

Supported Platforms:

Ascend GPU CPU

Examples:

>>> import mindspore as ms
>>> import numpy as np
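>>> # 2 stacked layers, one direction; batch_first=True, so x is (batch_size, seq_len, input_size)
>>> net = ms.nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> x = ms.Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> # h0 and c0: (num_directions * num_layers, batch_size, hidden_size) = (1 * 2, 3, 16)
>>> h0 = ms.Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> c0 = ms.Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))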
>>> output, (hn, cn) = net(x, (h0, c0))
>>> print(output.shape)
(3, 5, 16)