mindspore.nn.GRU

class mindspore.nn.GRU(*args, **kwargs)[source]

Stacked GRU (Gated Recurrent Unit) layers.

Apply GRU layer to the input.

There are two gates in a GRU model. One is update gate and the other is reset gate. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), a hidden state \(h_{t-1}\), the update and reset gate at time \(t\) is computed using a gating mechanism. Update gate \(z_t\) is designed to protect the cell from perturbation by irrelevant inputs and past hidden state. Reset gate \(r_t\) determines how much information should be reset from old hidden state. New memory state \(n_t\) is calculated with the current input, on which the reset gate will be applied. Finally, current hidden state \(h_{t}\) is computed with the calculated update grate and new memory state. The complete formulation is as follows:

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ir}, b_{ir}\) are the weight and bias used to transform from input \(x\) to \(r\). Details can be found in paper Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.

Note

When using GRU on Ascend, the hidden size only supports multiples of 16.

Parameters
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • num_layers (int) – Number of layers of stacked GRU. Default: 1.

  • has_bias (bool) – Whether the cell has bias b_in and b_hn. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input x is batch_size. Default: False.

  • dropout (float) – If not 0.0, append Dropout layer on the outputs of each GRU layer except the last layer. Default 0.0. The range of dropout is [0.0, 1.0).

  • bidirectional (bool) – Specifies whether it is a bidirectional GRU, num_directions=2 if bidirectional=True otherwise 1. Default: False.

Inputs:
  • x (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (seq_len, batch_size, input_size) or (batch_size, seq_len, input_size).

  • hx (Tensor) - Tensor of data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size). The data type of hx must be the same as x.

  • seq_length (Tensor) - The length of each sequence in an input batch. Tensor of shape \((\text{batch_size})\). Default: None. This input indicates the real sequence length before padding to avoid padded elements have been used to compute hidden state and affect the final output. It is recommended to use this input when x has padding elements.

Outputs:

Tuple, a tuple contains (output, h_n).

  • output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size) or (batch_size, seq_len, num_directions * hidden_size).

  • hx_n (Tensor) - Tensor of shape (num_directions * num_layers, batch_size, hidden_size).

Raises
  • TypeError – If input_size, hidden_size or num_layers is not an int.

  • TypeError – If has_bias, batch_first or bidirectional is not a bool.

  • TypeError – If dropout is not a float.

  • ValueError – If dropout is not in range [0.0, 1.0).

Supported Platforms:

Ascend GPU CPU

Examples

>>> net = nn.GRU(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, hn = net(x, h0)
>>> print(output.shape)
(3, 5, 16)