Differences with torch.nn.LSTM

torch.nn.LSTM

class torch.nn.LSTM(
    input_size,
    hidden_size,
    num_layers=1,
    bias=True,
    batch_first=False,
    dropout=0,
    bidirectional=False,
    proj_size=0)(input, (h_0, c_0)) -> Tensor

For more information, see torch.nn.LSTM.

mindspore.nn.LSTM

class mindspore.nn.LSTM(
    input_size,
    hidden_size,
    num_layers=1,
    has_bias=True,
    batch_first=False,
    dropout=0,
    bidirectional=False)(x, hx, seq_length) -> Tensor

For more information, see mindspore.nn.LSTM.

Differences

PyTorch: Computes the output sequence and the final hidden state from an input sequence and a given initial state.

MindSpore: When the PyTorch parameter proj_size keeps its default value of 0, this API is functionally identical to torch.nn.LSTM; only some of the parameter names differ.

| Categories | Subcategories | PyTorch | MindSpore | Difference |
| ---------- | ------------- | ------- | --------- | ---------- |
| Parameters | Parameter 1 | input_size | input_size | - |
| Parameters | Parameter 2 | hidden_size | hidden_size | - |
| Parameters | Parameter 3 | num_layers | num_layers | - |
| Parameters | Parameter 4 | bias | has_bias | Same function, different parameter names |
| Parameters | Parameter 5 | batch_first | batch_first | - |
| Parameters | Parameter 6 | dropout | dropout | - |
| Parameters | Parameter 7 | bidirectional | bidirectional | - |
| Parameters | Parameter 8 | proj_size | - | In PyTorch, if proj_size > 0, hidden_size in the output shape is replaced by proj_size; the default value is 0. MindSpore does not have this parameter (see the sketch after this table) |
| Inputs | Input 1 | input | x | Same function, different parameter names |
| Inputs | Input 2 | h_0 | hx | In MindSpore, hx is a tuple of two Tensors (h_0, c_0), corresponding to inputs 2 and 3 in PyTorch, with the same function |
| Inputs | Input 3 | c_0 | hx | In MindSpore, hx is a tuple of two Tensors (h_0, c_0), corresponding to inputs 2 and 3 in PyTorch, with the same function |
| Inputs | Input 4 | - | seq_length | In MindSpore, this input specifies the valid sequence length of each element in the batch; PyTorch does not have this input (see the sketch after the code examples) |
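
The PyTorch-only proj_size behavior can be illustrated with a minimal sketch (the value 8 below is an arbitrary choice for illustration): when proj_size > 0, the feature dimension of output and hn becomes proj_size, while cn keeps hidden_size.

# PyTorch only: proj_size > 0 (no MindSpore equivalent)
import torch
import numpy as np

rnn = torch.nn.LSTM(10, 16, 2, batch_first=True, proj_size=8)
input1 = torch.tensor(np.ones([3, 5, 10]), dtype=torch.float32)
h0 = torch.tensor(np.ones([2, 3, 8]), dtype=torch.float32)   # h_0 uses proj_size, not hidden_size
c0 = torch.tensor(np.ones([2, 3, 16]), dtype=torch.float32)  # c_0 keeps hidden_size
output, (hn, cn) = rnn(input1, (h0, c0))
print(output.detach().numpy().shape)
# (3, 5, 8)
print(hn.detach().numpy().shape)
# (2, 3, 8)
print(cn.detach().numpy().shape)
# (2, 3, 16)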

Code Example

When proj_size in PyTorch keeps its default value of 0, the two APIs are functionally identical and are used in the same way.

# PyTorch
import torch
from torch import tensor
import numpy as np

rnn = torch.nn.LSTM(10, 16, 2, bias=True, batch_first=True, bidirectional=False)
# input shape: (batch_size, seq_len, input_size) because batch_first=True
input1 = tensor(np.ones([3, 5, 10]), dtype=torch.float32)
# h0/c0 shape: (num_directions * num_layers, batch_size, hidden_size)
h0 = tensor(np.ones([1 * 2, 3, 16]), dtype=torch.float32)
c0 = tensor(np.ones([1 * 2, 3, 16]), dtype=torch.float32)
output, (hn, cn) = rnn(input1, (h0, c0))
print(output.detach().numpy().shape)
# (3, 5, 16)
print(hn.detach().numpy().shape)
# (2, 3, 16)
print(cn.detach().numpy().shape)
# (2, 3, 16)

# MindSpore
import mindspore
from mindspore import Tensor
import numpy as np

net = mindspore.nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
# x shape: (batch_size, seq_len, input_size) because batch_first=True
x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
# h0/c0 shape: (num_directions * num_layers, batch_size, hidden_size)
h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
c0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
output, (hn, cn) = net(x, (h0, c0))
print(output.shape)
# (3, 5, 16)
print(hn.shape)
# (2, 3, 16)
print(cn.shape)
# (2, 3, 16)
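
For the MindSpore-only seq_length input, the following is a hedged sketch, assuming seq_length is an int32 Tensor of shape (batch_size,) holding each sequence's valid (unpadded) length; check the documentation of your MindSpore version for the exact contract.

# MindSpore only: seq_length (assumed shape (batch_size,), dtype int32)
import mindspore
from mindspore import Tensor
import numpy as np

net = mindspore.nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
h0 = Tensor(np.ones([2, 3, 16]).astype(np.float32))
c0 = Tensor(np.ones([2, 3, 16]).astype(np.float32))
# Assumed: each entry is the valid length of the corresponding sequence in the batch
seq_length = Tensor(np.array([5, 3, 4]).astype(np.int32))
output, (hn, cn) = net(x, (h0, c0), seq_length)
print(output.shape)
# (3, 5, 16)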