# Differences between torch.nn.TransformerDecoderLayer and mindspore.nn.TransformerDecoderLayer
## torch.nn.TransformerDecoderLayer
```python
class torch.nn.TransformerDecoderLayer(
    d_model,
    nhead,
    dim_feedforward=2048,
    dropout=0.1,
    activation='relu'
)(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)
```
For more information, see torch.nn.TransformerDecoderLayer.
## mindspore.nn.TransformerDecoderLayer
```python
class mindspore.nn.TransformerDecoderLayer(
    d_model,
    nhead,
    dim_feedforward=2048,
    dropout=0.1,
    activation='relu',
    layer_norm_eps=1e-5,
    batch_first=False,
    norm_first=False,
    dtype=mstype.float32
)(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)
```
For more information, see mindspore.nn.TransformerDecoderLayer.
## Differences
The code implementation and functionality of mindspore.nn.TransformerDecoderLayer are basically the same as those of torch.nn.TransformerDecoderLayer.
| Categories | Subcategories | PyTorch | MindSpore | Difference |
| --- | --- | --- | --- | --- |
| Parameters | Parameter 1 | d_model | d_model | Consistent function |
| | Parameter 2 | nhead | nhead | Consistent function |
| | Parameter 3 | dim_feedforward | dim_feedforward | Consistent function |
| | Parameter 4 | dropout | dropout | Consistent function |
| | Parameter 5 | activation | activation | Consistent function |
| | Parameter 6 | - | layer_norm_eps | In MindSpore, the eps value of the internal LayerNorm can be set; PyTorch does not have this function |
| | Parameter 7 | - | batch_first | In MindSpore, the first dimension of the input can be used as the batch dimension; PyTorch does not have this function |
| | Parameter 8 | - | norm_first | In MindSpore, LayerNorm can be applied either before the MultiheadAttention and FeedForward layers or after them; PyTorch does not have this function |
| | Parameter 9 | - | dtype | In MindSpore, the dtype of the layer parameters can be set with `dtype`; PyTorch does not have this function |
| Input | Input 1 | tgt | tgt | Consistent function |
| | Input 2 | memory | memory | Consistent function |
| | Input 3 | tgt_mask | tgt_mask | In MindSpore, the dtype can be a float or bool Tensor; in PyTorch, a float, byte, or bool Tensor |
| | Input 4 | memory_mask | memory_mask | In MindSpore, the dtype can be a float or bool Tensor; in PyTorch, a float, byte, or bool Tensor |
| | Input 5 | tgt_key_padding_mask | tgt_key_padding_mask | In MindSpore, the dtype can be a float or bool Tensor; in PyTorch, a byte or bool Tensor |
| | Input 6 | memory_key_padding_mask | memory_key_padding_mask | In MindSpore, the dtype can be a float or bool Tensor; in PyTorch, a byte or bool Tensor |
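The MindSpore-only parameters in the table map directly onto the constructor signature shown above. Below is a minimal sketch of how they might be combined; the argument values are illustrative, not recommendations:

```python
# MindSpore: sketch of the MindSpore-only constructor arguments
import mindspore as ms
import numpy as np

decoder_layer = ms.nn.TransformerDecoderLayer(
    d_model=512, nhead=8,
    layer_norm_eps=1e-6,  # eps used by the internal LayerNorm
    batch_first=True,     # inputs are (batch, seq, feature) instead of (seq, batch, feature)
    norm_first=True,      # apply LayerNorm before attention and feed-forward
    dtype=ms.float32      # dtype of the layer parameters
)
# With batch_first=True, the batch dimension comes first:
memory = ms.Tensor(np.random.rand(32, 10, 512), ms.float32)
tgt = ms.Tensor(np.random.rand(32, 20, 512), ms.float32)
out = decoder_layer(tgt, memory)
print(out.shape)
# (32, 20, 512)
```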
## Code Example
```python
# PyTorch
import torch
from torch import nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
memory = torch.rand(10, 32, 512)
tgt = torch.rand(20, 32, 512)
out = transformer_decoder(tgt, memory)
print(out.shape)
# torch.Size([20, 32, 512])
```
```python
# MindSpore
import mindspore as ms
import numpy as np

decoder_layer = ms.nn.TransformerDecoderLayer(d_model=512, nhead=8)
transformer_decoder = ms.nn.TransformerDecoder(decoder_layer, num_layers=6)
memory = ms.Tensor(np.random.rand(10, 32, 512), ms.float32)
tgt = ms.Tensor(np.random.rand(20, 32, 512), ms.float32)
out = transformer_decoder(tgt, memory)
print(out.shape)
# (20, 32, 512)
```
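As the table notes, both frameworks accept a bool Tensor for the mask inputs. The sketch below passes a causal bool `tgt_mask` in MindSpore, assuming the same convention as PyTorch's bool masks (`True` marks positions that are not allowed to attend):

```python
# MindSpore: sketch of a bool tgt_mask (causal mask)
import mindspore as ms
import numpy as np

decoder_layer = ms.nn.TransformerDecoderLayer(d_model=512, nhead=8)
memory = ms.Tensor(np.random.rand(10, 32, 512), ms.float32)
tgt = ms.Tensor(np.random.rand(20, 32, 512), ms.float32)
# Upper-triangular True above the diagonal: each target position may only
# attend to itself and earlier positions.
tgt_mask = ms.Tensor(np.triu(np.ones((20, 20), dtype=bool), k=1))
out = decoder_layer(tgt, memory, tgt_mask=tgt_mask)
print(out.shape)
# (20, 32, 512)
```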