Differences between torch.nn.TransformerDecoder and mindspore.nn.TransformerDecoder
torch.nn.TransformerDecoder
class torch.nn.TransformerDecoder(
    decoder_layer,
    num_layers,
    norm=None
)(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)
For more information, see torch.nn.TransformerDecoder.
mindspore.nn.TransformerDecoder
class mindspore.nn.TransformerDecoder(
    decoder_layer,
    num_layers,
    norm=None
)(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)
For more information, see mindspore.nn.TransformerDecoder.
Differences
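The implementation and usage of mindspore.nn.TransformerDecoder are mostly the same as those of torch.nn.TransformerDecoder. The constructor parameters and the tgt and memory inputs behave consistently; the only differences are the dtypes accepted for the mask inputs.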
| Categories | Subcategories | PyTorch | MindSpore | Difference |
| --- | --- | --- | --- | --- |
| Parameters | Parameter 1 | decoder_layer | decoder_layer | Consistent functionality |
| | Parameter 2 | num_layers | num_layers | Consistent functionality |
| | Parameter 3 | norm | norm | Consistent functionality |
| Input | Input 1 | tgt | tgt | Consistent functionality |
| | Input 2 | memory | memory | Consistent functionality |
| | Input 3 | tgt_mask | tgt_mask | MindSpore accepts a float or bool Tensor; PyTorch accepts a float, byte or bool Tensor. |
| | Input 4 | memory_mask | memory_mask | MindSpore accepts a float or bool Tensor; PyTorch accepts a float, byte or bool Tensor. |
| | Input 5 | tgt_key_padding_mask | tgt_key_padding_mask | MindSpore accepts a float or bool Tensor; PyTorch accepts a byte or bool Tensor. |
| | Input 6 | memory_key_padding_mask | memory_key_padding_mask | MindSpore accepts a float or bool Tensor; PyTorch accepts a byte or bool Tensor. |
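According to the mask rows above, bool is the one dtype that both frameworks accept for all four mask inputs, so building masks as bool is a reasonable way to keep code portable. The following is a minimal sketch, assuming NumPy is available and assuming the convention that True marks a position to be masked out (this is how PyTorch documents bool masks; verify it for the MindSpore version in use). The helpers `causal_mask`, `padding_mask` and `byte_to_bool` are hypothetical names for illustration and are not part of either library.

```python
import numpy as np


def causal_mask(size):
    """Bool (size, size) mask; True above the diagonal marks future positions.

    Assumption: True means "not allowed to attend".
    """
    return np.triu(np.ones((size, size), dtype=np.bool_), k=1)


def padding_mask(lengths, max_len):
    """Bool (batch, max_len) mask; True marks padded positions."""
    positions = np.arange(max_len)[None, :]
    return positions >= np.asarray(lengths)[:, None]


def byte_to_bool(mask):
    """Convert a byte (uint8) mask to bool; non-zero entries become True."""
    return np.asarray(mask) != 0


# Illustrative values: target length 4, batch of 2 with valid lengths 4 and 2.
tgt_mask = causal_mask(4)
tgt_key_padding_mask = padding_mask([4, 2], max_len=4)
legacy = byte_to_bool(np.array([[0, 1], [0, 0]], dtype=np.uint8))
```

The resulting arrays can then be wrapped with `torch.from_numpy(...)` or `ms.Tensor(...)` and passed as `tgt_mask=` or `tgt_key_padding_mask=`. Converting byte masks up front avoids relying on the byte dtype, which the table lists as accepted by PyTorch only.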
Code Example
# PyTorch
import torch
from torch import nn
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
memory = torch.rand(10, 32, 512)
tgt = torch.rand(20, 32, 512)
out = transformer_decoder(tgt, memory)
print(out.shape)
# torch.Size([20, 32, 512])

# MindSpore
import mindspore as ms
import numpy as np
decoder_layer = ms.nn.TransformerDecoderLayer(d_model=512, nhead=8)
transformer_decoder = ms.nn.TransformerDecoder(decoder_layer, num_layers=6)
memory = ms.Tensor(np.random.rand(10, 32, 512), ms.float32)
tgt = ms.Tensor(np.random.rand(20, 32, 512), ms.float32)
out = transformer_decoder(tgt, memory)
print(out.shape)
# (20, 32, 512)