Differences between torch.nn.TransformerEncoder and mindspore.nn.TransformerEncoder
torch.nn.TransformerEncoder
class torch.nn.TransformerEncoder(
encoder_layer,
num_layers,
norm=None
)(src, mask=None, src_key_padding_mask=None)
For more information, see torch.nn.TransformerEncoder.
mindspore.nn.TransformerEncoder
class mindspore.nn.TransformerEncoder(
encoder_layer,
num_layers,
norm=None
)(src, src_mask=None, src_key_padding_mask=None)
For more information, see mindspore.nn.TransformerEncoder.
Differences
The function and usage of mindspore.nn.TransformerEncoder are mostly the same as those of torch.nn.TransformerEncoder. The differences are listed below.
Categories | Subcategories | PyTorch | MindSpore | Difference
---|---|---|---|---
Parameters | Parameter 1 | encoder_layer | encoder_layer | Consistent function
 | Parameter 2 | num_layers | num_layers | Consistent function
 | Parameter 3 | norm | norm | Consistent function
Inputs | Input 1 | src | src | Consistent function
 | Input 2 | mask | src_mask | Consistent function, different parameter names (see the mask example under Code Example)
 | Input 3 | src_key_padding_mask | src_key_padding_mask | In MindSpore, the dtype can be a float or bool Tensor; in PyTorch, it can be a byte or bool Tensor (see the sketch after this table)
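For the src_key_padding_mask difference, here is a minimal sketch. The all-False mask and the (batch_size, seq_len) shape are illustrative assumptions; a True (or nonzero) entry marks a padded position to be ignored by attention:
# PyTorch: a byte or bool Tensor is accepted
import torch
from torch import nn
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
src = torch.rand(10, 32, 512)                         # (seq_len, batch_size, d_model)
padding_mask = torch.zeros(32, 10, dtype=torch.bool)  # (batch_size, seq_len); True = padded
out = transformer_encoder(src, src_key_padding_mask=padding_mask)

# MindSpore: a float or bool Tensor is accepted
import numpy as np
import mindspore
from mindspore import nn, Tensor
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
src = mindspore.numpy.rand(10, 32, 512)                     # (seq_len, batch_size, d_model)
padding_mask = Tensor(np.zeros((32, 10)), mindspore.bool_)  # (batch_size, seq_len); True = padded
out = transformer_encoder(src, src_key_padding_mask=padding_mask)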
Code Example
# PyTorch
import torch
from torch import nn
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
src = torch.rand(10, 32, 512)  # (seq_len, batch_size, d_model)
out = transformer_encoder(src)
print(out.shape)
# torch.Size([10, 32, 512])
# MindSpore
import mindspore
from mindspore import nn
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
src = mindspore.numpy.rand(10, 32, 512)  # (seq_len, batch_size, d_model)
out = transformer_encoder(src)
print(out.shape)
# (10, 32, 512)
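The attention-mask input illustrates the parameter name difference (mask vs. src_mask). A minimal sketch reusing transformer_encoder and src from the examples above, and assuming a causal bool mask in which True blocks attention to future positions:
# PyTorch: the attention mask is passed as `mask`
attn_mask = torch.triu(torch.ones(10, 10, dtype=torch.bool), diagonal=1)  # (seq_len, seq_len)
out = transformer_encoder(src, mask=attn_mask)

# MindSpore: the same mask is passed as `src_mask`
import numpy as np
attn_mask = mindspore.Tensor(np.triu(np.ones((10, 10), dtype=bool), k=1))  # (seq_len, seq_len)
out = transformer_encoder(src, src_mask=attn_mask)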