# Differences with torch.nn.LayerNorm

## torch.nn.LayerNorm

```python
class torch.nn.LayerNorm(
    normalized_shape,
    eps=1e-05,
    elementwise_affine=True
)(input) -> Tensor
```

For more information, see torch.nn.LayerNorm.
## mindspore.nn.LayerNorm

```python
class mindspore.nn.LayerNorm(
    normalized_shape,
    begin_norm_axis=-1,
    begin_params_axis=-1,
    gamma_init='ones',
    beta_init='zeros',
    epsilon=1e-7
)(x) -> Tensor
```

For more information, see mindspore.nn.LayerNorm.
## Differences

PyTorch: Layer Normalization is applied over a mini-batch of inputs; the parameter elementwise_affine controls whether learnable affine parameters are used.

MindSpore: MindSpore implements essentially the same function as PyTorch, but there is no elementwise_affine parameter. Instead, MindSpore adds begin_norm_axis, which controls the axis at which normalization begins; begin_params_axis, which controls the dimension of the first parameters (beta, gamma); and gamma_init and beta_init, which control how the gamma and beta parameters are initialized.
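The snippet below is a minimal sketch of these MindSpore-specific parameters. The 4-D input shape is chosen only for illustration; 'ones' and 'zeros' are the documented default initializer strings.

```python
import mindspore
import mindspore.nn as nn
from mindspore import ops

x = ops.ones((2, 3, 4, 4), mindspore.float32)

# Normalize over axes 1..3; gamma/beta are created with shape x.shape[1:]
# because begin_params_axis=1.
m = nn.LayerNorm(x.shape[1:], begin_norm_axis=1, begin_params_axis=1,
                 gamma_init='ones', beta_init='zeros', epsilon=1e-7)
print(m(x).shape)
# (2, 3, 4, 4)
```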
| Categories | Subcategories | PyTorch | MindSpore | Difference |
|---|---|---|---|---|
| Parameters | Parameter 1 | normalized_shape | normalized_shape | In PyTorch this parameter supports int and list; in MindSpore it supports tuple and list |
| | Parameter 2 | eps | epsilon | Same function, different parameter names and different default values (1e-05 in PyTorch, 1e-7 in MindSpore) |
| | Parameter 3 | elementwise_affine | - | Controls in PyTorch whether learnable affine parameters are used. MindSpore does not have this parameter (see the sketch after this table) |
| | Parameter 4 | - | begin_norm_axis | Controls in MindSpore the axis on which normalization begins. PyTorch does not have this parameter |
| | Parameter 5 | - | begin_params_axis | Controls in MindSpore the dimension of the first parameters (beta, gamma). PyTorch does not have this parameter |
| | Parameter 6 | - | gamma_init | Controls in MindSpore how the gamma parameter is initialized. PyTorch does not have this parameter |
| | Parameter 7 | - | beta_init | Controls in MindSpore how the beta parameter is initialized. PyTorch does not have this parameter |
| Input | Single input | input | x | Same function, different parameter names |
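Because MindSpore has no elementwise_affine switch, gamma and beta are always created as learnable parameters. One possible workaround to approximate PyTorch's elementwise_affine=False is to leave them at their default 'ones'/'zeros' values and freeze them. This is a sketch, assuming mindspore.nn.LayerNorm exposes its parameters as the gamma and beta attributes:

```python
# Hedged workaround, not an official equivalent: mimic elementwise_affine=False
# by freezing MindSpore's always-present gamma/beta parameters.
import mindspore.nn as nn
from mindspore import ops

m = nn.LayerNorm((10,))          # gamma defaults to ones, beta to zeros
m.gamma.requires_grad = False    # exclude both from gradient updates
m.beta.requires_grad = False

x = ops.randn(4, 10)
print(m(x).shape)
# (4, 10)
```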
## Code Example

When PyTorch's parameter elementwise_affine is True, the two APIs implement the same function and are used in the same way.
```python
# PyTorch
import torch
import torch.nn as nn

inputs = torch.ones([20, 5, 10, 10])
m = nn.LayerNorm(inputs.size()[1:])
output = m(inputs)
print(output.detach().numpy().shape)
# (20, 5, 10, 10)
```

```python
# MindSpore
import mindspore
import mindspore.nn as nn
import mindspore.numpy as np

x = np.ones([20, 5, 10, 10], mindspore.float32)
m = nn.LayerNorm(x.shape[1:], begin_norm_axis=1, begin_params_axis=1)
output = m(x)
print(output.shape)
# (20, 5, 10, 10)
```
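To confirm that the two calls match numerically, you can compare the outputs directly. This cross-check is a sketch that assumes both frameworks are installed in the same environment; with the default gamma=1 and beta=0 on both sides, the results should agree up to the differing eps defaults.

```python
import numpy as np
import torch
import mindspore
import mindspore.nn as msnn

data = np.random.randn(20, 5, 10, 10).astype(np.float32)

pt_out = torch.nn.LayerNorm(data.shape[1:])(torch.from_numpy(data))
ms_out = msnn.LayerNorm(data.shape[1:], begin_norm_axis=1,
                        begin_params_axis=1)(mindspore.Tensor(data))

# Should print True: the eps defaults (1e-05 vs 1e-7) cause only a tiny gap.
print(np.allclose(pt_out.detach().numpy(), ms_out.asnumpy(), atol=1e-4))
```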
When the normalized_shape parameter in PyTorch is of type int, in MindSpore it should be of type tuple(int) or list(int):
```python
# PyTorch
import torch
import torch.nn as nn

input_tensor = torch.randn(10, 20, 30)
layer_norm = nn.LayerNorm(normalized_shape=30)
output = layer_norm(input_tensor)
print("Output shape:", output.shape)
# Output shape: torch.Size([10, 20, 30])
```

```python
# MindSpore
import mindspore
from mindspore import nn

input_tensor = mindspore.ops.randn(10, 20, 30)
layer_norm = nn.LayerNorm(normalized_shape=(30,))
output = layer_norm(input_tensor)
print("Output shape:", output.shape)
# Output shape: (10, 20, 30)
```
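When porting code mechanically, a small helper can normalize the argument. The function below is a hypothetical convenience for illustration, not part of either API:

```python
# Hypothetical porting helper: convert a PyTorch-style normalized_shape
# (int, list, or tuple) into the tuple form mindspore.nn.LayerNorm expects.
def to_ms_normalized_shape(normalized_shape):
    if isinstance(normalized_shape, int):
        return (normalized_shape,)
    return tuple(normalized_shape)

print(to_ms_normalized_shape(30))        # (30,)
print(to_ms_normalized_shape([5, 10]))   # (5, 10)
```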