Differences with torch.nn.LayerNorm


torch.nn.LayerNorm

class torch.nn.LayerNorm(
    normalized_shape,
    eps=1e-05,
    elementwise_affine=True
)(input) -> Tensor

For more information, see torch.nn.LayerNorm.

mindspore.nn.LayerNorm

class mindspore.nn.LayerNorm(
    normalized_shape,
    begin_norm_axis=-1,
    begin_params_axis=-1,
    gamma_init='ones',
    beta_init='zeros',
    epsilon=1e-7
)(x) -> Tensor

For more information, see mindspore.nn.LayerNorm.

Differences

PyTorch: Layer Normalization is applied over a mini-batch of inputs, and the parameter elementwise_affine controls whether learnable affine parameters are used.

MindSpore: The MindSpore API implements basically the same function as PyTorch, but there is no elementwise_affine parameter. Instead, MindSpore adds begin_norm_axis to control the first axis on which normalization is computed, begin_params_axis to control the first dimension of the affine parameters (beta, gamma), and gamma_init and beta_init to control how gamma and beta are initialized, as shown in the sketch below.
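To make the two axis parameters concrete, here is a minimal sketch of how begin_norm_axis and begin_params_axis interact; the gamma attribute access is an assumption about what the module exposes, so check it against your MindSpore version.

# MindSpore: begin_norm_axis picks where normalization starts,
# begin_params_axis picks where the gamma/beta shapes start.
import mindspore.nn as nn
import mindspore.numpy as mnp

x = mnp.ones([2, 3, 4])  # float32 by default
# Normalize over axes 1 and 2; gamma and beta then need shape (3, 4).
m = nn.LayerNorm((3, 4), begin_norm_axis=1, begin_params_axis=1)
print(m(x).shape)
# (2, 3, 4)
print(m.gamma.shape)  # assumes the affine weight is exposed as `gamma`
# (3, 4)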

| Categories | Subcategories | PyTorch | MindSpore | Difference |
| --- | --- | --- | --- | --- |
| Parameters | Parameter 1 | normalized_shape | normalized_shape | PyTorch accepts int or list; MindSpore accepts tuple or list |
| | Parameter 2 | eps | epsilon | Same function, different parameter names and different default values (1e-05 vs. 1e-7) |
| | Parameter 3 | elementwise_affine | - | Controls whether learnable affine parameters are used in PyTorch; MindSpore has no such parameter |
| | Parameter 4 | - | begin_norm_axis | Controls the first axis on which normalization is computed in MindSpore; PyTorch has no such parameter |
| | Parameter 5 | - | begin_params_axis | Controls the first dimension of the affine parameters (beta, gamma) in MindSpore; PyTorch has no such parameter |
| | Parameter 6 | - | gamma_init | Controls how gamma is initialized in MindSpore (see the sketch after this table); PyTorch has no such parameter |
| | Parameter 7 | - | beta_init | Controls how beta is initialized in MindSpore (see the sketch after this table); PyTorch has no such parameter |
| Input | Single input | input | x | Same function, different parameter names |
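The gamma_init and beta_init rows above accept initializers. A minimal sketch using Constant from mindspore.common.initializer follows; the gamma attribute access is an assumption about what the module exposes.

# MindSpore: custom initialization of the affine parameters.
import mindspore.nn as nn
from mindspore.common.initializer import Constant

# Start gamma at 0.5 and beta at 0.1 instead of the 'ones'/'zeros' defaults.
m = nn.LayerNorm((8,), gamma_init=Constant(0.5), beta_init=Constant(0.1))
print(m.gamma.asnumpy()[:3])  # assumes the affine weight is exposed as `gamma`
# [0.5 0.5 0.5]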

Code Example

When PyTorch's elementwise_affine is True (its default), the two APIs compute the same result and are used in the same way.

# PyTorch
import torch
import torch.nn as nn

inputs = torch.ones([20, 5, 10, 10])
# normalize over every dimension except the batch dimension
m = nn.LayerNorm(inputs.size()[1:])
output = m(inputs)
print(output.detach().numpy().shape)
# (20, 5, 10, 10)

# MindSpore
import mindspore
from mindspore import Tensor
import mindspore.numpy as np
import mindspore.nn as nn

x = Tensor(np.ones([20, 5, 10, 10]), mindspore.float32)
shape1 = x.shape[1:]
# begin_norm_axis=1 and begin_params_axis=1 reproduce the PyTorch behavior
m = nn.LayerNorm(shape1, begin_norm_axis=1, begin_params_axis=1)
output = m(x)
print(output.shape)
# (20, 5, 10, 10)

When the normalized_shape parameter in PyTorch is an int, in MindSpore it should be a tuple (or list) of int, e.g. (30,).

# PyTorch
import torch
import torch.nn as nn

input_tensor = torch.randn(10, 20, 30)
# an int normalized_shape normalizes over the last dimension
layer_norm = nn.LayerNorm(normalized_shape=30)
output = layer_norm(input_tensor)
print("Output shape:", output.shape)
# Output shape: torch.Size([10, 20, 30])

# MindSpore
import mindspore
from mindspore import nn

input_tensor = mindspore.ops.randn(10, 20, 30)
# the same size must be passed as a tuple (or list)
layer_norm = nn.LayerNorm(normalized_shape=(30,))
output = layer_norm(input_tensor)
print("Output shape:", output.shape)
# Output shape: (10, 20, 30)
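PyTorch's elementwise_affine=False has no direct counterpart in MindSpore. One possible workaround, not an official equivalent, is to keep the default identity affine (gamma=1, beta=0) and freeze the two parameters so they are never updated during training:

# PyTorch: disable the learnable affine transform entirely
import torch.nn as nn

m = nn.LayerNorm(10, elementwise_affine=False)

# MindSpore: no such switch; freeze gamma and beta instead
# (assumes they are exposed as trainable Parameters named gamma/beta)
import mindspore.nn as msnn

m2 = msnn.LayerNorm((10,))
m2.gamma.requires_grad = False
m2.beta.requires_grad = False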