mindspore.nn.TransformerEncoderLayer
- class mindspore.nn.TransformerEncoderLayer(d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1, activation: Union[str, Cell, callable] = 'relu', layer_norm_eps: float = 1e-05, batch_first: bool = False, norm_first: bool = False)[source]
Transformer Encoder Layer. This is an implementation of a single layer of the transformer encoder, consisting of a multihead attention sublayer and a feedforward sublayer.
Warning
This is an experimental API that is subject to change or deletion.
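For orientation, the data flow inside one encoder layer can be sketched as follows. This is an illustrative outline only, assuming the standard post-norm (norm_first = False) formulation; the names attn, ffn, norm1 and norm2 are hypothetical and dropout is omitted.
>>> import mindspore as ms
>>> import mindspore.nn as nn
>>> import numpy as np
>>> d_model, nhead, dim_feedforward = 512, 8, 2048
>>> attn = nn.MultiheadAttention(d_model, nhead)
>>> ffn = nn.SequentialCell(nn.Dense(d_model, dim_feedforward), nn.ReLU(),
...                         nn.Dense(dim_feedforward, d_model))
>>> norm1 = nn.LayerNorm((d_model,), epsilon=1e-5)
>>> norm2 = nn.LayerNorm((d_model,), epsilon=1e-5)
>>> src = ms.Tensor(np.random.rand(10, 32, d_model), ms.float32)  # (seq, batch, feature)
>>> attn_out, _ = attn(src, src, src)   # self-attention sublayer
>>> x = norm1(src + attn_out)           # residual connection + LayerNorm
>>> out = norm2(x + ffn(x))             # feedforward sublayer with residual + LayerNorm
>>> print(out.shape)
(10, 32, 512)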
- Parameters
d_model (int) – The number of features in the input tensor.
nhead (int) – The number of heads in the MultiheadAttention modules.
dim_feedforward (int) – The dimension of the feedforward layer. Default: 2048.
dropout (float) – The dropout value. Default: 0.1.
activation (Union[str, callable, Cell]) – The activation function of the intermediate layer. Can be a string ("relu" or "gelu"), a Cell instance (nn.ReLU() or nn.GELU()), or a callable (ops.relu or ops.gelu). Default: "relu".
layer_norm_eps (float) – The epsilon value in LayerNorm modules. Default: 1e-5.
batch_first (bool) – If batch_first = True, the shape of input and output tensors is \((batch, seq, feature)\); otherwise the shape is \((seq, batch, feature)\). Default: False.
norm_first (bool) – If norm_first = True, layer norm is applied before the attention and feedforward operations, respectively; otherwise it is applied after them. Default: False. See the construction sketch after this list.
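A quick construction sketch showing the non-default options above (the argument values are arbitrary, chosen only for illustration):
>>> import mindspore as ms
>>> # Pre-norm layer with a GELU activation supplied as a Cell instance
>>> layer = ms.nn.TransformerEncoderLayer(d_model=256, nhead=4, dim_feedforward=1024,
...                                       dropout=0.2, activation=ms.nn.GELU(),
...                                       layer_norm_eps=1e-6, batch_first=True,
...                                       norm_first=True)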
- Inputs:
src (Tensor): the sequence to the encoder layer.
src_mask (Tensor, optional): the mask for the src sequence, as illustrated below. Default: None.
src_key_padding_mask (Tensor, optional): the mask for the src keys per batch. Default: None.
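The sketch below shows one way the optional masks can be supplied. It assumes PyTorch-style mask semantics: an additive float src_mask of shape (seq, seq) where 0 allows attention and -inf blocks it, and a boolean src_key_padding_mask of shape (batch, seq) where True marks padded positions to ignore; verify the shapes and dtypes accepted by your MindSpore version.
>>> import mindspore as ms
>>> import numpy as np
>>> layer = ms.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
>>> src = ms.Tensor(np.random.rand(32, 10, 512), ms.float32)
>>> # Causal additive mask: 0 where attention is allowed, -inf where it is blocked
>>> causal = np.triu(np.full((10, 10), float('-inf'), dtype=np.float32), k=1)
>>> # Boolean padding mask: True marks key positions to be ignored (none here)
>>> pad_mask = ms.Tensor(np.zeros((32, 10), dtype=np.bool_))
>>> out = layer(src, src_mask=ms.Tensor(causal), src_key_padding_mask=pad_mask)
>>> print(out.shape)
(32, 10, 512)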
- Outputs:
Tensor. The output of the encoder layer, with the same shape as src.
- Raises
ValueError – If the init argument activation is not a str, callable, or Cell instance.
ValueError – If the init argument activation is not a mindspore.nn.ReLU or mindspore.nn.GELU instance, not mindspore.ops.relu() or mindspore.ops.gelu(), and not "relu" or "gelu".
- Supported Platforms:
Ascend
GPU
CPU
Examples
>>> import mindspore as ms
>>> import numpy as np
>>> encoder_layer = ms.nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> src = ms.Tensor(np.random.rand(10, 32, 512), ms.float32)
>>> out = encoder_layer(src)
>>> # Alternatively, when batch_first=True:
>>> encoder_layer = ms.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
>>> src = ms.Tensor(np.random.rand(32, 10, 512), ms.float32)
>>> out = encoder_layer(src)
>>> print(out.shape)
(32, 10, 512)
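In practice a single layer is usually stacked into a full encoder. A minimal sketch, assuming mindspore.nn.TransformerEncoder is available in the same release:
>>> import mindspore as ms
>>> import numpy as np
>>> encoder_layer = ms.nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> encoder = ms.nn.TransformerEncoder(encoder_layer, num_layers=6)
>>> src = ms.Tensor(np.random.rand(10, 32, 512), ms.float32)
>>> out = encoder(src)
>>> print(out.shape)
(10, 32, 512)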