mindspore.nn.TransformerEncoder
- class mindspore.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)
Transformer encoder module made of a stack of mindspore.nn.TransformerEncoderLayer layers, each containing multi-head attention and a feedforward layer. With suitable parameters, users can build the BERT (https://arxiv.org/abs/1810.04805) model.
- Parameters
encoder_layer (Cell) – An instance of the mindspore.nn.TransformerEncoderLayer class.
num_layers (int) – The number of encoder layers in the encoder.
norm (Cell, optional) – The layer normalization module. Default: None.
- Inputs:
src (Tensor) - The input sequence to the encoder. For unbatched input, the shape is \((S, E)\). For batched input, the shape is \((S, N, E)\) if batch_first=False in mindspore.nn.TransformerEncoderLayer, and \((N, S, E)\) if batch_first=True, where \(S\) is the source sequence length, \(N\) is the batch size and \(E\) is the number of features. Supported types: float16, float32, float64.
src_mask (Tensor, optional) - The mask for the src sequence. The shape is \((S, S)\) or \((N*nhead, S, S)\), where nhead is the argument in mindspore.nn.TransformerEncoderLayer. Supported types: float16, float32, float64, bool. Default: None.
src_key_padding_mask (Tensor, optional) - The mask for the src keys per batch. The shape is \((S)\) for unbatched input, otherwise \((N, S)\). Supported types: float16, float32, float64, bool. Default: None.
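The mask shapes listed above can be sketched with NumPy alone. The helper names `causal_mask` and `padding_mask` are hypothetical, and treating `True` as "ignore this position" is an assumption based on the usual boolean-mask semantics of multi-head attention; check it against the mindspore.nn.MultiheadAttention documentation before relying on it.

```python
import numpy as np


def causal_mask(seq_len):
    """Boolean (S, S) mask; True above the diagonal (assumed to mean 'masked')."""
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)


def padding_mask(lengths, seq_len):
    """Boolean (N, S) mask; True at padded positions (assumed to mean 'masked')."""
    positions = np.arange(seq_len)[None, :]            # shape (1, S)
    return positions >= np.asarray(lengths)[:, None]   # shape (N, S)


# Two sequences padded to length 4; the second has only 2 real tokens.
src_mask = causal_mask(4)
src_key_padding_mask = padding_mask([4, 2], seq_len=4)
print(src_mask.shape)              # (4, 4)
print(src_key_padding_mask.shape)  # (2, 4)
```

Arrays built this way would presumably be wrapped in `ms.Tensor(...)` and passed as the `src_mask` and `src_key_padding_mask` inputs.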
- Outputs:
Tensor. The shape and dtype of the output Tensor are the same as those of src.
- Raises
AssertionError – If the dtype of the input argument src_key_padding_mask is neither bool nor a floating-point type.
- Supported Platforms:
Ascend
GPU
CPU
Examples
>>> import mindspore as ms
>>> import numpy as np
>>> encoder_layer = ms.nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> transformer_encoder = ms.nn.TransformerEncoder(encoder_layer, num_layers=6)
>>> src = ms.Tensor(np.random.rand(10, 32, 512), ms.float32)
>>> out = transformer_encoder(src)
>>> print(out.shape)
(10, 32, 512)
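In the example, the input of shape (10, 32, 512) follows the default sequence-first layout \((S, N, E)\). As a small NumPy-only sketch, the same data could be rearranged into the \((N, S, E)\) layout expected when the encoder layer is created with batch_first=True; the variable names are illustrative only.

```python
import numpy as np

# Sequence-first data: S=10 time steps, N=32 batch entries, E=512 features.
seq_first = np.random.rand(10, 32, 512).astype(np.float32)

# Swap the first two axes to obtain the batch-first layout (N, S, E).
batch_first = np.transpose(seq_first, (1, 0, 2))
print(batch_first.shape)  # (32, 10, 512)
```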