mindspore.nn
Neural Network Cell
Predefined building blocks and computational units for constructing neural networks.
For more information about dynamic shape support status, please refer to Dynamic Shape Support Status of nn Interface.
For the mindspore.nn operators that were added, deleted, or had their supported platforms changed compared with the previous version, please refer to mindspore.nn API Interface Change.
Basic Block
API Name |
Description |
Supported Platforms |
The basic building block of neural networks in MindSpore. |
|
|
Base class for running the graph loaded from a MindIR. |
|
|
Base class for other losses. |
|
|
Base class for updating parameters. |
|
Container
API Name |
Description |
Supported Platforms |
Holds Cells in a dictionary. |
|
|
Holds Cells in a list. |
|
|
Sequential Cell container. |
|
Wrapper Layer
API Name |
Description |
Supported Platforms |
A distributed optimizer. |
|
|
Dynamic Loss scale update cell. |
|
|
Update cell with fixed loss scaling value. |
|
|
Encapsulate training network. |
|
|
Cell to run for getting the next operation. |
|
|
Wrap the network with Micro Batch to enable the grad accumulation in semi_auto_parallel/auto_parallel mode. |
|
|
This function splits the input along the 0th dimension into interleave_num pieces and then performs the computation of the wrapped cell. |
|
|
Cell that updates parameter. |
|
|
Slice MiniBatch into finer-grained MicroBatch for use in pipeline-parallel training. |
|
|
PipelineGradReducer is a gradient reducer for pipeline parallelism. |
|
|
The time distributed layer. |
|
|
Network training package class. |
|
|
Network training with loss scaling. |
|
|
Wraps the forward network with the loss function. |
|
|
Cell with loss function. |
|
Convolutional Layer
API Name |
Description |
Supported Platforms |
1D convolution layer. |
|
|
Calculates a 1D transposed convolution, which can be regarded as Conv1d for the gradient of the input, also called deconvolution (although it is not an actual deconvolution). |
|
|
2D convolution layer. |
|
|
Calculates a 2D transposed convolution, which can be regarded as Conv2d for the gradient of the input, also called deconvolution (although it is not an actual deconvolution). |
|
|
3D convolution layer. |
|
|
Calculates a 3D transposed convolution, which can be regarded as Conv3d for the gradient of the input. |
|
|
Extracts patches from images. |
|
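To make the 1D convolution entry above concrete, the following is a minimal pure-Python sketch of what a single-channel convolution layer computes: a sliding dot product between a kernel and the input. It assumes no padding, no dilation, and no bias, and the helper name conv1d_valid is hypothetical. It illustrates the operation only and is not how the MindSpore layer is implemented.

```python
def conv1d_valid(signal, kernel, stride=1):
    """Slide `kernel` over `signal` and take a dot product at each position."""
    k = len(kernel)
    out = []
    # Only positions where the whole kernel fits are visited (no padding).
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


result = conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1])
print(result)  # [-2, -2, -2]
```

With an input of length 5 and a kernel of length 3, the output has 5 - 3 + 1 = 3 elements. A larger stride skips positions and shortens the output accordingly.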
Recurrent Layer
API Name |
Description |
Supported Platforms |
Stacked Elman RNN layers, applying RNN layer with \(\tanh\) or \(\text{ReLU}\) non-linearity to the input. |
|
|
An Elman RNN cell with tanh or ReLU non-linearity. |
|
|
Stacked GRU (Gated Recurrent Unit) layers. |
|
|
A GRU (Gated Recurrent Unit) cell. |
|
|
Stacked LSTM (Long Short-Term Memory) layers. |
|
|
An LSTM (Long Short-Term Memory) cell. |
|
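As a rough illustration of the Elman RNN cell listed above, one time step with tanh non-linearity can be sketched in pure Python as h' = tanh(W_ih x + b_ih + W_hh h + b_hh). The function name rnn_cell_step and the toy weights are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math


def rnn_cell_step(x, h, w_ih, w_hh, b_ih, b_hh):
    """One Elman RNN step: h' = tanh(W_ih x + b_ih + W_hh h + b_hh)."""
    new_h = []
    for i in range(len(h)):
        pre = b_ih[i] + b_hh[i]
        pre += sum(w * xj for w, xj in zip(w_ih[i], x))  # input contribution
        pre += sum(w * hj for w, hj in zip(w_hh[i], h))  # recurrent contribution
        new_h.append(math.tanh(pre))
    return new_h


# Hypothetical toy weights: input size 2, hidden size 2.
w_ih = [[0.5, -0.5], [0.0, 1.0]]
w_hh = [[0.0, 1.0], [0.0, 0.0]]
bias = [0.0, 0.0]

h = [0.0, 0.0]
for x in ([1.0, 1.0], [0.0, 2.0]):
    h = rnn_cell_step(x, h, w_ih, w_hh, bias, bias)
```

A stacked RNN layer applies such a step repeatedly over the sequence and feeds each layer's hidden states to the next layer.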
Transformer Layer
API Name |
Description |
Supported Platforms |
This is an implementation of multi-head attention from the paper Attention Is All You Need. |
|
|
Transformer Encoder Layer. |
|
|
Transformer Decoder Layer. |
|
|
Transformer Encoder module with multi-layer stacked of |
|
|
Transformer Decoder module with multi-layer stacked of |
|
|
Transformer module including encoder and decoder. |
|
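The core computation behind the attention entry above is scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. The sketch below shows a single head in pure Python, assuming no learned projections, masks, or dropout. The names softmax and attention are hypothetical helpers, not MindSpore APIs.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention on nested lists."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is the attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
context = attention([[0.0, 0.0]], keys, values)
print(context)  # [[5.0, 5.0]]
```

A zero query scores every key equally, so the result is the plain average of the values. Multi-head attention runs several such heads in parallel on projected inputs and concatenates the results.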
Embedding Layer
API Name |
Description |
Supported Platforms |
A simple lookup table that stores embeddings of a fixed dictionary and size. |
|
|
EmbeddingLookup layer. |
|
|
Returns a slice of input tensor based on the specified indices and the field ids. |
|
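The "simple lookup table" described above amounts to indexing rows of a weight matrix by integer ids. A minimal sketch with a hypothetical helper name embedding_lookup and a hand-written table in place of learned weights:

```python
def embedding_lookup(table, indices):
    """Return one embedding row per index; rows are copied, not shared."""
    return [list(table[i]) for i in indices]


# Hypothetical table: vocabulary of 3 entries, embedding size 2.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
vectors = embedding_lookup(table, [2, 0, 2])
print(vectors)  # [[2.0, 2.1], [0.0, 0.1], [2.0, 2.1]]
```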
Nonlinear Activation Layer
API Name |
Description |
Supported Platforms |
CELU Activation Operator. |
|
|
Applies the exponential linear unit function element-wise. |
|
|
Applies FastGelu function to each element of the input. |
|
|
Applies GELU function to each element of the input. |
|
|
The gated linear unit function. |
|
|
Gets the activation function. |
|
|
Applies the Hardtanh function element-wise. |
|
|
Applies Hard Shrink activation function element-wise. |
|
|
Applies Hard Sigmoid activation function element-wise. |
|
|
Applies Hard Swish activation function element-wise. |
|
|
Leaky ReLU activation function. |
|
|
Applies logsigmoid activation element-wise. |
|
|
Applies the LogSoftmax function to n-dimensional input tensor element-wise. |
|
|
Local Response Normalization. |
|
|
Computes MISH (A Self Regularized Non-Monotonic Neural Activation Function) of input tensors element-wise. |
|
|
Applies softsign activation function element-wise. |
|
|
Applies PReLU activation function element-wise. |
|
|
Applies ReLU (Rectified Linear Unit activation function) element-wise. |
|
|
Compute ReLU6 activation function element-wise. |
|
|
Applies RReLU (Randomized Leaky ReLU activation function) element-wise. |
|
|
Applies activation function SeLU (Scaled exponential Linear Unit) element-wise. |
|
|
Applies the SiLU (Sigmoid Linear Unit) function element-wise. |
|
|
Applies sigmoid activation function element-wise. |
|
|
Softmin activation function, which is a two-category function |
|
|
Softmax activation function, which is a two-category function |
|
|
Softmax function applied to 2D features data. |
|
|
Applies the SoftShrink function element-wise. |
|
|
Applies the Tanh function element-wise and returns a new tensor with the hyperbolic tangent of the elements of the input. The input is a Tensor with any valid shape. |
|
|
Applies Tanhshrink activation function element-wise and returns a new tensor. |
|
|
Thresholds each element of the input Tensor. |
|
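Several of the element-wise activations listed above are simple piecewise functions. The sketch below writes three of them in pure Python using their standard textbook definitions. The function names, argument names, and the default bounds of hardtanh are assumptions for illustration and may not match the defaults of the MindSpore layers.

```python
def hardtanh(x, min_val=-1.0, max_val=1.0):
    """Clamp x to the interval [min_val, max_val]."""
    return max(min_val, min(max_val, x))


def leaky_relu(x, alpha):
    """Identity for non-negative x, slope alpha for negative x."""
    return x if x >= 0 else alpha * x


def soft_shrink(x, lambd):
    """Shrink x toward zero by lambd; values within [-lambd, lambd] become 0."""
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0


print([hardtanh(v) for v in (-2.0, 0.5, 3.0)])  # [-1.0, 0.5, 1.0]
print(leaky_relu(-2.0, 0.25))                   # -0.5
print(soft_shrink(2.0, 0.5))                    # 1.5
```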
Linear Layer
API Name |
Description |
Supported Platforms |
The dense connected layer. |
|
|
The bilinear dense connected layer. |
|
Dropout Layer
API Name |
Description |
Supported Platforms |
Dropout layer for the input. |
|
|
During training, randomly zeroes entire channels of the input tensor with probability p from a Bernoulli distribution (For a 3-dimensional tensor with a shape of \((N, C, L)\), the channel feature map refers to a 1-dimensional feature map with the shape of \(L\)). |
|
|
During training, randomly zeroes some channels of the input tensor with probability p from a Bernoulli distribution (For a 4-dimensional tensor with a shape of \(NCHW\), the channel feature map refers to a 2-dimensional feature map with the shape of \(HW\)). |
|
|
During training, randomly zeroes some channels of the input tensor with probability p from a Bernoulli distribution (For a 5-dimensional tensor with a shape of \(NCDHW\), the channel feature map refers to a 3-dimensional feature map with a shape of \(DHW\)). |
|
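The channel-wise dropout layers above zero whole channels rather than individual elements. A minimal pure-Python sketch for one sample follows. The rescaling of kept channels by 1 / (1 - p) is the usual inverted-dropout convention and is an assumption here, since the table does not state it. The name channel_dropout is hypothetical.

```python
import random


def channel_dropout(sample, p, rng):
    """sample: list of channels, each a list of values. Requires 0 <= p < 1."""
    out = []
    for channel in sample:
        if rng.random() < p:
            # The whole channel is dropped together.
            out.append([0.0] * len(channel))
        else:
            # Assumed convention: rescale kept channels so the expectation is unchanged.
            out.append([v / (1.0 - p) for v in channel])
    return out


rng = random.Random(0)  # seeded so repeated runs give the same mask
sample = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
dropped = channel_dropout(sample, p=0.5, rng=rng)
```

Every channel in the result is either all zeros or the original channel scaled by 2. No channel is partially zeroed.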
Normalization Layer
API Name |
Description |
Supported Platforms |
This layer applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D or 2D inputs) to reduce internal covariate shift. |
|
|
Batch Normalization is widely used in convolutional networks. |
|
|
Batch Normalization is widely used in convolutional networks. |
|
|
Group Normalization over a mini-batch of inputs. |
|
|
This layer applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with additional channel dimension). |
|
|
This layer applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension). |
|
|
This layer applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension). |
|
|
Applies Layer Normalization over a mini-batch of inputs. |
|
|
Sync Batch Normalization layer over an N-dimension input. |
|
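The normalization layers above differ mainly in which axis the mean and variance are computed over. The sketch below contrasts batch normalization (statistics per feature, across the batch) with layer normalization (statistics per sample, across features) on a small 2D input. It omits the learnable scale and shift and any running statistics, and the function names and the eps value are assumptions.

```python
import math


def normalize(values, eps=1e-5):
    """Subtract the mean and divide by sqrt(variance + eps)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(batch):
    """Normalize each feature column across the batch dimension."""
    columns = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch):
    """Normalize each sample across its own features."""
    return [normalize(row) for row in batch]


batch = [[1.0, 2.0, 6.0], [3.0, 2.0, 0.0]]  # 2 samples, 3 features
bn = batch_norm(batch)
ln = layer_norm(batch)
```

In bn, the second feature is identical across the batch and therefore normalizes to 0. In ln, each row sums to approximately 0 regardless of the other samples.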
Pooling Layer
API Name |
Description |
Supported Platforms |
Applies a 1D adaptive average pooling over an input Tensor which can be regarded as a composition of 1D input planes. |
|
|
This operator applies a 2D adaptive average pooling to an input signal composed of multiple input planes. |
|
|
This operator applies a 3D adaptive average pooling to an input signal composed of multiple input planes. |
|
|
Applies a 1D adaptive maximum pooling over an input Tensor which can be regarded as a composition of 1D input planes. |
|
|
This operator applies a 2D adaptive max pooling to an input signal composed of multiple input planes. |
|
|
Calculates the 3D adaptive max pooling for an input Tensor. |
|
|
Applies a 1D average pooling over an input Tensor which can be regarded as a composition of 1D input planes. |
|
|
Applies a 2D average pooling over an input Tensor which can be regarded as a composition of 2D input planes. |
|
|
Applies a 3D average pooling over an input Tensor which can be regarded as a composition of 3D input planes. |
|
|
Applies the 3D FractionalMaxPool operation over input. |
|
|
Applies a 1D LPPooling operation over an input Tensor, which can be regarded as a composition of 1D input planes. |
|
|
Applies a 2D LPPooling operation over an input Tensor, which can be regarded as a composition of 2D input planes. |
|
|
Applies a 1D max pooling over an input Tensor which can be regarded as a composition of 1D planes. |
|
|
Applies a 2D max pooling over an input Tensor which can be regarded as a composition of 2D planes. |
|
|
3D max pooling operation. |
|
|
Computes the inverse of |
|
|
Computes the inverse of |
|
|
Computes the inverse of |
|
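The 1D max and average pooling entries above share the same sliding-window structure and differ only in the reduction applied to each window. A minimal sketch, assuming no padding and hypothetical helper names pool1d and mean:

```python
def pool1d(values, kernel_size, stride, reduce_fn):
    """Apply reduce_fn to each window of length kernel_size, moving by stride."""
    return [reduce_fn(values[i:i + kernel_size])
            for i in range(0, len(values) - kernel_size + 1, stride)]


def mean(window):
    return sum(window) / len(window)


signal = [1.0, 3.0, 2.0, 8.0, 4.0, 6.0]
max_pooled = pool1d(signal, 2, 2, max)
avg_pooled = pool1d(signal, 2, 2, mean)
print(max_pooled)  # [3.0, 8.0, 6.0]
print(avg_pooled)  # [2.0, 5.0, 5.0]
```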
Padding Layer
API Name |
Description |
Supported Platforms |
Pads the input tensor according to the paddings and mode. |
|
|
Pads the last dimension of the input tensor with a given constant value. |
|
|
Pads the last two dimensions of the input tensor with a given constant value. |
|
|
Pads the last three dimensions of the input tensor with a given constant value. |
|
|
Pads the given tensor by reflection using the given padding. |
|
|
Pads the given tensor by reflection using the given padding. |
|
|
Pad the given tensor in a reflecting way using the input boundaries as the axis of symmetry. |
|
|
Pad on W dimension of input x according to padding. |
|
|
Pad on HW dimension of input x according to padding. |
|
|
Pad on DHW dimension of input x according to padding. |
|
|
Pads the last two dimensions of input tensor with zero. |
|
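The difference between constant padding and reflection padding in the table above is easiest to see on a single row. The sketch below pads a 1D list on both sides. The helper names are hypothetical, and the reflection variant assumes the common convention that the edge element itself is not repeated.

```python
def constant_pad(row, left, right, value=0):
    """Add `left` and `right` copies of `value` around the row."""
    return [value] * left + list(row) + [value] * right


def reflection_pad(row, left, right):
    """Mirror the row around its edge elements; needs left, right < len(row)."""
    head = [row[i] for i in range(left, 0, -1)]
    tail = [row[-2 - i] for i in range(right)]
    return head + list(row) + tail


row = [1, 2, 3, 4]
print(constant_pad(row, 2, 2))    # [0, 0, 1, 2, 3, 4, 0, 0]
print(reflection_pad(row, 2, 2))  # [3, 2, 1, 2, 3, 4, 3, 2]
```

The 2D and 3D variants apply the same idea independently along each of the last two or three dimensions.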
Loss Function
API Name |
Description |
Supported Platforms |
BCELoss creates a criterion to measure the binary cross entropy between the true labels and predicted labels. |
|
|
Applies the sigmoid activation function to the input logits, then computes the binary cross entropy between the result and the target. |
|
|
CosineEmbeddingLoss creates a criterion to measure the similarity between two tensors using cosine distance. |
|
|
The cross entropy loss between input and target. |
|
|
Calculates the CTC (Connectionist Temporal Classification) loss. |
|
|
The Dice coefficient is a set similarity loss, which is used to calculate the similarity between two samples. |
|
|
It is a loss function to solve the imbalance of categories and the difference of classification difficulty. |
|
|
Gaussian negative log likelihood loss. |
|
|
Calculates the Hinge Embedding Loss value based on the input 'logits' and 'labels' (containing only 1 or -1). |
|
|
HuberLoss calculates the error between the predicted value and the target value. |
|
|
Computes the Kullback-Leibler divergence between the logits and the labels. |
|
|
L1Loss is used to calculate the mean absolute error between the predicted value and the target value. |
|
|
MarginRankingLoss creates a criterion that measures the loss. |
|
|
MAELoss creates a criterion to measure the average absolute error between \(x\) and \(y\) element-wise, where \(x\) is the input and \(y\) is the labels. |
|
|
Calculates the mean squared error between the predicted value and the label value. |
|
|
When there are multiple classifications, label is transformed into multiple binary classifications by one hot. |
|
|
Creates a loss criterion that minimizes the hinge loss for multi-class classification tasks. |
|
|
Calculates the MultiLabelSoftMarginLoss. |
|
|
Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input \(x\) (a 2D mini-batch Tensor) and output \(y\) (which is a 1D tensor of target class indices, \(0 \leq y \leq \text{x.size}(1)-1\)). |
|
|
Gets the negative log likelihood loss between logits and labels. |
|
|
Poisson negative log likelihood loss. |
|
|
RMSELoss creates a criterion to measure the root mean square error between \(x\) and \(y\) element-wise, where \(x\) is the input and \(y\) is the labels. |
|
|
Computes the sampled softmax training loss. |
|
|
SmoothL1 loss function, if the absolute error element-wise between the predicted value and the target value is less than the set threshold beta, the square term is used, otherwise the absolute error term is used. |
|
|
A loss class for two-class classification problems. |
|
|
Computes softmax cross entropy between logits and labels. |
|
|
TripletMarginLoss operation. |
|
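The SmoothL1 description above (square term below the threshold beta, absolute error term otherwise) can be written per element as 0.5 * d^2 / beta when d < beta and d - 0.5 * beta otherwise, where d is the absolute error. A minimal sketch, with a hypothetical function name and without any reduction over elements:

```python
def smooth_l1(pred, target, beta=1.0):
    """Per-element SmoothL1: quadratic near zero, linear for large errors."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


print(smooth_l1(0.5, 0.0))  # 0.125
print(smooth_l1(3.0, 0.0))  # 2.5
```

Both branches give 0.5 * beta at d = beta, so the loss is continuous at the threshold.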
Optimizer
API Name |
Description |
Supported Platforms |
Implements the Adadelta algorithm. |
|
|
Implements the Adagrad algorithm. |
|
|
Implements the Adaptive Moment Estimation (Adam) algorithm. |
|
|
Implements the AdaMax algorithm, a variant of Adaptive Moment Estimation (Adam) based on the infinity norm. |
|
|
This optimizer will offload Adam optimizer to host CPU and keep parameters being updated on the device, to minimize the memory cost. |
|
|
Implements the Adam algorithm with weight decay. |
|
|
Enable the adasum in "auto_parallel/semi_auto_parallel" mode. |
|
|
Enable the adasum in "auto_parallel/semi_auto_parallel" mode. |
|
|
Implements Average Stochastic Gradient Descent. |
|
|
Implements the FTRL algorithm. |
|
|
Implements the Lamb (Layer-wise Adaptive Moments optimizer for Batching training) algorithm. |
|
|
Implements the LARS algorithm. |
|
|
Implements the Adaptive Moment Estimation (Adam) algorithm. |
|
|
Implements the Momentum algorithm. |
|
|
Implements the TFT optimizer wrapper, which reports status to MindIO TFT before the optimizer updates. |
|
|
Implements the ProximalAdagrad algorithm, an online learning and stochastic optimization method. |
|
|
Implements Root Mean Squared Propagation (RMSProp) algorithm. |
|
|
Implements Resilient backpropagation. |
|
|
Implements stochastic gradient descent. |
|
|
Updates gradients by second-order algorithm--THOR. |
|
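To give a feel for what an optimizer in the table above does on each step, here is a sketch of one common formulation of gradient descent with momentum: v = momentum * v + g, then p = p - lr * v. The function name momentum_step is hypothetical, and the MindSpore optimizers may differ in details such as weight decay, loss scaling, or Nesterov updates.

```python
def momentum_step(params, grads, velocity, lr, momentum):
    """Return updated (params, velocity) after one momentum update."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 1.
w, v = [1.0], [0.0]
for _ in range(3):
    grads = [2.0 * w[0]]
    w, v = momentum_step(w, grads, v, lr=0.25, momentum=0.5)

print(w, v)  # [-0.25] [1.0]
```

The parameter overshoots the minimum at 0 on the third step because the accumulated velocity keeps pushing it even though the gradient there is zero.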
Dynamic Learning Rate
LearningRateSchedule Class
The dynamic learning rates in this module are all subclasses of LearningRateSchedule. Pass the instance of LearningRateSchedule to an optimizer. During the training process, the optimizer calls the instance taking current step as input to get the current learning rate.
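import mindspore.nn as nn
# Build a cosine-decay schedule; the optimizer calls it with the current step.
min_lr = 0.01
max_lr = 0.1
decay_steps = 4
cosine_decay_lr = nn.CosineDecayLR(min_lr, max_lr, decay_steps)
# Any nn.Cell works here; nn.Dense stands in for a user-defined network.
net = nn.Dense(2, 3)
# Pass the schedule instance as the learning rate.
optim = nn.Momentum(net.trainable_params(), learning_rate=cosine_decay_lr, momentum=0.9)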
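To see what values such a schedule produces when the optimizer calls it, the sketch below evaluates the standard cosine decay formula in pure Python for the same min_lr, max_lr, and decay_steps. The function name cosine_decay is hypothetical, and holding the rate at min_lr once the step exceeds decay_steps is an assumption about the schedule's behavior.

```python
import math


def cosine_decay(min_lr, max_lr, decay_steps, current_step):
    """Learning rate at current_step under cosine decay from max_lr to min_lr."""
    # Assumed: steps past decay_steps are clamped, so the rate stays at min_lr.
    step = min(current_step, decay_steps)
    cosine = math.cos(math.pi * step / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + cosine)


schedule = [cosine_decay(0.01, 0.1, 4, s) for s in range(6)]
for step, lr in enumerate(schedule):
    print(step, round(lr, 4))
```

The rate starts at max_lr, passes the midpoint 0.055 at step 2, and reaches min_lr at step 4.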
API Name |
Description |
Supported Platforms |
Calculates learning rate based on cosine decay function. |
|
|
Calculates learning rate based on exponential decay function. |
|
|
Calculates learning rate based on inverse-time decay function. |
|
|
Calculates learning rate based on natural exponential decay function. |
|
|
Calculates learning rate based on polynomial decay function. |
|
|
Gets learning rate warming up. |
|
Dynamic LR Function
The dynamic learning rates in this module are all functions. Call the function and pass the result to an optimizer. During the training process, the optimizer takes result[current step] as current learning rate.
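import mindspore.nn as nn
# Precompute one learning rate per training step.
min_lr = 0.01
max_lr = 0.1
total_step = 6
step_per_epoch = 1
decay_epoch = 4
lr = nn.cosine_decay_lr(min_lr, max_lr, total_step, step_per_epoch, decay_epoch)
# Any nn.Cell works here; nn.Dense stands in for a user-defined network.
net = nn.Dense(2, 3)
# Pass the precomputed list as the learning rate.
optim = nn.Momentum(net.trainable_params(), learning_rate=lr, momentum=0.9)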
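Unlike a schedule object, the function form produces the whole list up front, one entry per step, and the rate changes only at epoch boundaries. The sketch below reproduces that idea in pure Python. The name cosine_decay_list is hypothetical, and both the epoch computation (step // step_per_epoch) and the clamping at decay_epoch are assumptions about how the list is built.

```python
import math


def cosine_decay_list(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    """Return a list with one cosine-decayed learning rate per step."""
    rates = []
    for i in range(total_step):
        # Assumed: the epoch index drives the decay and is clamped at decay_epoch.
        epoch = min(i // step_per_epoch, decay_epoch)
        cosine = math.cos(math.pi * epoch / decay_epoch)
        rates.append(min_lr + 0.5 * (max_lr - min_lr) * (1 + cosine))
    return rates


lr_list = cosine_decay_list(0.01, 0.1, total_step=6, step_per_epoch=1, decay_epoch=4)
# The optimizer would read lr_list[current_step] during training.
print(len(lr_list))  # 6
```

With step_per_epoch greater than 1, consecutive steps within the same epoch share one learning rate.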
API Name |
Description |
Supported Platforms |
Calculates learning rate based on cosine decay function. |
|
|
Calculates learning rate based on exponential decay function. |
|
|
Calculates learning rate based on inverse-time decay function. |
|
|
Calculates learning rate based on natural exponential decay function. |
|
|
Get piecewise constant learning rate. |
|
|
Calculates learning rate based on polynomial decay function. |
|
|
Gets learning rate warming up. |
|
Image Processing Layer
API Name |
Description |
Supported Platforms |
Applies the PixelShuffle operation over input which implements sub-pixel convolutions with stride \(1/r\). |
|
|
Applies the PixelUnshuffle operation over input which is the inverse of PixelShuffle. |
|
|
For details, please refer to |
|
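The PixelShuffle entry above rearranges channels into spatial positions: an input of shape (C * r * r, H, W) becomes (C, H * r, W * r). The pure-Python sketch below does this for a single sample stored as nested lists, assuming the common channel ordering in which channel c * r * r + i * r + j supplies the pixel at offset (i, j) within each r by r output block. The function name pixel_shuffle is hypothetical.

```python
def pixel_shuffle(image, r):
    """image: [C*r*r][H][W] nested lists -> [C][H*r][W*r] nested lists."""
    channels_in, height, width = len(image), len(image[0]), len(image[0][0])
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                # Assumed ordering: each group of r*r input channels fills one block.
                src = image[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


image = [[[1]], [[2]], [[3]], [[4]]]  # 4 channels of size 1x1
shuffled = pixel_shuffle(image, 2)
print(shuffled)  # [[[1, 2], [3, 4]]]
```

PixelUnshuffle performs the reverse rearrangement, moving each r by r spatial block back into r * r channels.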
Tools
API Name |
Description |
Supported Platforms |
Divides the channels of a Tensor whose shape is \((*, C, H, W)\) into \(g\) groups to obtain a Tensor with shape \((*, \frac{C}{g}, g, H, W)\), and transposes along the corresponding axes of \(C\), \(\frac{C}{g}\) and \(g\) to restore the Tensor to the original shape. |
|
|
Flatten the input Tensor along dimensions from start_dim to end_dim. |
|
|
A placeholder identity operator that returns the same as input. |
|
|
Unflattens a Tensor dim according to axis and unflattened_size. |
|
|
In scenarios where a checkpoint is loaded, parameters within the network instantiation will be instantiated and occupy physical memory. |
|