mindspore.nn
Neural Network Cell
Predefined building blocks and computational units for constructing neural networks.
For more information about dynamic shape support status, please refer to Dynamic Shape Support Status of nn Interface.
For information about operators added or removed in mindspore.nn and changes to their supported platforms compared with the previous version, please refer to mindspore.nn API Interface Change.
Basic Block
- The basic building block of neural networks in MindSpore.
- Base class for running the graph loaded from a MindIR.
- Base class for other losses.
- Base class for updating parameters.
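As a minimal sketch of how these basic blocks are used, a custom network typically subclasses Cell and defines its forward computation in construct; the layer types and sizes below are illustrative placeholders, not a prescribed architecture.
import mindspore.nn as nn

class MyNet(nn.Cell):
    # A tiny example network assembled from nn building blocks.
    def __init__(self):
        super().__init__()
        self.dense1 = nn.Dense(16, 32)   # 16 input features -> 32 hidden units
        self.relu = nn.ReLU()
        self.dense2 = nn.Dense(32, 10)   # 32 hidden units -> 10 outputs

    def construct(self, x):
        # construct defines the forward computation of the Cell
        x = self.dense1(x)
        x = self.relu(x)
        return self.dense2(x)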
Container
- Holds Cells in a list.
- Sequential Cell container.
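For example, a simple feed-forward stack can be expressed with SequentialCell; the layer sizes and input shape below are illustrative.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

# SequentialCell runs the contained Cells in the given order.
seq = nn.SequentialCell([nn.Dense(16, 32), nn.ReLU(), nn.Dense(32, 10)])
x = ms.Tensor(np.random.randn(4, 16), ms.float32)
out = seq(x)
print(out.shape)  # (4, 10)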
Wrapper Layer
- A distributed optimizer.
- Dynamic loss scale update cell.
- Update cell with a fixed loss scaling value.
- Encapsulates the training network.
- Cell to run for getting the next operation.
- Splits the input along the 0th dimension into interleave_num pieces and then performs the computation of the wrapped cell.
- Cell that updates parameters.
- Wraps the network with micro batch.
- The time distributed layer.
- Network training package class.
- Network training with loss scaling.
- Wraps the forward network with the loss function.
- Cell with loss function.
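As a hedged sketch of how the wrapper cells fit together, a forward network can be combined with a loss function through WithLossCell and then wrapped by TrainOneStepCell, which performs one optimization step per call; the stand-in network, data shapes and hyperparameters below are placeholders.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

net = nn.Dense(16, 10)                                   # stand-in forward network
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

net_with_loss = nn.WithLossCell(net, loss_fn)            # forward network + loss function
train_net = nn.TrainOneStepCell(net_with_loss, optimizer)
train_net.set_train()

data = ms.Tensor(np.random.randn(8, 16), ms.float32)
label = ms.Tensor(np.random.randint(0, 10, (8,)), ms.int32)
loss = train_net(data, label)                            # one training step, returns the loss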
Convolutional Layer
- Calculates the 1D convolution on the input tensor.
- Calculates a 1D transposed convolution, which can be regarded as Conv1d for the gradient of the input, also called deconvolution (although it is not an actual deconvolution).
- Calculates the 2D convolution on the input tensor.
- Calculates a 2D transposed convolution, which can be regarded as Conv2d for the gradient of the input, also called deconvolution (although it is not an actual deconvolution).
- Calculates the 3D convolution on the input tensor.
- Calculates a 3D transposed convolution, which can be regarded as Conv3d for the gradient of the input.
- Extracts patches from images.
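A minimal usage sketch for the 2D case; the channel counts and input shape are illustrative, and Conv2d expects NCHW input by default.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)  # pad_mode defaults to 'same'
x = ms.Tensor(np.random.randn(1, 3, 32, 32), ms.float32)         # NCHW input
y = conv(x)
print(y.shape)  # (1, 16, 32, 32) with the default pad_mode and stride 1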
Recurrent Layer
- Stacked Elman RNN layers.
- An Elman RNN cell with tanh or ReLU non-linearity.
- Stacked GRU (Gated Recurrent Unit) layers.
- A GRU (Gated Recurrent Unit) cell.
- Stacked LSTM (Long Short-Term Memory) layers.
- An LSTM (Long Short-Term Memory) cell.
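As a brief sketch, a stacked LSTM consumes a batch of sequences together with initial hidden and cell states; all sizes below are illustrative.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=2, batch_first=True)
x = ms.Tensor(np.random.randn(4, 5, 10), ms.float32)   # (batch, seq_len, input_size)
h0 = ms.Tensor(np.zeros((2, 4, 16)), ms.float32)       # (num_layers, batch, hidden_size)
c0 = ms.Tensor(np.zeros((2, 4, 16)), ms.float32)
output, (hn, cn) = lstm(x, (h0, c0))
print(output.shape)  # (4, 5, 16)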
Transformer Layer
- An implementation of multi-head attention from the paper Attention Is All You Need.
- Transformer encoder layer.
- Transformer decoder layer.
- Transformer encoder module consisting of multiple stacked TransformerEncoderLayer layers, including multi-head self-attention and feedforward layers.
- Transformer decoder module consisting of multiple stacked TransformerDecoderLayer layers, including multi-head self-attention, cross-attention and feedforward layers.
- Transformer module including encoder and decoder.
Embedding Layer
- A simple lookup table that stores embeddings of a fixed dictionary and size.
- EmbeddingLookup layer.
- Returns a slice of the input tensor based on the specified indices and the field ids.
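A small sketch of the lookup table in action; the vocabulary size, embedding size and index values are illustrative.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

embedding = nn.Embedding(vocab_size=1000, embedding_size=64)
ids = ms.Tensor(np.array([[1, 5, 7], [2, 4, 9]]), ms.int32)  # token indices
vectors = embedding(ids)
print(vectors.shape)  # (2, 3, 64)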
Nonlinear Activation Layer
- Continuously differentiable exponential linear units activation function.
- Exponential Linear Unit activation function.
- Fast Gaussian error linear unit activation function.
- Gaussian error linear unit activation function.
- The gated linear unit function.
- Gets the activation function.
- Applies the Hardtanh function element-wise.
- Hard Shrink activation function.
- Hard sigmoid activation function.
- Applies hswish-type activation element-wise.
- Leaky ReLU activation function.
- Applies logsigmoid activation element-wise.
- Applies the LogSoftmax function to an n-dimensional input tensor.
- Local Response Normalization.
- Computes MISH (A Self Regularized Non-Monotonic Neural Activation Function) of input tensors element-wise.
- Softsign activation function.
- PReLU activation function.
- Rectified Linear Unit activation function.
- Computes the ReLU6 activation function.
- Randomized Leaky ReLU activation function.
- Activation function SeLU (Scaled exponential Linear Unit).
- Sigmoid Linear Unit activation function.
- Sigmoid activation function.
- Softmin activation function, a generalization of the two-class sigmoid function to multiple classes.
- Softmax activation function, a generalization of the two-class sigmoid function to multiple classes.
- Softmax function applied to 2D feature data.
- Applies the SoftShrink function element-wise.
- Applies the Tanh function element-wise and returns a new tensor with the hyperbolic tangent of the elements of the input; the input is a Tensor of any valid shape.
- Tanhshrink activation function.
- Thresholds each element of the input Tensor.
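Activation cells are applied element-wise, and get_activation looks an activation Cell up by name; a brief sketch with illustrative values.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

x = ms.Tensor(np.array([-1.0, 0.0, 2.0]), ms.float32)
relu = nn.ReLU()
print(relu(x))                   # [0. 0. 2.]

gelu = nn.GELU()                 # activation Cells can be constructed directly...
act = nn.get_activation('relu')  # ...or looked up by name
print(gelu(x), act(x))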
Linear Layer
- The dense connected layer.
- The bilinear dense connected layer.
Dropout Layer
- Dropout layer for the input.
- During training, randomly zeroes entire channels of the input tensor with probability p from a Bernoulli distribution (For a 3-dimensional tensor with a shape of \((N, C, L)\), the channel feature map refers to a 1-dimensional feature map with the shape of \(L\)).
- During training, randomly zeroes some channels of the input tensor with probability p from a Bernoulli distribution (For a 4-dimensional tensor with a shape of \(NCHW\), the channel feature map refers to a 2-dimensional feature map with the shape of \(HW\)).
- During training, randomly zeroes some channels of the input tensor with probability p from a Bernoulli distribution (For a 5-dimensional tensor with a shape of \(NCDHW\), the channel feature map refers to a 3-dimensional feature map with a shape of \(DHW\)).
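Dropout layers only take effect in training mode; a brief sketch, assuming the newer p-style drop-probability argument is available in the installed version.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

dropout = nn.Dropout(p=0.5)  # p is the probability of zeroing an element
x = ms.Tensor(np.ones((2, 4)), ms.float32)

dropout.set_train(True)      # training mode: elements are zeroed with probability p
print(dropout(x))
dropout.set_train(False)     # inference mode: dropout acts as an identity mapping
print(dropout(x))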
Normalization Layer
- This layer applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D or 2D inputs) to reduce internal covariate shift.
- Batch Normalization is widely used in convolutional networks.
- Batch Normalization is widely used in convolutional networks.
- Group Normalization over a mini-batch of inputs.
- This layer applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with additional channel dimension).
- This layer applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension).
- This layer applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension).
- Applies Layer Normalization over a mini-batch of inputs.
- Sync Batch Normalization layer over an N-dimensional input.
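For instance, BatchNorm2d normalizes each channel of an NCHW input using mini-batch statistics during training; the shapes below are illustrative.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

bn = nn.BatchNorm2d(num_features=16)  # one gamma/beta pair per channel
bn.set_train()                        # use mini-batch statistics during training
x = ms.Tensor(np.random.randn(4, 16, 8, 8), ms.float32)
print(bn(x).shape)  # (4, 16, 8, 8)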
Pooling Layer
- Applies a 1D adaptive average pooling over an input Tensor which can be regarded as a composition of 1D input planes.
- This operator applies a 2D adaptive average pooling to an input signal composed of multiple input planes.
- This operator applies a 3D adaptive average pooling to an input signal composed of multiple input planes.
- Applies a 1D adaptive maximum pooling over an input Tensor which can be regarded as a composition of 1D input planes.
- This operator applies a 2D adaptive max pooling to an input signal composed of multiple input planes.
- Calculates the 3D adaptive max pooling for an input Tensor.
- Applies a 1D average pooling over an input Tensor which can be regarded as a composition of 1D input planes.
- Applies a 2D average pooling over an input Tensor which can be regarded as a composition of 2D input planes.
- Applies a 3D average pooling over an input Tensor which can be regarded as a composition of 3D input planes.
- Applies the 3D FractionalMaxPool operation over the input.
- Applies a 1D LPPooling operation over an input Tensor, which can be regarded as a composition of 1D input planes.
- Applies a 2D LPPooling operation over an input Tensor, which can be regarded as a composition of 2D input planes.
- Applies a 1D max pooling over an input Tensor which can be regarded as a composition of 1D planes.
- Applies a 2D max pooling over an input Tensor which can be regarded as a composition of 2D planes.
- 3D max pooling operation.
- Computes the inverse of MaxPool1d.
- Computes the inverse of MaxPool2d.
- Computes the inverse of MaxPool3d.
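A short sketch of the common 2D pooling layers; the kernel size, stride and input shape are illustrative.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

x = ms.Tensor(np.random.randn(1, 3, 32, 32), ms.float32)  # NCHW input
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
print(max_pool(x).shape)  # (1, 3, 16, 16)
print(avg_pool(x).shape)  # (1, 3, 16, 16)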
Padding Layer
- Pads the input tensor according to the paddings and mode.
- Pads the last dimension of the input tensor with a given constant value.
- Pads the last two dimensions of the input tensor with a given constant value.
- Pads the last three dimensions of the input tensor with a given constant value.
- Applies reflection padding to the given tensor according to the given padding.
- Applies reflection padding to the given tensor according to the given padding.
- Pads the given tensor in a reflecting way, using the input boundaries as the axes of symmetry.
- Pads the W dimension of the input x according to padding.
- Pads the H and W dimensions of the input x according to padding.
- Pads the D, H and W dimensions of the input x according to padding.
- Pads the last two dimensions of the input tensor with zeros.
Loss Function
- BCELoss creates a criterion to measure the binary cross entropy between the true labels and predicted labels.
- Adds sigmoid activation function to input logits, and uses the given logits to compute binary cross entropy between the logits and the labels.
- CosineEmbeddingLoss creates a criterion to measure the similarity between two tensors using cosine distance.
- The cross entropy loss between input and target.
- Calculates the CTC (Connectionist Temporal Classification) loss.
- The Dice coefficient is a set similarity loss, which is used to calculate the similarity between two samples.
- A loss function that addresses class imbalance and differences in classification difficulty.
- Gaussian negative log likelihood loss.
- Calculates the Hinge Embedding Loss based on the inputs 'logits' and 'labels' (whose elements are 1 or -1).
- HuberLoss calculates the error between the predicted value and the target value.
- Computes the Kullback-Leibler divergence between the logits and the labels.
- L1Loss is used to calculate the mean absolute error between the predicted value and the target value.
- MarginRankingLoss creates a criterion that measures the margin ranking loss between two inputs.
- Calculates the mean squared error between the predicted value and the label value.
- When there are multiple classes, the label is transformed into multiple binary labels by one-hot encoding.
- Creates a loss criterion that minimizes the hinge loss for multi-class classification tasks.
- Calculates the MultiLabelSoftMarginLoss.
- Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input \(x\) (a 2D mini-batch Tensor) and output \(y\) (which is a 1D tensor of target class indices, \(0 \leq y \leq \text{x.size}(1)-1\)).
- Gets the negative log likelihood loss between logits and labels.
- Poisson negative log likelihood loss.
- RMSELoss creates a criterion to measure the root mean square error between \(x\) and \(y\) element-wise, where \(x\) is the input and \(y\) is the labels.
- Computes the sampled softmax training loss.
- SmoothL1 loss function: if the element-wise absolute error between the predicted value and the target value is less than the set threshold beta, the squared term is used, otherwise the absolute error term is used.
- A loss class for two-class classification problems.
- Computes softmax cross entropy between logits and labels.
- TripletMarginLoss operation.
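Loss cells are called with the predicted values and the targets and return a reduced scalar by default; a minimal sketch using MSELoss with illustrative values.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

loss_fn = nn.MSELoss()  # default reduction is 'mean'
logits = ms.Tensor(np.array([1.0, 2.0, 3.0]), ms.float32)
labels = ms.Tensor(np.array([1.0, 2.0, 2.0]), ms.float32)
print(loss_fn(logits, labels))  # (0 + 0 + 1) / 3, about 0.3333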
Optimizer
- Implements the Adadelta algorithm.
- Implements the Adagrad algorithm.
- Implements the Adaptive Moment Estimation (Adam) algorithm.
- Implements the AdaMax algorithm, a variant of Adaptive Movement Estimation (Adam) based on the infinity norm.
- This optimizer offloads the Adam optimizer to the host CPU while keeping the parameters updated on the device, to minimize memory cost.
- Implements the Adam algorithm with weight decay.
- Enables AdaSum in "auto_parallel/semi_auto_parallel" mode.
- Enables AdaSum in "auto_parallel/semi_auto_parallel" mode.
- Implements Average Stochastic Gradient Descent.
- Implements the FTRL algorithm.
- Implements the Lamb (Layer-wise Adaptive Moments optimizer for Batching training) algorithm.
- Implements the LARS algorithm.
- Implements the Adaptive Moment Estimation (Adam) algorithm.
- Implements the Momentum algorithm.
- Implements the ProximalAdagrad algorithm.
- Implements the Root Mean Squared Propagation (RMSProp) algorithm.
- Implements Resilient backpropagation.
- Implements stochastic gradient descent.
- Updates gradients by the second-order optimization algorithm THOR.
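Most of these optimizers share the same construction pattern: pass the network's trainable parameters plus algorithm-specific hyperparameters; the stand-in network and values below are placeholders.
import mindspore.nn as nn

net = nn.Dense(16, 10)  # stand-in network
opt_sgd = nn.SGD(net.trainable_params(), learning_rate=0.1, momentum=0.9)
opt_adam = nn.Adam(net.trainable_params(), learning_rate=1e-3)
# The optimizer instance is then passed to a training wrapper such as
# TrainOneStepCell (see the Wrapper Layer section above).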
Experimental Optimizer
- Base class for all optimizers.
- Implements the Adam algorithm.
- Implements the Adam Weight Decay algorithm.
- Stochastic Gradient Descent optimizer.
Dynamic Learning Rate
LearningRateSchedule Class
The dynamic learning rates in this module are all subclasses of LearningRateSchedule. Pass the instance of LearningRateSchedule to an optimizer. During the training process, the optimizer calls the instance taking current step as input to get the current learning rate.
import mindspore.nn as nn
min_lr = 0.01
max_lr = 0.1
decay_steps = 4
cosine_decay_lr = nn.CosineDecayLR(min_lr, max_lr, decay_steps)
net = Net()  # Net is a user-defined network (an nn.Cell subclass)
optim = nn.Momentum(net.trainable_params(), learning_rate=cosine_decay_lr, momentum=0.9)
- Calculates learning rate based on cosine decay function.
- Calculates learning rate based on exponential decay function.
- Calculates learning rate based on inverse-time decay function.
- Calculates learning rate based on natural exponential decay function.
- Calculates learning rate based on polynomial decay function.
- Gets the warm-up learning rate.
Dynamic LR Function
The dynamic learning rates in this module are all functions. Call the function and pass the result to an optimizer. During the training process, the optimizer takes result[current_step] as the current learning rate.
import mindspore.nn as nn
min_lr = 0.01
max_lr = 0.1
total_step = 6
step_per_epoch = 1
decay_epoch = 4
lr = nn.cosine_decay_lr(min_lr, max_lr, total_step, step_per_epoch, decay_epoch)
net = Net()  # Net is a user-defined network (an nn.Cell subclass)
optim = nn.Momentum(net.trainable_params(), learning_rate=lr, momentum=0.9)
- Calculates learning rate based on cosine decay function.
- Calculates learning rate based on exponential decay function.
- Calculates learning rate based on inverse-time decay function.
- Calculates learning rate based on natural exponential decay function.
- Gets piecewise constant learning rate.
- Calculates learning rate based on polynomial decay function.
- Gets the warm-up learning rate.
LRScheduler Class
The dynamic learning rates in this module are all subclasses of LRScheduler. This module should be used together with the optimizers in mindspore.nn.optim_ex; pass the optimizer instance to an LRScheduler when constructing it. During the training process, the LRScheduler subclass dynamically changes the learning rate by calling the step method.
import mindspore
from mindspore import nn

# Define the network structure of LeNet5. Refer to
# https://gitee.com/mindspore/docs/blob/r2.1/docs/mindspore/code/lenet.py
net = LeNet5()
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
optimizer = nn.optim_ex.Adam(net.trainable_params(), lr=0.05)
scheduler = nn.StepLR(optimizer, step_size=2, gamma=0.1)

def forward_fn(data, label):
    logits = net(data)
    loss = loss_fn(logits, label)
    return loss, logits

grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)

def train_step(data, label):
    (loss, _), grads = grad_fn(data, label)
    optimizer(grads)
    return loss

for epoch in range(6):
    # Create the dataset taking MNIST as an example. Refer to
    # https://gitee.com/mindspore/docs/blob/r2.1/docs/mindspore/code/mnist.py
    for data, label in create_dataset():
        train_step(data, label)
    scheduler.step()
- Decays the learning rate of each parameter group by gamma every step_size epochs.
- Decays the learning rate of each parameter group by linearly changing a small multiplicative factor until the number of epochs reaches a pre-defined milestone: total_iters.
- Base class for learning rate schedulers.
Image Processing Layer
- Applies the PixelShuffle operation over the input, which implements sub-pixel convolutions with stride \(1/r\).
- Applies the PixelUnshuffle operation over the input, which is the inverse of PixelShuffle.
- 'nn.ResizeBilinear' is deprecated from version 2.0 and will be removed in a future version. (Deprecated)
- For details, please refer to the corresponding functional interface.
Tools
- Divides the channels of a Tensor whose shape is \((*, C, H, W)\) into \(g\) groups to obtain a Tensor with shape \((*, \frac{C}{g}, g, H, W)\), transposes along the \(\frac{C}{g}\) and \(g\) axes, and restores the Tensor to its original shape.
- Flattens the input Tensor along dimensions from start_dim to end_dim.
- A placeholder identity operator that returns its input unchanged.