mindspore.nn.SyncBatchNorm

class mindspore.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None, process_groups=None, dtype=mstype.float32)[source]

Sync Batch Normalization layer over an N-dimensional input.

Sync Batch Normalization is cross-device synchronized Batch Normalization. The standard Batch Normalization implementation normalizes data only within each device, whereas Sync Batch Normalization normalizes the input across all devices in the sync group. Batch Normalization is described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the features using a mini-batch of data and the learned parameters, as described by the following formula:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + ϵ}} * γ + β$$

Note

Currently, SyncBatchNorm only supports 2D and 4D inputs. $γ$ and $β$ are trainable scale and shift.
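
To make the difference from per-device normalization concrete, the following is a minimal plain-Python sketch of the formula for a single channel. It is not the MindSpore implementation: the helper name `sync_batch_norm_1ch` is hypothetical, and the all-reduce across devices is simulated by summing per-device counts, sums, and sums of squares in one process.

```python
import math


def sync_batch_norm_1ch(per_device_values, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize one channel with statistics pooled over every device in a group.

    per_device_values: list of lists, one inner list of floats per device.
    """
    # Each device contributes its local count, sum, and sum of squares.
    # Summing them here stands in for an all-reduce over the sync group.
    count = sum(len(values) for values in per_device_values)
    total = sum(sum(values) for values in per_device_values)
    total_sq = sum(sum(v * v for v in values) for values in per_device_values)

    mean = total / count
    var = total_sq / count - mean * mean  # biased variance, Var[x] = E[x^2] - E[x]^2

    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in values]
            for values in per_device_values]


# Two simulated devices holding different data for the same channel.
normalized = sync_batch_norm_1ch([[1.0, 1.0], [3.0, 3.0]])
```

With per-device statistics, each simulated device above would see zero variance and produce all zeros; with pooled statistics the group mean is 2 and the group variance is 1, so the two devices produce values close to -1 and +1.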

Parameters

num_features (int) – C from an expected input of size $(N, C, H, W)$.
eps (float) – $ϵ$, a value added to the denominator for numerical stability. Default: 1e-5.
momentum (float) – A floating-point hyperparameter: the momentum used when computing running_mean and running_var. Default: 0.9.
affine (bool) – A bool value. When set to True, $γ$ and $β$ can be learned. Default: True.
gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the $γ$ weight. String values refer to the initializer functions, including 'zeros', 'ones', 'xavier_uniform', 'he_uniform', etc. Default: 'ones'.
beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the $β$ weight. String values refer to the initializer functions, including 'zeros', 'ones', 'xavier_uniform', 'he_uniform', etc. Default: 'zeros'.
moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. String values refer to the initializer functions, including 'zeros', 'ones', 'xavier_uniform', 'he_uniform', etc. Default: 'zeros'.
moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. String values refer to the initializer functions, including 'zeros', 'ones', 'xavier_uniform', 'he_uniform', etc. Default: 'ones'.
use_batch_statistics (bool) – If True, use the mean and variance of the current batch. If False, use the specified mean and variance values. If None, training uses the mean and variance of the current batch and tracks the running mean and variance, while evaluation uses the running mean and variance. Default: None.
process_groups (list) – A list that divides devices into different sync groups, containing N sublists. Each sublist contains the int rank ids that need to be synchronized within the same group. All int values must be in [0, rank_size) and different from each other. Default: None, indicating synchronization across all devices.
dtype (mindspore.dtype) – Dtype of Parameters. Default: mstype.float32.
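
The interaction between use_batch_statistics, training mode, and momentum can be sketched in plain Python for a single channel. This is an illustration under stated assumptions, not the library's code: the helper `select_statistics` is hypothetical, the update rule `moving = momentum * moving + (1 - momentum) * batch` is assumed (it matches the default of 0.9 being a decay factor, but the page above does not state it), and the sketch assumes the moving statistics are updated whenever batch statistics are used.

```python
def select_statistics(batch_mean, batch_var, moving_mean, moving_var,
                      use_batch_statistics=None, training=True, momentum=0.9):
    """Return (mean, var, new_moving_mean, new_moving_var) for one channel."""
    # None defers to the mode: batch statistics in training, moving ones in eval.
    if use_batch_statistics is None:
        use_batch = training
    else:
        use_batch = use_batch_statistics

    if not use_batch:
        # Normalize with the stored statistics and leave them unchanged.
        return moving_mean, moving_var, moving_mean, moving_var

    # ASSUMED update rule: exponential moving average with decay `momentum`.
    new_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    new_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return batch_mean, batch_var, new_mean, new_var


# Training step starting from the default initial statistics (mean 0, variance 1).
mean, var, new_mean, new_var = select_statistics(2.0, 4.0, 0.0, 1.0, training=True)
```

Under these assumptions, a training step normalizes with the batch values (2.0 and 4.0) and moves the stored statistics a tenth of the way toward them, while an evaluation step would normalize with the stored values and leave them untouched.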

Inputs:

x (Tensor) - Tensor of shape $(N, C_{in}, H_{in}, W_{in})$.

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape $(N, C_{out}, H_{out}, W_{out})$.

Raises

TypeError – If num_features is not an int.
TypeError – If eps is not a float.
TypeError – If process_groups is not a list.
ValueError – If num_features is less than 1.
ValueError – If momentum is not in range [0, 1].
ValueError – If rank_id in process_groups is not in range [0, rank_size).
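
The constraints on process_groups listed above can be checked before constructing the layer. The sketch below is a hypothetical pre-flight helper, not part of MindSpore: `check_process_groups` and its `rank_size` argument are illustrative names, and raising ValueError for a duplicated rank id is an assumption, since the list above only names the out-of-range case.

```python
def check_process_groups(process_groups, rank_size):
    """Validate a process_groups value against the documented constraints.

    Returns the groups that would be synchronized; None means one group
    spanning all devices.
    """
    if process_groups is None:
        return [list(range(rank_size))]
    if not isinstance(process_groups, list):
        raise TypeError("process_groups must be a list")

    seen = set()
    for group in process_groups:
        if not isinstance(group, list):
            raise TypeError("each sync group must be a list of int rank ids")
        for rank_id in group:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(rank_id, bool) or not isinstance(rank_id, int):
                raise TypeError("rank ids must be int")
            if not 0 <= rank_id < rank_size:
                raise ValueError(f"rank id {rank_id} is not in [0, {rank_size})")
            if rank_id in seen:
                # ASSUMPTION: duplicates are reported as ValueError.
                raise ValueError(f"rank id {rank_id} appears more than once")
            seen.add(rank_id)
    return process_groups


groups = check_process_groups([[0, 1], [2, 3]], rank_size=4)
```

Running such a check on the host before launching a multi-device job can surface a malformed group list earlier than a failure inside the distributed run.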

Supported Platforms: Ascend

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the Ascend tutorial for more details.

For the GPU devices, users need to prepare the host file and MPI. Please see the mpirun Startup tutorial for details.

For the CPU device, users need to write a dynamic cluster startup script. Please see the Dynamic Cluster Startup tutorial for details.

This example should be run with multiple devices.

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore.communication import init
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> init()
>>> ms.reset_auto_parallel_context()
>>> ms.set_auto_parallel_context(parallel_mode=ms.ParallelMode.DATA_PARALLEL)
>>> sync_bn_op = ms.nn.SyncBatchNorm(num_features=3, process_groups=[[0, 1], [2, 3]])
>>> x = ms.Tensor(np.ones([1, 3, 2, 2]), ms.float32)
>>> output = sync_bn_op(x)
>>> print(output)
[[[[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]
  [[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]
  [[ 0.999995 0.999995 ]
   [ 0.999995 0.999995 ]]]]
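
The printed value 0.999995 can be reproduced by hand from the formula. The sketch below assumes, as one plausible reading of the output, that the layer normalized with its initial moving statistics (mean 0, variance 1) and initial $γ$ = 1, $β$ = 0 rather than with batch statistics; an all-ones input has zero batch variance and would otherwise normalize to 0.

```python
import math

eps = 1e-5                          # default eps
moving_mean, moving_var = 0.0, 1.0  # defaults: 'zeros' and 'ones'
gamma, beta = 1.0, 0.0              # defaults: 'ones' and 'zeros'
x = 1.0                             # every element of the example input

y = (x - moving_mean) / math.sqrt(moving_var + eps) * gamma + beta
print(f"{y:.6f}")  # 0.999995
```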