mindspore.nn.DistributedGradReducer

class mindspore.nn.DistributedGradReducer(parameters, mean=True, degree=None, fusion_type=1, group=GlobalComm.WORLD_COMM_GROUP)[source]

A distributed optimizer.

Constructs a gradient reducer Cell, which applies communication and average operations on single-process gradient values. Used in data parallel.

Parameters
  • parameters (list) – the parameters to be updated.

  • mean (bool) – When mean is true, the mean coefficient (degree) would apply on gradients. Default: True.

  • degree (int) – The mean coefficient. Usually it equals to device number. Default: None.

  • fusion_type (int) – The type of all reduce fusion. Default: 1.

  • group (str) – The communication group to work on. Normally, the group should be created by create_group, otherwise, using the default group. Default: WORLD_COMM_GROUP.

Raises

ValueError – If degree is not an int or less than 0.

Supported Platforms:

Ascend GPU

Examples

>>> # This example should be run with multiple processes.
>>> # Please refer to the Programming Guide > Distributed Training -> Distributed Parallel Usage Example
>>> # on mindspore.cn and focus on the contents of these three parts: Configuring Distributed Environment
>>> # Variables, Calling the Collective Communication Library, Running The Script.
>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore.communication import init
>>> from mindspore import ops
>>> from mindspore import Parameter, Tensor
>>> from mindspore import nn
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> init()
>>> ms.reset_auto_parallel_context()
>>> ms.set_auto_parallel_context(parallel_mode=ms.ParallelMode.DATA_PARALLEL)
>>>
>>> class TrainingWrapper(nn.Cell):
...     def __init__(self, network, optimizer, sens=1.0):
...         super(TrainingWrapper, self).__init__(auto_prefix=False)
...         self.network = network
...         self.network.add_flags(defer_inline=True)
...         self.weights = optimizer.parameters
...         self.optimizer = optimizer
...         self.grad = ops.GradOperation(get_by_list=True, sens_param=True)
...         self.sens = sens
...         self.reducer_flag = False
...         self.grad_reducer = None
...         self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
...         self.depend = ops.Depend()
...         if self.parallel_mode in [ms.ParallelMode.DATA_PARALLEL, ms.ParallelMode.HYBRID_PARALLEL]:
...             self.reducer_flag = True
...         if self.reducer_flag:
...             mean = context.get_auto_parallel_context("gradients_mean")
...             degree = context.get_auto_parallel_context("device_num")
...             self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
...
...     def construct(self, *args):
...         weights = self.weights
...         loss = self.network(*args)
...         sens = ops.Fill()(ops.DType()(loss), ops.Shape()(loss), self.sens)
...         grads = self.grad(self.network, weights)(*args, sens)
...         if self.reducer_flag:
...             # apply grad reducer on grads
...             grads = self.grad_reducer(grads)
...         return self.depend(loss, self.optimizer(grads))
>>>
>>> class Net(nn.Cell):
...     def __init__(self, in_features, out_features):
...         super(Net, self).__init__()
...         self.weight = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight')
...         self.matmul = ops.MatMul()
...
...     def construct(self, x):
...         output = self.matmul(x, self.weight)
...         return output
>>>
>>> size, in_features, out_features = 16, 16, 10
>>> network = Net(in_features, out_features)
>>> loss = nn.MSELoss()
>>> net_with_loss = nn.WithLossCell(network, loss)
>>> optimizer = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> train_cell = TrainingWrapper(net_with_loss, optimizer)
>>> inputs = Tensor(np.ones([size, in_features]).astype(np.float32))
>>> label = Tensor(np.zeros([size, out_features]).astype(np.float32))
>>> grads = train_cell(inputs, label)
>>> print(grads)
256.0