mindspore.ops.AllReduce
- class mindspore.ops.AllReduce(op=ReduceOp.SUM, group=GlobalComm.WORLD_COMM_GROUP)
Reduces the tensor data across all devices in such a way that all devices will get the same final result.
Note
The tensors must have the same shape and format in all processes of the collection.
- Parameters
op (str) – Specifies the operation used for element-wise reductions, such as sum, prod, max, and min. On the CPU, only 'sum' is supported. Default: ReduceOp.SUM.
group (str) – The communication group to work on. Default: GlobalComm.WORLD_COMM_GROUP, which means "hccl_world_group" on Ascend and "nccl_world_group" on GPU. A construction sketch using these parameters follows below.
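As a minimal construction sketch (assuming the communication environment has already been initialized with init(), and that GlobalComm is importable from mindspore.communication as in current MindSpore releases), the operator can be built with a non-default reduce operation and an explicit group name:
>>> import mindspore.ops as ops
>>> from mindspore.communication import GlobalComm, init
>>> from mindspore.ops import ReduceOp
>>> init()
>>> # element-wise maximum across all devices in the default world group
>>> allreduce_max = ops.AllReduce(op=ReduceOp.MAX, group=GlobalComm.WORLD_COMM_GROUP)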
- Inputs:
input_x (Tensor) - The shape of tensor is \((x_1, x_2, ..., x_R)\).
- Outputs:
Tensor, has the same shape as the input, i.e., \((x_1, x_2, ..., x_R)\). The contents depend on the specified operation.
- Raises
TypeError – If op or group is not a str, fusion is not an integer, or the dtype of the input is bool.
- Supported Platforms:
Ascend GPU CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For Ascend devices, users need to prepare the rank table and set rank_id and device_id. Please see the rank table Startup for more details.
For GPU devices, users need to prepare the host file and mpi. Please see the mpirun Startup.
For the CPU device, users need to write a dynamic cluster startup script. Please see the Dynamic Cluster Startup.
This example should be run with 2 devices.
>>> import numpy as np
>>> from mindspore.communication import init
>>> from mindspore import Tensor
>>> from mindspore.ops import ReduceOp
>>> import mindspore.nn as nn
>>> import mindspore.ops as ops
>>>
>>> init()
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.allreduce_sum = ops.AllReduce(ReduceOp.SUM)
...
...     def construct(self, x):
...         return self.allreduce_sum(x)
...
>>> input_ = Tensor(np.ones([2, 8]).astype(np.float32))
>>> net = Net()
>>> output = net(input_)
>>> print(output)
[[2. 2. 2. 2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2. 2. 2. 2.]]
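As a variant sketch (assuming the same 2-device setup and an execution mode that allows calling the primitive directly, e.g. PyNative), swapping the reduce operation changes only the contents of the result, not its shape: with ReduceOp.MAX each device receives the element-wise maximum, so the same all-ones input stays all ones.
>>> allreduce_max = ops.AllReduce(ReduceOp.MAX)
>>> output = allreduce_max(Tensor(np.ones([2, 8]).astype(np.float32)))
>>> print(output)
[[1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]]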