mindspore.ops.Reduce

class mindspore.ops.Reduce(dest_rank, op=ReduceOp.SUM, group=GlobalComm.WORLD_COMM_GROUP)[source]

Reduces tensors across the processes in the specified communication group, sends the result to the target dest_rank (local rank), and returns the tensor sent to the target process.

Note

Only the process with the destination rank receives the reduced output; every other process gets a tensor with shape [1], which has no mathematical meaning. Both PyNative mode and Graph mode are supported, but Graph mode only supports scenes with a graph compilation level of O0.
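
Since only the destination rank receives a meaningful result, callers typically check their own rank before consuming the output. A minimal sketch, assuming the communication group has already been initialized and net wraps a Reduce with dest_rank=1 (the guard itself is ordinary user code, not part of this operator's API):

>>> from mindspore.communication import get_rank
>>> output = net(input_x)
>>> if get_rank() == 1:
...     # Only rank 1 holds the reduced tensor; other ranks see a shape-[1] placeholder.
...     print(output)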

Parameters
  • dest_rank (int) – The local rank of the target process in the specified group that receives the reduced output.

  • op (str, optional) – Specifies the operation used for element-wise reductions, such as sum, prod, max, and min; on the CPU, only 'sum' is supported (a non-default op is sketched after this parameter list). Default: ReduceOp.SUM.

  • group (str, optional) – The communication group to work on. Default: GlobalComm.WORLD_COMM_GROUP, which means "hccl_world_group" on Ascend and "nccl_world_group" on GPU.
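
As an illustration of the op argument, a reduction that collects the element-wise maximum on rank 0 can be constructed as below; the choice of dest_rank=0 and ReduceOp.MAX is only an example:

>>> from mindspore import ops
>>> from mindspore.ops import ReduceOp
>>> # Collect the element-wise maximum on the process with local rank 0.
>>> reduce_max = ops.Reduce(dest_rank=0, op=ReduceOp.MAX)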

Inputs:
  • input_x (Tensor) - The shape of tensor is \((x_1, x_2, ..., x_R)\).

Outputs:

Tensor. Returns the reduced tensor on the process with rank dest_rank. The shape of tensor is \((x_1, x_2, ..., x_R)\).

Raises
  • TypeError – If the first input is not a Tensor, or op or group is not a str.

  • RuntimeError – If the device target is invalid, the backend is invalid, or distributed initialization fails.

Supported Platforms:

Ascend

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For Ascend/GPU/CPU devices, it is recommended to use the msrun startup method, which has no third-party or configuration file dependencies. Please see the msrun startup for more details.

This example should be run with 4 devices.
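
For example, with the msrun startup method, a single-node run on 4 devices could be launched along the following lines (reduce_example.py is a placeholder script name, and the flags are assumed from the msrun tutorial):

msrun --worker_num=4 --local_worker_num=4 reduce_example.py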

>>> from mindspore import ops
>>> import mindspore.nn as nn
>>> from mindspore.communication import init
>>> from mindspore import Tensor
>>> import numpy as np
>>> # Launch 4 processes.
>>> init()
>>> class ReduceNet(nn.Cell):
...     def __init__(self):
...         super(ReduceNet, self).__init__()
...         # Send the reduced result to the process with local rank 1.
...         self.reduce = ops.Reduce(dest_rank=1)
...
...     def construct(self, x):
...         out = self.reduce(x)
...         return out
...
>>> # Each of the 4 processes contributes a tensor of ones, so rank 1 receives their element-wise sum.
>>> input = Tensor(np.ones([2, 8]).astype(np.float32))
>>> net = ReduceNet()
>>> output = net(input)
>>> print(output)
Process with rank 1: [[4. 4. 4. 4. 4. 4. 4. 4.]
                     [4. 4. 4. 4. 4. 4. 4. 4.]]
Other processes: [0.]