mindspore.mint.distributed.reduce

mindspore.mint.distributed.reduce(tensor, dst, op=ReduceOp.SUM, group=None, async_op=False)

Reduces tensors across the processes in the specified communication group and sends the result to the destination process dst (global rank). The reduction is applied in place to tensor.

Note

Only the process with the destination rank receives the reduced output.
Other processes only get a tensor with shape [1], which has no mathematical meaning.
Only PyNative mode is supported; Graph mode is not currently supported.

Parameters

tensor (Tensor) – Input and output of the collective. The function operates in-place.
dst (int) – The global rank of the process that receives the reduced output.
op (str, optional) – The operation used for the element-wise reduction, such as sum, prod, max, or min. Default: ReduceOp.SUM.
group (str, optional) – The communication group to work on. If None, the default group is used, which is "hccl_world_group" on Ascend. Default: None.
async_op (bool, optional) – Whether this operator should run asynchronously. Default: False.

Returns

CommHandle, an async work handle, if async_op is True. None if async_op is False.

Raises

TypeError – If tensor is not a Tensor, if op or group is not a str, if async_op is not a bool, or if op is invalid.
RuntimeError – If device target is invalid, or backend is invalid, or distributed initialization fails.

Supported Platforms: Ascend

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For Ascend devices, it is recommended to use the msrun startup method without any third-party or configuration file dependencies.

See the msrun startup documentation for more details.

This example should be run with 2 devices.

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.mint.distributed import init_process_group, reduce
>>> # Launch 2 processes.
>>> init_process_group()
>>> dest_rank = 1
>>> input_tensor = Tensor(np.ones([2, 8]).astype(np.float32))
>>> # With async_op=False the call returns None; the result is written into input_tensor.
>>> output = reduce(input_tensor, dest_rank)
>>> print(input_tensor)
Process with rank 0: [[1. 1. 1. 1. 1. 1. 1. 1.]
                     [1. 1. 1. 1. 1. 1. 1. 1.]],
Process with rank 1: [[2. 2. 2. 2. 2. 2. 2. 2.]
                     [2. 2. 2. 2. 2. 2. 2. 2.]],
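
The semantics described above can also be explored without any devices. The following is a minimal pure-Python sketch, not MindSpore code: `simulate_reduce` and the `OPS` table are hypothetical names introduced only for illustration, each rank is modelled as a flat Python list, and non-destination buffers are simply left untouched as a simplifying assumption. It shows an element-wise reduction whose result is written in place to the buffer of the destination rank only.

```python
from functools import reduce as fold
import operator

# Hypothetical mapping from op names to binary functions.
# This is an illustration only, not MindSpore's ReduceOp.
OPS = {
    "sum": operator.add,
    "prod": operator.mul,
    "max": max,
    "min": min,
}


def simulate_reduce(buffers, dst, op="sum"):
    """Reduce per-rank buffers element-wise and overwrite only buffers[dst]."""
    combine = OPS[op]
    # zip(*buffers) groups the i-th element of every rank's buffer together.
    reduced = [fold(combine, column) for column in zip(*buffers)]
    # In-place update on the destination rank; other ranks are left as they were
    # (an assumption of this sketch, not a statement about the real operator).
    buffers[dst][:] = reduced
    return buffers


# Two simulated ranks, each holding a buffer of ones.
ranks = [[1.0] * 4 for _ in range(2)]
simulate_reduce(ranks, dst=1)
print(ranks[0])  # [1.0, 1.0, 1.0, 1.0]
print(ranks[1])  # [2.0, 2.0, 2.0, 2.0]
```

Passing `op="max"`, `"min"`, or `"prod"` to the sketch swaps the binary function while keeping the same destination-only behaviour.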