mindspore.communication.comm_func.reduce
- mindspore.communication.comm_func.reduce(tensor, dst, op=ReduceOp.SUM, group=GlobalComm.WORLD_COMM_GROUP)[source]
Reduces tensors across the processes in the specified communication group, sends the result to the target rank dst (global rank), and returns the tensor that is sent to the target process.
Note
Only the process with the destination rank receives the reduced output; every other process gets a tensor with shape [1], which has no mathematical meaning. Only PyNative mode is supported; Graph mode is not currently supported.
- Parameters
tensor (Tensor) – The input tensor to be reduced. The shape of tensor is \((x_1, x_2, ..., x_R)\).
dst (int) – The target rank of the process (global rank) that receives the reduced output.
op (str, optional) – Specifies an operation used for element-wise reductions, like sum, prod, max, and min. On the CPU, only 'sum' is supported. Default: ReduceOp.SUM.
group (str, optional) – The communication group to work on. Default: GlobalComm.WORLD_COMM_GROUP, which means "hccl_world_group" on Ascend and "nccl_world_group" on GPU.
- Returns
Tensor. Returns the reduced tensor on the process with rank dst; the shape of the tensor is \((x_1, x_2, ..., x_R)\). Other processes receive a tensor with shape [1].
- Raises
TypeError – If the first input parameter is not a Tensor, or if op or group is not a str.
RuntimeError – If device target is invalid, or backend is invalid, or distributed initialization fails.
- Supported Platforms:
Ascend
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For Ascend/GPU/CPU devices, it is recommended to use the msrun startup method without any third-party or configuration file dependencies.
Please see the msrun startup for more details.
This example should be run with 4 devices.
>>> from mindspore.communication import init
>>> from mindspore.communication.comm_func import reduce
>>> from mindspore import Tensor
>>> import numpy as np
>>> # Launch 4 processes.
>>> init()
>>> dest_rank = 1
>>> input_tensor = Tensor(np.ones([2, 8]).astype(np.float32))
>>> output = reduce(input_tensor, dst=dest_rank)
>>> print(output)
Process with rank 1: [[4. 4. 4. 4. 4. 4. 4. 4.]
 [4. 4. 4. 4. 4. 4. 4. 4.]]
Other processes: [0.]
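The following is a minimal supplementary sketch (not part of the official example set) assuming the same 4-process launch. It passes a non-default op and guards use of the output with a rank check, since, per the note above, only the destination rank receives a meaningful result. get_rank is imported from mindspore.communication and ReduceOp from mindspore.ops.
>>> from mindspore.communication import init, get_rank
>>> from mindspore.communication.comm_func import reduce
>>> from mindspore.ops import ReduceOp
>>> from mindspore import Tensor
>>> import numpy as np
>>> # Launch 4 processes; each rank contributes a tensor filled with its rank id.
>>> init()
>>> dest_rank = 1
>>> x = Tensor(np.full((2, 8), get_rank(), dtype=np.float32))
>>> # Element-wise maximum across ranks 0..3, delivered only to dest_rank.
>>> out = reduce(x, dst=dest_rank, op=ReduceOp.MAX)
>>> if get_rank() == dest_rank:
...     print(out)  # expected: a (2, 8) tensor filled with 3.0 on rank 1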