mindspore.communication.comm_func.all_to_all_single_with_output_shape

mindspore.communication.comm_func.all_to_all_single_with_output_shape(output_shape, tensor, output_split_sizes=None, input_split_sizes=None, group=None, async_op=False)

Slices the input tensor according to the user-specified split sizes, sends the slices to the other devices, receives slices from the other devices, and merges the received slices into an output Tensor.

Note

output_shape and the shape of tensor must be consistent across ranks. Only PyNative mode is supported; Graph mode is not currently supported.
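One way to read the consistency requirement is that the amount rank s sends to rank r must equal the amount rank r expects from rank s. The pure-Python sketch below checks this on a single process. `check_split_consistency` and `output_dim0` are hypothetical helpers, not part of MindSpore. The sketch assumes the split sizes of every rank have been collected into nested lists, and that the dim-0 length of the output equals the sum of the output split sizes, as in the usual all-to-all convention.

```python
def check_split_consistency(input_splits, output_splits):
    """Check that per-rank split sizes describe a consistent exchange.

    input_splits[r][s]  : rows rank r sends to rank s (hypothetical layout)
    output_splits[r][s] : rows rank r expects to receive from rank s
    Returns a list of (receiver, sender) pairs whose sizes disagree.
    """
    world_size = len(input_splits)
    mismatches = []
    for receiver in range(world_size):
        for sender in range(world_size):
            if output_splits[receiver][sender] != input_splits[sender][receiver]:
                mismatches.append((receiver, sender))
    return mismatches


def output_dim0(output_splits_for_rank):
    """Dim-0 length of the output on one rank: the sum of what it receives."""
    return sum(output_splits_for_rank)


# Two ranks: rank 0 sends 1 row to itself and 2 rows to rank 1;
# rank 1 sends 3 rows to rank 0 and 1 row to itself.
input_splits = [[1, 2], [3, 1]]
output_splits = [[1, 3], [2, 1]]
mismatches = check_split_consistency(input_splits, output_splits)
dims = [output_dim0(s) for s in output_splits]
print(mismatches, dims)
```

Under these assumptions, rank 0 would pass an output_shape whose first dimension is 4 and rank 1 one whose first dimension is 3.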

Parameters

output_shape (Union(Tensor, Tuple(int))) – the shape of the output tensor, which is the concatenation of the chunks gathered from the remote ranks.
tensor (Tensor) – tensor to be scattered to the remote ranks.
output_split_sizes (Union(Tuple(int), List(int))) – output split sizes at dim 0. If None, the output is split equally by world_size. Default: None.
input_split_sizes (Union(Tuple(int), List(int))) – input split sizes at dim 0. If None, the input is split equally by world_size. Default: None.
group (str, optional) – The communication group to work on. Default: None, which means "hccl_world_group" on Ascend, "nccl_world_group" on GPU.
async_op (bool, optional) – Whether this operator should be an async operator. Default: False.
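The role of the split sizes can be illustrated without any devices. The following is a minimal single-process sketch using plain Python lists; `simulate_all_to_all_single` is a hypothetical helper, not a MindSpore API. It assumes that received chunks are concatenated in sender-rank order and that an equal split requires the input length to be divisible by the world size.

```python
def simulate_all_to_all_single(inputs, input_splits=None):
    """Simulate the exchange on one process using plain lists.

    inputs[r] is the 1-D data held by rank r. input_splits[r][s] is how many
    elements rank r sends to rank s; None means an equal split by world size.
    Returns outputs, where outputs[r] is what rank r ends up with.
    """
    world_size = len(inputs)
    if input_splits is None:
        input_splits = []
        for data in inputs:
            if len(data) % world_size != 0:
                raise ValueError("equal split needs a length divisible by world size")
            input_splits.append([len(data) // world_size] * world_size)

    # Cut each rank's data into one chunk per destination rank.
    chunks = []
    for data, splits in zip(inputs, input_splits):
        if sum(splits) != len(data):
            raise ValueError("split sizes must sum to the input length")
        rank_chunks, start = [], 0
        for size in splits:
            rank_chunks.append(data[start:start + size])
            start += size
        chunks.append(rank_chunks)

    # Rank r receives chunk r from every sender, concatenated in sender order.
    outputs = []
    for receiver in range(world_size):
        received = []
        for sender in range(world_size):
            received.extend(chunks[sender][receiver])
        outputs.append(received)
    return outputs


even = simulate_all_to_all_single([[0, 1], [2, 3]])
uneven = simulate_all_to_all_single([[0, 1, 2], [3, 4, 5, 6]], [[1, 2], [3, 1]])
print(even)    # [[0, 2], [1, 3]]
print(uneven)  # [[0, 3, 4, 5], [1, 2, 6]]
```

In the uneven case, the output split sizes implied for rank 0 are [1, 3] and for rank 1 are [2, 1], which are the columns of the input split sizes.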

Returns

Tuple(Tensor, CommHandle). The output tensor is the concatenation of the chunks gathered from the remote ranks. If the number of elements gathered from the remote ranks is zero, a Tensor with shape () is returned, and its value has no actual meaning. CommHandle is an async work handle when async_op is True, and None when async_op is False.

Raises

TypeError – If tensor is not a Tensor.
TypeError – If output_shape is not a tuple or a Tensor.

Supported Platforms: Ascend

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For Ascend/GPU/CPU devices, the recommended startup method is msrun, which requires no third-party dependencies or configuration files. See the msrun startup documentation for more details.

This example should be run with 2 devices.

>>> import numpy as np
>>> import mindspore as ms
>>> import mindspore.communication as comm
>>>
>>> comm.init()
>>> rank = comm.get_rank()
>>> input = ms.Tensor([0, 1], ms.float32) + rank * 2
>>> output_shape = (2,)
>>> result, _ = comm.comm_func.all_to_all_single_with_output_shape(output_shape, input)
>>> print(result)
rank 0:
[ 0.  2.]
rank 1:
[ 1.  3.]