mindspore.communication
Collective communication interface.
Note that the APIs listed below require the communication environment variables to be configured in advance.
For Ascend devices, users need to prepare the rank table and set rank_id and device_id. See the rank table Startup for more details.
For GPU devices, users need to prepare the host file and mpi. See the mpirun Startup.
For CPU devices, users need to write a dynamic cluster startup script. See the Dynamic Cluster Startup.
- class mindspore.communication.GlobalComm[source]
World communication information. GlobalComm is a global class. Its members are:
BACKEND: The communication library used, one of "hccl", "nccl" or "mccl". "hccl" means Huawei Collective Communication Library (HCCL), "nccl" means NVIDIA Collective Communication Library (NCCL), and "mccl" means MindSpore Collective Communication Library (MCCL).
WORLD_COMM_GROUP: The global communication domain, one of "hccl_world_group", "nccl_world_group" or "mccl_world_group".
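The pairing between the two members can be summarized as a small lookup. The following is a plain-Python sketch that does not import MindSpore; `WORLD_GROUP_BY_BACKEND` and `world_group_for` are hypothetical names used only for illustration, and the strings are the ones listed above.

```python
# Hypothetical lookup table; not part of the MindSpore API.
# Keys are BACKEND values, values are the matching WORLD_COMM_GROUP strings.
WORLD_GROUP_BY_BACKEND = {
    "hccl": "hccl_world_group",  # Ascend
    "nccl": "nccl_world_group",  # GPU
    "mccl": "mccl_world_group",  # CPU
}


def world_group_for(backend):
    """Return the world group name documented for a backend string."""
    try:
        return WORLD_GROUP_BY_BACKEND[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None


print(world_group_for("nccl"))
```

In real code, reading GlobalComm.WORLD_COMM_GROUP after init() avoids hard-coding any of these strings.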
- mindspore.communication.create_group(group, rank_ids)[source]
Create a user collective communication group.
Note
This method is not supported in the GPU and CPU versions of MindSpore. rank_ids must contain more than one element and must not contain duplicates. This method should be used after init(). In PyNative mode, only the global single communication group is supported unless the job is started with mpirun.
- Parameters
- Raises
TypeError – If group is not a string or rank_ids is not a list.
ValueError – If rank_ids size is not larger than 1, or rank_ids has duplicate data, or backend is invalid.
RuntimeError – If HCCL is not available or MindSpore is GPU/CPU version.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup .
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> import mindspore as ms
>>> from mindspore import set_context
>>> import mindspore.ops as ops
>>> from mindspore.communication import init, create_group, get_rank
>>> set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
>>> init()
>>> group = "0-7"
>>> rank_ids = [0,7]
>>> if get_rank() in rank_ids:
...     create_group(group, rank_ids)
...     allreduce = ops.AllReduce(group)
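The argument constraints listed under Raises can be checked before the call is made. The following is a minimal plain-Python sketch of such a check; `check_group_args` is a hypothetical helper, not a MindSpore function, and it covers only the argument rules stated above (not backend or HCCL availability).

```python
def check_group_args(group, rank_ids):
    """Validate create_group-style arguments (hypothetical helper).

    Mirrors the documented rules: group is a string, rank_ids is a list
    with more than one element and no duplicates.
    """
    if not isinstance(group, str):
        raise TypeError("group must be a string")
    if not isinstance(rank_ids, list):
        raise TypeError("rank_ids must be a list")
    if len(rank_ids) <= 1:
        raise ValueError("rank_ids must contain more than one rank")
    if len(set(rank_ids)) != len(rank_ids):
        raise ValueError("rank_ids must not contain duplicates")


# Passes silently for the arguments used in the example above.
check_group_args("0-7", [0, 7])
```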
- mindspore.communication.destroy_group(group)[source]
Destroy the user collective communication group.
Note
This method isn’t supported in GPU and CPU versions of MindSpore. The parameter group should not be “hccl_world_group”. This method should be used after init().
- Parameters
group (str) – The communication group to destroy. The group must have been created by create_group.
- Raises
TypeError – If group is not a string.
ValueError – If group is “hccl_world_group” or backend is invalid.
RuntimeError – If HCCL is not available or MindSpore is GPU/CPU version.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun startup .
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> import mindspore as ms
>>> from mindspore import set_context
>>> import mindspore.ops as ops
>>> from mindspore.communication import init, create_group, destroy_group, get_rank
>>> set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
>>> init()
>>> group = "0-2"
>>> rank_ids = [0,2]
>>> if get_rank() in rank_ids:
...     create_group(group, rank_ids)
...     destroy_group(group)
- mindspore.communication.get_group_rank_from_world_rank(world_rank_id, group)[source]
Get the rank ID in the specified user communication group corresponding to the rank ID in the world communication group.
Note
This method isn’t supported in GPU and CPU versions of MindSpore. The parameter group should not be “hccl_world_group”. This method should be used after init().
- Parameters
- Returns
int, the rank ID in the user communication group.
- Raises
TypeError – If world_rank_id is not an integer or the group is not a string.
ValueError – If group is ‘hccl_world_group’ or backend is invalid.
RuntimeError – If HCCL is not available or MindSpore is GPU/CPU version.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup.
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> import mindspore as ms
>>> from mindspore import set_context
>>> from mindspore.communication import init, create_group, get_group_rank_from_world_rank, get_rank
>>> set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
>>> init()
>>> group = "0-4"
>>> rank_ids = [0,4]
>>> if get_rank() in rank_ids:
...     create_group(group, rank_ids)
...     group_rank_id = get_group_rank_from_world_rank(4, group)
...     print("group_rank_id is: ", group_rank_id)
group_rank_id is: 1
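The result in the example above is consistent with the group rank being the position of the world rank inside rank_ids. The following plain-Python sketch models that mapping under this assumption; `group_rank_of` is a hypothetical function that takes the rank list directly instead of a group name and does not call MindSpore.

```python
def group_rank_of(world_rank_id, rank_ids):
    """Model of the world-rank to group-rank lookup (hypothetical).

    Assumption: a rank's ID inside a user group equals its index in the
    rank_ids list the group was created with.
    """
    if world_rank_id not in rank_ids:
        raise ValueError(f"world rank {world_rank_id} is not in the group")
    return rank_ids.index(world_rank_id)


rank_ids = [0, 4]
print("group_rank_id is:", group_rank_of(4, rank_ids))
```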
- mindspore.communication.get_group_size(group=GlobalComm.WORLD_COMM_GROUP)[source]
Get the rank size of the specified collective communication group.
Note
This method should be used after init().
- Parameters
group (str) – The communication group to work on. Normally, the group should be created by create_group; otherwise, the default group is used. Default: GlobalComm.WORLD_COMM_GROUP.
- Returns
int, the rank size of the group.
- Raises
TypeError – If group is not a string.
ValueError – If backend is invalid.
RuntimeError – If HCCL/NCCL/MCCL is not available.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup .
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> import mindspore as ms
>>> from mindspore.communication import init, get_group_size
>>> ms.set_auto_parallel_context(device_num=8)
>>> init()
>>> group_size = get_group_size()
>>> print("group_size is: ", group_size)
group_size is: 8
- mindspore.communication.get_local_rank(group=GlobalComm.WORLD_COMM_GROUP)[source]
Get the local rank ID of the current device in the specified collective communication group.
Note
This method isn’t supported in GPU and CPU versions of MindSpore. This method should be used after init().
- Parameters
group (str) – The communication group to work on. Normally, the group should be created by create_group; otherwise, the default group is used. Default: GlobalComm.WORLD_COMM_GROUP.
- Returns
int, the local rank ID of the calling process within the group.
- Raises
TypeError – If group is not a string.
ValueError – If backend is invalid.
RuntimeError – If HCCL is not available or MindSpore is GPU/CPU version.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup .
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> import mindspore as ms
>>> from mindspore.communication import init, get_rank, get_local_rank
>>> ms.set_context(device_target="Ascend")
>>> ms.set_auto_parallel_context(device_num=16) # 2 servers, each with 8 NPUs.
>>> init()
>>> world_rank = get_rank()
>>> local_rank = get_local_rank()
>>> print("local_rank is: {}, world_rank is {}".format(local_rank, world_rank))
local_rank is: 1, world_rank is 9
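The numbers in the example above (world rank 9, local rank 1, 8 NPUs per server) fit a simple layout in which ranks are numbered contiguously, server by server. The following plain-Python sketch models that layout; `split_world_rank` is a hypothetical helper, and the contiguous numbering and equal device count per server are assumptions rather than guarantees of the API.

```python
def split_world_rank(world_rank, devices_per_server):
    """Return (server_index, local_rank) for a world rank (hypothetical).

    Assumptions: ranks are assigned contiguously server by server, and
    every server hosts the same number of devices.
    """
    if devices_per_server <= 0:
        raise ValueError("devices_per_server must be positive")
    if world_rank < 0:
        raise ValueError("world_rank must be non-negative")
    return divmod(world_rank, devices_per_server)


server_index, local_rank = split_world_rank(9, 8)
print("server_index is: {}, local_rank is: {}".format(server_index, local_rank))
```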
- mindspore.communication.get_local_rank_size(group=GlobalComm.WORLD_COMM_GROUP)[source]
Get the local rank size of the specified collective communication group.
Note
This method isn’t supported in GPU and CPU versions of MindSpore. This method should be used after init().
- Parameters
group (str) – The communication group to work on. The group is created by create_group or the default world communication group. Default: GlobalComm.WORLD_COMM_GROUP.
- Returns
int, the local rank size where the calling process is within the group.
- Raises
TypeError – If group is not a string.
ValueError – If backend is invalid.
RuntimeError – If HCCL is not available or MindSpore is GPU/CPU version.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup .
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> import mindspore as ms
>>> from mindspore.communication import init, get_local_rank_size
>>> ms.set_context(device_target="Ascend")
>>> ms.set_auto_parallel_context(device_num=16) # 2 servers, each with 8 NPUs.
>>> init()
>>> local_rank_size = get_local_rank_size()
>>> print("local_rank_size is: ", local_rank_size)
local_rank_size is: 8
- mindspore.communication.get_rank(group=GlobalComm.WORLD_COMM_GROUP)[source]
Get the rank ID for the current device in the specified collective communication group.
Note
This method should be used after init().
- Parameters
group (str) – The communication group to work on. Normally, the group should be created by create_group; otherwise, the default group is used. Default: GlobalComm.WORLD_COMM_GROUP.
- Returns
int, the rank ID of the calling process within the group.
- Raises
TypeError – If group is not a string.
ValueError – If backend is invalid.
RuntimeError – If HCCL/NCCL/MCCL is not available.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup .
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> from mindspore.communication import init, get_rank
>>> init()
>>> rank_id = get_rank()
>>> print(rank_id)
>>> # the result is the rank_id in world_group
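One common use of a rank ID together with the group size is to give each process a disjoint share of the data. The following plain-Python sketch illustrates the idea; `shard_indices` is a hypothetical helper, and the integers passed as `rank` and `group_size` stand in for the values that get_rank() and get_group_size() would return in a real job.

```python
def shard_indices(num_samples, rank, group_size):
    """Return the sample indices handled by one rank (strided split)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not 0 <= rank < group_size:
        raise ValueError("rank must satisfy 0 <= rank < group_size")
    return list(range(rank, num_samples, group_size))


# Simulate a group of 4 processes sharing 10 samples.
shards = [shard_indices(10, rank, 4) for rank in range(4)]
for rank, shard in enumerate(shards):
    print(rank, shard)
```

A strided split is only one option; contiguous blocks per rank work equally well as long as every index is assigned to exactly one rank.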
- mindspore.communication.get_world_rank_from_group_rank(group, group_rank_id)[source]
Get the rank ID in the world communication group corresponding to the rank ID in the specified user communication group.
Note
This method isn’t supported in GPU and CPU versions of MindSpore. The parameter group should not be “hccl_world_group”. This method should be used after init().
- Parameters
- Returns
int, the rank ID in world communication group.
- Raises
TypeError – If group_rank_id is not an integer or the group is not a string.
ValueError – If group is ‘hccl_world_group’ or backend is invalid.
RuntimeError – If HCCL is not available or MindSpore is GPU/CPU version.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup.
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> import mindspore as ms
>>> from mindspore import set_context
>>> from mindspore.communication import init, create_group, get_world_rank_from_group_rank, get_rank
>>> set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
>>> init()
>>> group = "0-4"
>>> rank_ids = [0,4]
>>> if get_rank() in rank_ids:
...     create_group(group, rank_ids)
...     world_rank_id = get_world_rank_from_group_rank(group, 1)
...     print("world_rank_id is: ", world_rank_id)
world_rank_id is: 4
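The result in the example above is consistent with the world rank being the entry of rank_ids at the given group rank. The following plain-Python sketch models the lookup under that assumption; `world_rank_of` is a hypothetical function that takes the rank list directly instead of a group name and does not call MindSpore.

```python
def world_rank_of(group_rank_id, rank_ids):
    """Model of the group-rank to world-rank lookup (hypothetical).

    Assumption: group rank i corresponds to the world rank stored at
    position i of the rank_ids list the group was created with.
    """
    if not 0 <= group_rank_id < len(rank_ids):
        raise ValueError(f"group rank {group_rank_id} is out of range")
    return rank_ids[group_rank_id]


rank_ids = [0, 4]
print("world_rank_id is:", world_rank_of(1, rank_ids))

# Every group rank maps to a distinct world rank and back to itself.
for group_rank, world_rank in enumerate(rank_ids):
    assert world_rank_of(group_rank, rank_ids) == world_rank
```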
- mindspore.communication.init(backend_name=None)[source]
Initialize the distributed backend required by communication services, e.g. "hccl", "nccl" or "mccl". It is usually used in distributed parallel scenarios and must be called before using communication services.
Note
The full name of "hccl" is Huawei Collective Communication Library (HCCL).
The full name of "nccl" is NVIDIA Collective Communication Library (NCCL).
The full name of "mccl" is MindSpore Collective Communication Library (MCCL).
On Ascend hardware platforms, init() should be called before the definition of any Tensor and Parameter, and before the instantiation and execution of any operation and net.
- Parameters
backend_name (str) – Backend, one of "hccl", "nccl" or "mccl". "hccl" should be used for Ascend hardware platforms, "nccl" for GPU hardware platforms and "mccl" for CPU hardware platforms. If not set, the backend is inferred automatically from the hardware platform type (device_target). Default: None.
- Raises
TypeError – If backend_name is not a string.
RuntimeError – If device target is invalid, or backend is invalid, or distributed initialization fails, or the environment variables RANK_ID/MINDSPORE_HCCL_CONFIG_PATH have not been exported when backend is HCCL.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup .
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> from mindspore.communication import init
>>> init()
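The default behavior described for backend_name (inferring the backend from device_target when none is given) can be sketched in plain Python. `BACKEND_BY_DEVICE_TARGET` and `infer_backend` are hypothetical names used only for illustration; the sketch follows the documented platform-to-backend pairing and error types, and is not MindSpore's actual implementation.

```python
# Hypothetical table built from the parameter description above.
BACKEND_BY_DEVICE_TARGET = {
    "Ascend": "hccl",
    "GPU": "nccl",
    "CPU": "mccl",
}


def infer_backend(device_target, backend_name=None):
    """Pick a backend string the way the parameter description suggests."""
    if backend_name is not None:
        if not isinstance(backend_name, str):
            raise TypeError("backend_name must be a string")
        return backend_name
    try:
        return BACKEND_BY_DEVICE_TARGET[device_target]
    except KeyError:
        raise RuntimeError(f"invalid device target: {device_target!r}") from None


print(infer_backend("Ascend"))
print(infer_backend("GPU", backend_name="nccl"))
```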
- mindspore.communication.release()[source]
Release distributed resources, e.g. HCCL/NCCL/MCCL.
Note
This method should be used after init().
- Raises
RuntimeError – If the distributed resources fail to be released.
- Supported Platforms:
Ascend
GPU
CPU
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For the Ascend devices, users need to prepare the rank table, set rank_id and device_id. Please see the rank table Startup for more details.
For the GPU devices, users need to prepare the host file and mpi, please see the mpirun Startup .
For the CPU device, users need to write a dynamic cluster startup script, please see the Dynamic Cluster Startup .
>>> from mindspore.communication import init, release
>>> init()
>>> release()
- mindspore.communication.HCCL_WORLD_COMM_GROUP
The string "hccl_world_group", which refers to the default communication group created by HCCL. On Ascend hardware platforms, the string is equivalent to GlobalComm.WORLD_COMM_GROUP after the communication service is initialized. It is recommended to use GlobalComm.WORLD_COMM_GROUP to obtain the current global communication group.
- mindspore.communication.NCCL_WORLD_COMM_GROUP
The string "nccl_world_group", which refers to the default communication group created by NCCL. On GPU hardware platforms, the string is equivalent to GlobalComm.WORLD_COMM_GROUP after the communication service is initialized. It is recommended to use GlobalComm.WORLD_COMM_GROUP to obtain the current global communication group.
- mindspore.communication.MCCL_WORLD_COMM_GROUP
The string "mccl_world_group", which refers to the default communication group created by MCCL. On CPU hardware platforms, the string is equivalent to GlobalComm.WORLD_COMM_GROUP after the communication service is initialized. It is recommended to use GlobalComm.WORLD_COMM_GROUP to obtain the current global communication group.