Distributed Parallel Startup Methods

Startup Method

GPU, Ascend, and CPU each currently support several startup methods. The three covered here are dynamic cluster, mpirun, and rank table:

  • Dynamic cluster: this method does not depend on third-party libraries, provides disaster recovery, offers good security, and supports all three hardware platforms. We recommend this startup method as the first choice.

  • mpirun: this method depends on the open-source library OpenMPI, and its startup command is simple. For multi-machine runs, password-free login must be set up between every pair of machines. It is recommended for users who already have experience with OpenMPI.

  • rank table: this method requires the Ascend hardware platform and does not depend on third-party libraries. After manually configuring the rank_table file, you can start the parallel program through a script. The script is identical on every machine, which makes batch deployment easy.
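To give a sense of how short the mpirun startup command is, here is a minimal sketch that assembles a single-machine launch command. It assumes OpenMPI's `mpirun` is on the PATH and uses its standard `-n` option for the process count. The helper name `build_mpirun_command` and the script name `train.py` are hypothetical and used only for illustration. Multi-machine launches need additional host configuration that is not shown here.

```python
import shlex


def build_mpirun_command(num_procs, script, python="python"):
    """Build an argument list that starts `num_procs` copies of `script` via mpirun.

    `script` is a placeholder for your own training script (assumption).
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    # `-n` tells OpenMPI's mpirun how many processes to start.
    return ["mpirun", "-n", str(num_procs), python, script]


if __name__ == "__main__":
    # Print the command as a shell-ready string instead of executing it.
    print(shlex.join(build_mpirun_command(8, "train.py")))
    # prints: mpirun -n 8 python train.py
```

The argument list can be passed directly to `subprocess.run` once OpenMPI and the training script are in place.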

The hardware support for the three startup methods is shown in the table below:

Startup method     GPU             Ascend       CPU
Dynamic cluster    Supported       Supported    Supported
mpirun             Supported       Supported    Not supported
rank table         Not supported   Supported    Not supported
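The support matrix above can also be written as a small lookup, which is handy when a launch script needs to check whether a method is valid for the current hardware. This is only an illustrative sketch: the names `SUPPORT` and `available_methods` are hypothetical and are not part of any framework API. The values are transcribed directly from the table.

```python
# Hardware support per startup method, transcribed from the table above.
SUPPORT = {
    "dynamic cluster": {"GPU": True, "Ascend": True, "CPU": True},
    "mpirun": {"GPU": True, "Ascend": True, "CPU": False},
    "rank table": {"GPU": False, "Ascend": True, "CPU": False},
}


def available_methods(platform):
    """Return the startup methods usable on `platform`, in table order."""
    if platform not in ("GPU", "Ascend", "CPU"):
        raise ValueError(f"unknown platform: {platform!r}")
    return [method for method, hw in SUPPORT.items() if hw[platform]]


if __name__ == "__main__":
    for platform in ("GPU", "Ascend", "CPU"):
        print(platform, available_methods(platform))
```

For example, `available_methods("CPU")` returns only `["dynamic cluster"]`, which matches the table: dynamic cluster is the one method available on all three platforms.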