mindspore.transform_checkpoints
- mindspore.transform_checkpoints(src_checkpoints_dir, dst_checkpoints_dir, ckpt_prefix, src_strategy_file=None, dst_strategy_file=None)
Transform distributed checkpoints from the source sharding strategy to the destination sharding strategy for each rank. For more details about converting distributed checkpoints, please refer to Distributed Resilience Training and Inference.
Note
The src_checkpoints_dir directory structure should be organized like “src_checkpoints_dir/rank_0/a.ckpt”: each rank has its own subdirectory, named by its rank number, and that rank’s checkpoint file is stored in that subdirectory. If multiple checkpoint files exist in a rank directory, the last file in lexicographic order is selected. A layout sketch follows this note.
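For illustration, the expected layout can be created like this (a minimal sketch; the 8-rank count and the ./src_checkpoints path are assumptions, and the .ckpt files themselves would come from a distributed training run):

>>> import os
>>> # One subdirectory per rank; each holds that rank's checkpoint file,
>>> # e.g. ./src_checkpoints/rank_0/a.ckpt.
>>> for rank in range(8):
...     os.makedirs(os.path.join("./src_checkpoints", f"rank_{rank}"), exist_ok=True)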
- Parameters
src_checkpoints_dir (str) – The source checkpoints directory.
dst_checkpoints_dir (str) – The destination checkpoints directory to save the converted checkpoints.
ckpt_prefix (str) – The destination checkpoint name prefix.
src_strategy_file (str) – Name of the source sharding strategy file saved by ‘mindspore.set_auto_parallel_context(strategy_ckpt_save_file)’. If src_strategy_file is None, the source sharding strategy is treated as applying no sharding to any parameter. Default: None.
dst_strategy_file (str) – Name of the destination sharding strategy file saved by ‘mindspore.set_auto_parallel_context(strategy_ckpt_save_file)’. If dst_strategy_file is None, the destination sharding strategy is treated as applying no sharding to any parameter. Default: None. A sketch of producing such a strategy file follows this list.
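Both strategy files are produced during training. A minimal sketch of saving one (the semi_auto_parallel mode and the file path are assumptions):

>>> import mindspore as ms
>>> # Saving the sharding strategy during a parallel training run; the resulting
>>> # file is what src_strategy_file / dst_strategy_file later point to.
>>> ms.set_auto_parallel_context(parallel_mode="semi_auto_parallel",
...                              strategy_ckpt_save_file="./src_strategy.ckpt")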
- Raises
ValueError – src_strategy_file or dst_strategy_file is incorrect.
NotADirectoryError – src_checkpoints_dir or dst_checkpoints_dir is not a directory.
ValueError – The checkpoint file is missing in src_checkpoints_dir.
TypeError – src_strategy_file or dst_strategy_file is not a string.
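Since argument problems surface as the exceptions above, a guarded call can be sketched as follows (all paths are placeholders):

>>> import mindspore as ms
>>> # Catch the documented failure modes: bad strategy files or missing checkpoints
>>> # (ValueError), non-directory paths (NotADirectoryError), non-string strategy
>>> # file names (TypeError).
>>> try:
...     ms.transform_checkpoints("./src_checkpoints", "./dst_checkpoints", "dst_checkpoint",
...                              "./src_strategy.ckpt", "./dst_strategy.ckpt")
... except (ValueError, NotADirectoryError, TypeError) as err:
...     print(f"checkpoint transformation failed: {err}")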
Examples
>>> import mindspore as ms
>>> ms.transform_checkpoints(src_checkpoints_dir, dst_checkpoints_dir, "dst_checkpoint",
...                          "./src_strategy.ckpt", "./dst_strategy.ckpt")
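After the transformation, each rank’s converted file can be loaded into a network built under the destination strategy. A follow-up sketch, where net and the exact output file name under dst_checkpoints_dir are assumptions:

>>> import mindspore as ms
>>> # Load the transformed checkpoint for rank 0 and feed it into the network;
>>> # net is assumed to be an already-constructed mindspore.nn.Cell.
>>> param_dict = ms.load_checkpoint("./dst_checkpoints/rank_0/dst_checkpoint0.ckpt")
>>> ms.load_param_into_net(net, param_dict)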