Safetensors Weights

Overview

Safetensors is a reliable and portable machine learning model storage format from Huggingface for storing tensors securely, with fast, zero-copy loading. This article describes how MindSpore Transformers supports saving and loading weights in this format, helping users work with weights more easily and efficiently.

Safetensors Weight Examples

There are two main types of Safetensors files: complete weight files and distributed weight files. The following describes how each is obtained and what the corresponding files look like.

Complete Weights

Safetensors complete weights can be obtained in two ways:

  1. Download directly from Huggingface.

  2. Generated by the merge script after MindSpore Transformers distributed training.

An example Huggingface Safetensors directory structure is as follows:

qwen2_7b
 └── hf_unified_safetensors
        ├── model-00001-of-00004.safetensors
        ├── model-00002-of-00004.safetensors
        ├── model-00003-of-00004.safetensors
        ├── model-00004-of-00004.safetensors
        └── model.safetensors.index.json        # JSON file mapping Huggingface weight parameters to the files that store them
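
For reference, model.safetensors.index.json records, for each weight parameter, which shard file stores it. Below is an illustrative, truncated excerpt; the actual parameter names and sizes depend on the model:

{
  "metadata": {
    "total_size": 15231233024
  },
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "lm_head.weight": "model-00004-of-00004.safetensors"
  }
}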

An example MindSpore Safetensors directory structure is as follows:

qwen2_7b
 └── ms_unified_safetensors
        ├── model-00001-of-00004.safetensors
        ├── model-00002-of-00004.safetensors
        ├── model-00003-of-00004.safetensors
        ├── model-00004-of-00004.safetensors
        ├── hyper_param.safetensors            # Hyperparameter file recorded by the training task
        └── param_name_map.json                # JSON file mapping MindSpore weight parameters to the files that store them

Distributed Weights

Safetensors distributed weights can be obtained in two ways:

  1. Generated by distributed training with MindSpore Transformers.

  2. Converted from existing distributed ckpt weights using the format conversion script.

An example distributed Safetensors directory structure is as follows:

qwen2_7b
 └── distributed_safetensors
        ├── rank_0
            └── qwen2_7b_rank_0.safetensors
        ├── rank_1
            └── qwen2_7b_rank_1.safetensors
        ...
        └── rank_x
            └── qwen2_7b_rank_x.safetensors

Configuration Descriptions

Loading-related configuration:

load_checkpoint

Path to the folder containing the weights to be preloaded.
- For complete weights, fill in the path of the folder containing the sliced/single weight files.
Note: Loading Huggingface safetensors weights is supported (currently only for Llama series models). During online loading, a copy of the converted MindSpore safetensors weights is saved to /output/ms_safetensors.
- For distributed weights, they must be stored in the model_dir/rank_x/xxx.safetensors format, and the folder path is filled in as model_dir.

load_ckpt_format

The format of the model weights to load; options are ckpt and safetensors, defaults to ckpt.
To load weights in safetensors format, change this configuration to safetensors.

auto_trans_ckpt

Whether to enable online slicing.
- If the loaded weights are complete weights:
a. When use_parallel: True, the load is treated as distributed loading, and auto_trans_ckpt: True must be set to enable online slicing.
b. When use_parallel: False, the load is treated as single-card loading, and auto_trans_ckpt: False must be set to disable online slicing.
- If the loaded weights are distributed weights:
a. To keep the original slicing strategy, set auto_trans_ckpt: False and the weights are loaded directly according to the original strategy.
b. To change the slicing strategy, set auto_trans_ckpt: True and configure src_strategy_path_or_dir as the path of the original slicing strategy file.
When the task starts, the weights are merged online into complete weights, then sliced and loaded according to the parallel strategy set in the configuration file. The online-merged weights are saved under /output/unified_checkpoint in the current directory.

remove_redundancy

Whether the loaded weights have had redundancy removed; defaults to False.
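
Taken together, the loading-related items above sit at the top level of the task YAML file. A minimal sketch is shown below; the path is illustrative and should point to your own weights folder:

# Illustrative loading configuration
load_checkpoint: '/path/to/unified_safetensors'     # Folder containing the weights to load
load_ckpt_format: 'safetensors'                     # Load weights in safetensors format
auto_trans_ckpt: True                               # Slice complete weights online for a distributed task
remove_redundancy: False                            # Set True only if the loaded weights were saved without redundancy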

Saving-related configuration:

callbacks.checkpoint_format

The format of the saved model weights; options are ckpt and safetensors, defaults to ckpt.

callbacks.remove_redundancy

Whether to remove redundancy when saving weights; defaults to False. Only the safetensors format supports this.

Usage Example

Examples of Pre-training Tasks

Taking Llama2-7B as an example, modify the configuration file pretrain_llama2_7b.yaml to specify the weight saving format:

callbacks:
  - type: CheckpointMonitor
    checkpoint_format: safetensors                  # Save weights file format
    remove_redundancy: True                         # Enable redundancy removal when saving weights

After the modification is complete, execute the following command:

bash scripts/msrun_launcher.sh "run_mindformer.py \
 --config configs/llama2/pretrain_llama2_7b.yaml \
 --train_dataset_dir /{path}/wiki4096.mindrecord \
 --use_parallel True \
 --run_mode train" 8

After the task is executed, a checkpoint folder is generated in the mindformers/output directory, and the model weight files are saved in that folder.
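
With checkpoint_format: safetensors, the saved weights are distributed, with one folder per rank, similar to the layout sketched below. The file names are only illustrative; the actual names depend on the configured prefix and on the epoch and step at which the checkpoint is saved:

output
 └── checkpoint
        ├── rank_0
            └── {prefix}_rank_0-{epoch}_{step}.safetensors
        ...
        └── rank_7
            └── {prefix}_rank_7-{epoch}_{step}.safetensors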

For more details, please refer to: Introduction to Pre-training.

Examples of Fine-tuning Tasks

For multi-card online fine-tuning with complete weights, take the Qwen2-7B model as an example and modify the configuration file finetune_qwen2_7b.yaml:

# Modified configuration
load_checkpoint: '/qwen2_7b/hf_unified_safetensors' # Load weights file path
load_ckpt_format: 'safetensors'                     # Load weights file format
auto_trans_ckpt: True                               # This configuration item needs to be turned on for complete weights to enable the online slicing feature
parallel_config:                                    # Configure the target distributed strategy
  data_parallel: 1
  model_parallel: 2
  pipeline_stage: 1
callbacks:
  - type: CheckpointMonitor
    checkpoint_format: safetensors                  # Save weights file format

For multi-card online fine-tuning with distributed weights, take the Qwen2-7B model as an example and modify the configuration file finetune_qwen2_7b.yaml:

# Modified configuration
load_checkpoint: '/qwen2_7b/distributed_safetensors' # Load weights file path
load_ckpt_format: 'safetensors'                      # Load weights file format
parallel_config:                                     # Configure the target distributed strategy
  data_parallel: 1
  model_parallel: 2
  pipeline_stage: 1
callbacks:
  - type: CheckpointMonitor
    checkpoint_format: safetensors                  # Save weights file format

After the modification is complete, execute the following command:

bash scripts/msrun_launcher.sh "run_mindformer.py \
 --config research/qwen2/qwen2_7b/finetune_qwen2_7b.yaml \
 --train_dataset_dir /{path}/alpaca-data.mindrecord \
 --register_path research/qwen2 \
 --use_parallel True \
 --run_mode finetune" 2

After the task is executed, a checkpoint folder is generated in the mindformers/output directory, and the model weight files are saved in that folder.

For more details, please refer to: Introduction to SFT Fine-tuning.

Example of an Inference Task

For multi-card online inference with complete weights, take the Qwen2-7B model as an example and modify the configuration file predict_qwen2_7b_instruct.yaml:

# Modified configuration
load_checkpoint: '/qwen2_7b/hf_unified_safetensors' # Load weights file path
load_ckpt_format: 'safetensors'                     # Load weights file format
auto_trans_ckpt: True                               # This configuration item needs to be turned on for complete weights to enable the online slicing function
parallel_config:
  data_parallel: 1
  model_parallel: 2
  pipeline_stage: 1

For multi-card online inference with distributed weights, take the Qwen2-7B model as an example and modify the configuration file predict_qwen2_7b_instruct.yaml:

# Modified configuration
load_checkpoint: '/qwen2_7b/distributed_safetensors' # Load weights file path
load_ckpt_format: 'safetensors'                      # Load weights file format
parallel_config:
  data_parallel: 1
  model_parallel: 2
  pipeline_stage: 1

After the modification is complete, execute the following command:

bash scripts/msrun_launcher.sh "python run_mindformer.py \
--config research/qwen2/qwen2_7b/predict_qwen2_7b_instruct.yaml \
--run_mode predict \
--use_parallel True \
--register_path research/qwen2 \
--predict_data 'I love Beijing, because'" 2

The result of executing the above multi-card inference command is as follows:

'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......]

For more details, please refer to: Introduction to Inference.

Examples of Resumable Training after Breakpoint

MindSpore Transformers supports step-level resumable training after breakpoint: checkpoints are saved during training, and after an interruption the saved checkpoints can be loaded to restore the previous state and continue training.

For multi-card resumable training with distributed weights without changing the slicing strategy, modify the configuration items and restart the original training task:

# Modified configuration
load_checkpoint: '/output/checkpoint'                # Load source distributed weights file path
load_ckpt_format: 'safetensors'                      # Load weights file format
resume_training: True                                # Resumable training after breakpoint switch
callbacks:
  - type: CheckpointMonitor
    checkpoint_format: safetensors                   # Save weights file format

For multi-card resumable training with distributed weights where the slicing strategy is changed, the path of the source slicing strategy file must be passed in; modify the configuration items and restart the original training task:

# Modified configuration
load_checkpoint: '/output/checkpoint'               # Load source distributed weights file path
src_strategy_path_or_dir: '/output/src_strategy'    # Load source strategy file for merging source distributed weights into full weights
load_ckpt_format: 'safetensors'                     # Load weights file format
auto_trans_ckpt: True                               # Enable online slicing
resume_training: True                               # Resumable training after breakpoint switch
parallel_config:                                    # Configure the target distributed strategy
  data_parallel: 2
  model_parallel: 4
  pipeline_stage: 1
callbacks:
  - type: CheckpointMonitor
    checkpoint_format: safetensors                  # Save weights file format

In large-scale cluster scenarios, to prevent the online merging process from occupying training resources for too long, it is recommended to merge the original distributed weight files into complete weights offline and pass those in instead; in that case, the source slicing strategy file path does not need to be provided.
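
For example, after merging the distributed weights into complete weights offline, the resume configuration might look like the following sketch; the merged weights path is illustrative:

# Illustrative configuration when resuming from offline-merged complete weights
load_checkpoint: '/path/to/merged_complete_safetensors'  # Offline-merged complete weights folder
load_ckpt_format: 'safetensors'                          # Load weights file format
auto_trans_ckpt: True                                    # Complete weights are sliced online per the target strategy; no src_strategy_path_or_dir is needed
resume_training: True                                    # Resumable training after breakpoint switch
callbacks:
  - type: CheckpointMonitor
    checkpoint_format: safetensors                       # Save weights file format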

For more details, please refer to: Resumable Training.