Safetensors Weights
Overview
Safetensors is a reliable and portable machine learning model storage format from Hugging Face for storing tensors safely, with fast, zero-copy reads. This article describes how MindSpore Transformers supports saving and loading weights in this format, to help users work with weights more easily and efficiently.
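At the file level, a Safetensors file is simply a named collection of tensors. The snippet below is a minimal sketch using the standalone safetensors Python package (not a step MindSpore Transformers requires you to perform manually); the tensor names and file name are illustrative.

import numpy as np
from safetensors.numpy import save_file, load_file

# Illustrative tensors; a real checkpoint stores the model's parameters.
tensors = {
    "model.layers.0.attention.wq.weight": np.zeros((128, 128), dtype=np.float16),
    "model.layers.0.attention.wk.weight": np.zeros((128, 128), dtype=np.float16),
}

save_file(tensors, "demo.safetensors")      # write a .safetensors file
restored = load_file("demo.safetensors")    # read it back as a dict of arrays
print(list(restored.keys()))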
Safetensors Weight Examples
There are two main types of Safetensors files: complete weight files and distributed weight files. The following sections describe how each is obtained and what the corresponding files look like.
Complete Weights
Complete Safetensors weights can be obtained in two ways:
Download them directly from Hugging Face.
Merge the distributed weights produced by MindSpore Transformers distributed training with the weight merge script (see the sketch after this list).
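A minimal merge sketch is shown below. It assumes a MindSpore version that provides the mindspore.unified_safetensors interface, and the paths and strategy file name are illustrative; check the API documentation of your installed version and substitute your own paths.

import mindspore as ms

# Merge per-rank distributed Safetensors files into complete (unified) weights.
# Arguments: source distributed weight directory, slicing strategy file, destination directory.
ms.unified_safetensors(
    "/output/checkpoint",                          # distributed weights saved during training
    "/output/strategy/ckpt_strategy_rank_0.ckpt",  # slicing strategy file of the source weights
    "/qwen2_7b/ms_unified_safetensors",            # destination for the merged complete weights
)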
An example Hugging Face Safetensors directory structure is as follows:
qwen2_7b
 └── hf_unified_safetensors
        ├── model-00001-of-00004.safetensors
        ├── model-00002-of-00004.safetensors
        ├── model-00003-of-00004.safetensors
        ├── model-00004-of-00004.safetensors
        └── model.safetensors.index.json         # JSON file mapping Hugging Face weight parameters to the files that store them
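The mapping file is plain JSON: a small metadata block plus a weight_map from parameter names to the shard files that store them. A trimmed example is shown below; the names and values are illustrative.

{
  "metadata": {
    "total_size": 15231233024
  },
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "lm_head.weight": "model-00004-of-00004.safetensors"
  }
}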
An example MindSpore Safetensors directory structure is as follows:
qwen2_7b
 └── ms_unified_safetensors
        ├── model-00001-of-00004.safetensors
        ├── model-00002-of-00004.safetensors
        ├── model-00003-of-00004.safetensors
        ├── model-00004-of-00004.safetensors
        ├── hyper_param.safetensors              # Hyperparameters recorded by the training task
        └── param_name_map.json                  # JSON file mapping MindSpore weight parameters to the files that store them
Distributed Weights
Distributed Safetensors weights can be obtained in two ways:
Generate them through distributed training with MindSpore Transformers.
Convert existing distributed ckpt weights to the Safetensors format with the format conversion script (see the sketch after this list).
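A minimal conversion sketch for a single rank's ckpt file is shown below. It assumes a MindSpore version whose save_checkpoint accepts format='safetensors' (check the API of your installed version); for distributed weights the same step is repeated for every rank_x file, and the paths here are illustrative.

import mindspore as ms

# Load an existing ckpt file into a name -> Parameter dictionary.
param_dict = ms.load_checkpoint("/qwen2_7b/distributed_ckpt/rank_0/qwen2_7b_rank_0.ckpt")

# Re-save the same parameters as a Safetensors file.
save_list = [{"name": name, "data": param} for name, param in param_dict.items()]
ms.save_checkpoint(save_list,
                   "/qwen2_7b/distributed_safetensors/rank_0/qwen2_7b_rank_0.safetensors",
                   format="safetensors")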
An example distributed Safetensors directory structure is as follows:
qwen2_7b
 └── distributed_safetensors
        ├── rank_0
        │    └── qwen2_7b_rank_0.safetensors
        ├── rank_1
        │    └── qwen2_7b_rank_1.safetensors
        ...
        └── rank_x
             └── qwen2_7b_rank_x.safetensors
Configuration Descriptions
Configuration items related to loading weights:
Parameter names | Descriptions
---|---
load_checkpoint | Path to the folder containing the weights to be loaded.
load_ckpt_format | Format of the model weights to load; optional values are ckpt and safetensors, and the default is ckpt.
auto_trans_ckpt | Whether to enable the online slicing (automatic weight transformation) function.
remove_redundancy | Whether the loaded weights have had redundancy removed; defaults to False.
Configuration items related to saving weights:
Parameter names | Descriptions
---|---
callbacks.checkpoint_format | Format of the saved model weights; optional values are ckpt and safetensors, and the default is ckpt.
callbacks.remove_redundancy | Whether to enable de-redundancy when saving weights; defaults to False.
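Putting the two groups together, a typical configuration fragment looks like the following; the path is a placeholder.

load_checkpoint: '/path/to/safetensors_weights'    # Folder of weights to load
load_ckpt_format: 'safetensors'                    # Format of the loaded weights
auto_trans_ckpt: True                              # Enable online slicing when loading complete weights
remove_redundancy: False                           # Whether the loaded weights are de-redundant

callbacks:
  - type: CheckpointMonitor
    checkpoint_format: safetensors                 # Format of the saved weights
    remove_redundancy: True                        # Enable de-redundancy when saving weights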
Usage Example
Examples of Pre-training Tasks
Taking Llama2-7B as an example, modify the configuration file pretrain_llama2_7b.yaml to specify the weight saving format:
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
remove_redundancy: True # Turn on de-redundancy when saving weights
After the modification is complete, run the following command:
bash scripts/msrun_launcher.sh "run_mindformer.py \
--config configs/llama2/pretrain_llama2_7b.yaml \
--train_dataset_dir /{path}/wiki4096.mindrecord \
--use_parallel True \
--run_mode train" 8
After the task finishes, a checkpoint folder is generated under the mindformers/output directory, and the model weight files are saved in that folder.
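The layout of that folder typically resembles the following; the exact file names depend on the training configuration, so the names below are illustrative.

output
 ├── checkpoint
 │    ├── rank_0
 │    │    └── llama2_7b_rank_0-10_2.safetensors
 │    ├── rank_1
 │    │    └── llama2_7b_rank_1-10_2.safetensors
 │    └── ...
 └── strategy
      └── ckpt_strategy_rank_0.ckpt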
For more details, please refer to: Introduction to Pre-training.
Examples of Fine-tuning Tasks
For multi-card online fine-tuning with complete weights, take the Qwen2-7B model as an example and modify the configuration file finetune_qwen2_7b.yaml:
# Modified configuration
load_checkpoint: '/qwen2_7b/hf_unified_safetensors' # Load weights file path
load_ckpt_format: 'safetensors' # Load weights file format
auto_trans_ckpt: True # This configuration item needs to be turned on for complete weights to enable the online slicing feature
parallel_config: # Configure the target distributed strategy
data_parallel: 1
model_parallel: 2
pipeline_stage: 1
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
For multi-card online fine-tuning with distributed weights, take the Qwen2-7B model as an example and modify the configuration file finetune_qwen2_7b.yaml:
# Modified configuration
load_checkpoint: '/qwen2_7b/distributed_safetensors' # Load weights file path
load_ckpt_format: 'safetensors' # Load weights file format
parallel_config: # Configure the target distributed strategy
data_parallel: 1
model_parallel: 2
pipeline_stage: 1
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
After the modification is complete, run the following command:
bash scripts/msrun_launcher.sh "run_mindformer.py \
--config research/qwen2/qwen2_7b/finetune_qwen2_7b.yaml \
--train_dataset_dir /{path}/alpaca-data.mindrecord \
--register_path research/qwen2 \
--use_parallel True \
--run_mode finetune" 2
After the task finishes, a checkpoint folder is generated under the mindformers/output directory, and the model weight files are saved in that folder.
For more details, please refer to: Introduction to SFT fine-tuning.
Example of an Inference Task
For multi-card online inference with complete weights, take the Qwen2-7B model as an example and modify the configuration file predict_qwen2_7b_instruct.yaml:
# Modified configuration
load_checkpoint: '/qwen2_7b/hf_unified_safetensors' # Load weights file path
load_ckpt_format: 'safetensors' # Load weights file format
auto_trans_ckpt: True # This configuration item needs to be turned on for complete weights to enable the online slicing function
parallel_config:
data_parallel: 1
model_parallel: 2
pipeline_stage: 1
For multi-card online inference with distributed weights, take the Qwen2-7B model as an example and modify the configuration file predict_qwen2_7b_instruct.yaml:
# Modified configuration
load_checkpoint: '/qwen2_7b/distributed_safetensors' # Load weights file path
load_ckpt_format: 'safetensors' # Load weights file format
parallel_config:
data_parallel: 1
model_parallel: 2
pipeline_stage: 1
After the modification is complete, run the following command:
bash scripts/msrun_launcher.sh "python run_mindformer.py \
--config research/qwen2/qwen2_7b/predict_qwen2_7b_instruct.yaml \
--run_mode predict \
--use_parallel True \
--register_path research/qwen2 \
--predict_data 'I love Beijing, because'" \
2
The result of executing the above inference command is as follows:
'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......]
For more details, please refer to: Introduction to Inference.
Example of a Resumable Training Task
MindSpore Transformers supports step-level resumable training: checkpoints are saved periodically during training, and after an interruption the saved checkpoints can be loaded to restore the previous state and continue training.
For multi-card resumable training with distributed weights where the slicing strategy is unchanged, modify the configuration items and restart the original training task:
# Modified configuration
load_checkpoint: '/output/checkpoint' # Load source distributed weights file path
load_ckpt_format: 'safetensors' # Load weights file format
resume_training: True # Resumable training after breakpoint switch
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
For multi-card resumable training with distributed weights where the slicing strategy has changed, the path of the source slicing strategy file must also be passed in; modify the configuration items and restart the original training task:
# Modified configuration
load_checkpoint: '/output/checkpoint' # Load source distributed weights file path
src_strategy_path_or_dir: '/output/src_strategy' # Load source strategy file for merging source distributed weights into full weights
load_ckpt_format: 'safetensors' # Load weights file format
auto_trans_ckpt: True # Enable online slicing
resume_training: True # Resumable training after breakpoint switch
parallel_config: # Configure the target distributed strategy
data_parallel: 2
model_parallel: 4
pipeline_stage: 1
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
In large-cluster scenarios, to prevent the online merging process from occupying training resources for too long, it is recommended to merge the original distributed weight files into complete weights offline and then pass in the merged weights; in that case there is no need to pass in the path of the source slicing strategy file (see the configuration sketch below).
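With offline-merged complete weights, the resume configuration reduces to the complete-weight form; the merged path below is a placeholder.

# Modified configuration
load_checkpoint: '/path/to/merged_complete_safetensors'  # Offline-merged complete weights
load_ckpt_format: 'safetensors'                          # Load weights file format
auto_trans_ckpt: True                                    # Slice the complete weights online for the target strategy
resume_training: True                                    # Resumable training switch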
For more details, please refer to: Resumable Training.