Weight Format Conversion

Overview

MindFormers provides a unified weight conversion tool that converts model weights between the HuggingFace and MindFormers formats. This helps you:

  • Convert HuggingFace weights to MindFormers weights for fine-tuning, evaluation, or inference on MindFormers.

  • Convert weights trained or fine-tuned with MindFormers to HuggingFace weights for use with other frameworks.

Conversion Procedure

To perform weight conversion, first clone the complete HuggingFace repository of the model to be converted to a local path, and then execute the mindformers/convert_weight.py script. The script automatically converts the HuggingFace model weight file into a weight file usable by MindFormers. To convert a MindFormers weight back to a HuggingFace one, add the --reversed flag.

python convert_weight.py [-h] --model MODEL [--reversed] --input_path INPUT_PATH --output_path OUTPUT_PATH [--dtype DTYPE] [--n_head N_HEAD] [--hidden_size HIDDEN_SIZE] [--layers LAYERS] [--is_pretrain IS_PRETRAIN] [--telechat_type TELECHAT_TYPE]

Parameters

  • model: model name.

  • reversed: if specified, converts a MindFormers weight to a HuggingFace one.

  • input_path: path of the HuggingFace weight folder, pointing to the downloaded weight files.

  • output_path: path for storing the MindFormers weight file after conversion.

  • dtype: weight data type after conversion.

  • n_head: takes effect only for the BLOOM model. Set this parameter to 16 when bloom_560m is used and to 32 when bloom_7.1b is used (see the example command after this list).

  • hidden_size: takes effect only for the BLOOM model. Set this parameter to 1024 when bloom_560m is used and to 4096 when bloom_7.1b is used.

  • layers: number of layers to be converted. This parameter takes effect only for the GPT2 and WizardCoder models.

  • is_pretrain: converts the pre-trained weight. This parameter takes effect only for the Swin model.

  • telechat_type: version of the TeleChat model. This parameter takes effect only for the TeleChat model.
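
For example, converting a bloom_7.1b checkpoint passes both BLOOM-specific parameters. The following command is only a sketch: the paths are placeholders, and it assumes BLOOM is registered under the model value bloom in your MindFormers version.

python convert_weight.py --model bloom --input_path /path/to/bloom_7.1b_hf --output_path /path/to/bloom_7.1b.ckpt --n_head 32 --hidden_size 4096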

Conversion Example

Assume that you have downloaded the Llama2 model weight and saved it in the /home/user/torch_weights path. To convert it to a MindFormers weight and save it in the /home/user/ms_weights path, run the following command:

python convert_weight.py --model llama --input_path /home/user/torch_weights --output_path /home/user/ms_weights/llama.ckpt

After the preceding steps are performed, the HuggingFace weight is successfully converted to a MindFormers weight, facilitating model training or inference on MindFormers.
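
To convert the MindFormers weight back to a HuggingFace one, add the --reversed flag. The command below is a sketch that reuses the weight from the example above; the output path is a placeholder.

python convert_weight.py --model llama --reversed --input_path /home/user/ms_weights/llama.ckpt --output_path /home/user/hf_weights/llama.bin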

Supported Models

Parameter Value      Supported models
llama                Llama2, Llama3, Llama3.1, CodeLlama
baichuan2            Baichuan2
glm-n                GLM2, GLM3, GLM3-32K, GLM4
cogvlm2              CogVLM2-Video, CogVLM2-Image
qwen                 Qwen, Qwen1.5, Qwen2
qwenvl               QwenVL
internlm             InternLM
internlm2            InternLM2
yi                   Yi
mixtral              Mixtral
deepseek             DeepSeekCoder, DeepSeekCoder1.5, DeepSeekV2
gpt                  GPT2
whisper              Whisper

Developing Weight Conversion for Unsupported Models

  1. Add the convert_weight.py and convert_reversed.py files to the extended model directory.

  2. Write the convert_pt_to_ms and convert_ms_to_pt weight conversion functions in these files. The function parameters are input_path, output_path, dtype, and the additional keyword arguments **kwargs.

  3. Add the extended model name and conversion function import paths to the convert_map and reversed_convert_map dictionaries in the convert_weight.py file in the MindFormers root directory.

  4. Call the parser.add_argument() method in the main function to add the additional parameter (see the registration sketch after this list).
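
As a rough illustration of steps 3 and 4, registering a hypothetical model named mymodel might look like the sketch below. The llama entries, the mymodel import paths, and the extra argument name are all assumptions made for illustration; check convert_weight.py in the MindFormers root directory for the actual dictionary layout.

import argparse

# step 3: register the new model in the conversion dictionaries of convert_weight.py
# (the import paths below are illustrative assumptions)
convert_map = {
    'llama': 'mindformers.models.llama.convert_weight.convert_pt_to_ms',
    'mymodel': 'research.mymodel.convert_weight.convert_pt_to_ms',
}
reversed_convert_map = {
    'llama': 'mindformers.models.llama.convert_reversed.convert_ms_to_pt',
    'mymodel': 'research.mymodel.convert_reversed.convert_ms_to_pt',
}

# step 4: in the main function (where the parser already exists), expose the
# model-specific extra parameter so it reaches the conversion function via **kwargs
parser = argparse.ArgumentParser()
parser.add_argument('--my_extra_param', default=None,
                    help='hypothetical extra parameter for mymodel')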

Example of Developing Model Weight Conversion

Llama is used as an example. To convert a HuggingFace weight to a MindFormers one, define the convert_pt_to_ms function in convert_weight.py. The helper imports at the top of the snippet reflect MindFormers utilities; the exact module paths may differ between versions.

import os

import mindspore as ms

# pt2ms converts torch tensors to MindSpore values; the module path below reflects
# the MindFormers utilities and may differ between versions
from mindformers.utils.convert_utils import pt2ms


def convert_pt_to_ms(input_path, output_path, dtype=None, **kwargs):
    """convert hf weight to ms."""
    print(f"Trying to convert huggingface checkpoint in '{input_path}'.", flush=True)
    try:
        from transformers import LlamaForCausalLM
    except ImportError as e:
        raise ImportError("Failed to load huggingface checkpoint. "
                          "Please make sure transformers is available.") from e

    try:
        # the weight files sit inside the downloaded HuggingFace repository directory
        model_hf = LlamaForCausalLM.from_pretrained(os.path.dirname(input_path))
    except Exception as e:
        print(f"Do not find huggingface checkpoint in '{os.path.dirname(input_path)}', Error {e}.", flush=True)
        return False
    ckpt_list = []
    for name, value in model_hf.state_dict().items():
        # name_replace maps HuggingFace parameter names to MindFormers names
        # and is defined elsewhere in this file
        name = name_replace(name)
        if name == 'norm.weight':
            name = 'norm_out.weight'
        if name[:7] == 'layers.':
            name = name[7:]

        print(f'\rprocessing parameter: {name} {value.shape}     ', end='', flush=True)
        ckpt_list.append({'name': name, 'data': pt2ms(value, dtype)})

    ms.save_checkpoint(ckpt_list, output_path)
    print(f"\rConvert huggingface checkpoint finished, the mindspore checkpoint is saved in '{output_path}'.",
          flush=True)
    return True
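
To spot-check the converted MindSpore checkpoint, it can be loaded back with ms.load_checkpoint. A minimal sketch, assuming the output path from the conversion example above:

import mindspore as ms

# print the first few converted parameter names and shapes
params = ms.load_checkpoint("/home/user/ms_weights/llama.ckpt")
for name, param in list(params.items())[:5]:
    print(name, param.shape)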

To convert a MindFormers weight to a HuggingFace one, define the convert_ms_to_pt function in convert_reversed.py.

import torch

import mindspore as ms

# ms2pt converts MindSpore parameters to torch tensors and is_lora_param identifies
# LoRA delta weights; the module path below reflects the MindFormers utilities and
# may differ between versions
from mindformers.utils.convert_utils import is_lora_param, ms2pt


def convert_ms_to_pt(input_path, output_path, dtype=None, **kwargs):
    """convert ms weight to hf."""
    print(f"Trying to convert mindspore checkpoint in '{input_path}'.", flush=True)
    model_ms = ms.load_checkpoint(input_path)

    state_dict = {}
    for name, value in model_ms.items():
        # name_replace maps MindFormers parameter names back to HuggingFace names
        # and is defined elsewhere in this file
        name = name_replace(name)
        print(f'\rprocessing parameter: {name} {value.shape}     ', end='', flush=True)
        if is_lora_param(name):
            name = name.replace('.tk_delta_lora_a', '.lora_A.weight')
            name = name.replace('.tk_delta_lora_b', '.lora_B.weight')
        state_dict[name] = ms2pt(value, dtype)

    torch.save(state_dict, output_path)
    print(f"\rConvert mindspore checkpoint finished, the huggingface checkpoint is saved in '{output_path}'.",
          flush=True)
    return True
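
Likewise, the reversed conversion produces a standard PyTorch state dict that can be inspected with torch.load. A minimal sketch, assuming the placeholder output path used in the reversed conversion example above:

import torch

# list the first few parameter names and shapes of the converted state dict
state_dict = torch.load("/home/user/hf_weights/llama.bin", map_location="cpu")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))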