MindSpore Transformers Documentation

The goal of the MindSpore Transformers suite is to provide a full-process development suite for large model pre-training, fine-tuning, inference, and deployment. It offers mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and is expected to help users easily realize the full process of large model development.

Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:

  • One-click initiation of single-card or multi-card pre-training, fine-tuning, inference, and deployment processes for large models;

  • Rich multi-dimensional hybrid parallel capabilities with flexible, easy-to-use, and personalized configuration;

  • System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;

  • Support for configurable development of task components: any module, including the model network, optimizer, and learning rate policy, can be enabled through unified configuration;

  • Real-time visualization of training accuracy and performance monitoring metrics.

Users can refer to Overall Architecture and Model Library to get a quick overview of the MindSpore Transformers system architecture and the list of supported foundation models.

If you have any suggestions for MindSpore Transformers, please contact us by filing an issue, and we will handle it promptly.

Full-Process Development with MindSpore Transformers

MindSpore Transformers supports one-click start of single-card and multi-card training, fine-tuning, and inference for any task, making deep learning workflows more efficient and user-friendly by simplifying operations, providing flexibility, and automating processes. Users can learn more from the following explanatory documents:

Code repository address: <https://gitee.com/mindspore/mindformers>
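
The snippet below is a minimal, illustrative sketch of what such a one-click start can look like through the high-level `Trainer` and `pipeline` interfaces of the `mindformers` package. The model name `llama2_7b` and the dataset path are placeholders, and the exact interfaces and supported names may vary between versions, so the explanatory documents and code repository remain the authoritative reference.

```python
# Illustrative sketch only: one-click fine-tuning and inference via high-level APIs.
# "llama2_7b" and the dataset path are placeholders; see the Model Library for
# supported models and the task documents for version-specific interfaces.
from mindformers import Trainer, pipeline

# Fine-tune a supported model on a local dataset with a single call.
trainer = Trainer(task="text_generation",
                  model="llama2_7b",
                  train_dataset="/path/to/train_dataset")
trainer.finetune()

# Run text-generation inference through the pipeline interface.
text_generator = pipeline(task="text_generation", model="llama2_7b")
print(text_generator("An increasing sequence: one, two,"))
```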

Feature Descriptions of MindSpore Transformers

MindSpore Transformers provides a wealth of features throughout the full process of large model development. Users can learn about these features via the following links:

  • General Features:

    • Start Tasks

      One-click start for single-device, single-node, and multi-node tasks.

    • Ckpt Weights

      Supports converting, slicing, and merging weight files in ckpt format.

    • Safetensors Weights

      Supports saving and loading weight files in safetensors format.

    • Configuration File

      Supports the use of YAML files to centrally manage and adjust configurable items in tasks; a minimal configuration sketch follows this feature list.

    • Loading Hugging Face Model Configurations

      Supports plug-and-play loading of Hugging Face community model configurations for seamless integration.

    • Logging

      Describes logging, including the log structure, log saving, and so on.

    • Using Tokenizer

      Introduces the tokenizer, which supports Hugging Face tokenizers for use in inference and dataset processing; a tokenizer usage sketch follows this feature list.

  • Training Features:

    • Dataset

      Supports multiple types and formats of datasets.

    • Model Training Hyperparameters

      Flexibly configure hyperparameter settings for large model training.

    • Training Metrics Monitoring

      Provides visualization services for the training phase of large models, enabling monitoring and analysis of various metrics and information during training.

    • Resumable Training After Breakpoint

      Supports step-level resumable training after a breakpoint, effectively reducing the time and resources wasted by unexpected interruptions during large-scale training.

    • Training High Availability (Beta)

      Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery (Beta feature).

    • Parallel Training

      One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards.

    • Training Memory Optimization

      Supports fine-grained recomputation and activation swapping to reduce peak memory overhead during model training.

    • Other Training Features

      Supports gradient accumulation, gradient clipping, and other training features.

  • Inference Features:

    • Evaluation

      Supports the use of third-party open-source evaluation frameworks and datasets for large model benchmark evaluation.

    • Quantization

      Integrates the MindSpore Golden Stick toolkit to provide a unified quantization inference process.
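
As mentioned in the Configuration File, Model Training Hyperparameters, Parallel Training, and Training Memory Optimization items above, tasks are driven by a single YAML file. The sketch below mirrors, as a Python dictionary for illustration only, the kinds of sections such a file typically centralizes; the key names shown are assumptions that may differ between versions, so the Configuration File documentation is the authoritative reference.

```python
# Illustrative mirror of typical YAML configuration sections; key names are
# assumptions and may differ by version. Not a drop-in configuration file.
illustrative_task_config = {
    "run_mode": "finetune",                                   # train / finetune / predict
    "model": {"model_config": {"seq_length": 4096}},          # model network settings
    "optimizer": {"type": "AdamW", "learning_rate": 1.0e-5},  # training hyperparameters
    "lr_schedule": {"type": "CosineWithWarmUpLR", "warmup_ratio": 0.01},
    "parallel_config": {                                      # multi-dimensional hybrid parallelism
        "data_parallel": 2,
        "model_parallel": 4,
        "pipeline_stage": 1,
    },
    "recompute_config": {"recompute": True},                  # fine-grained recomputation
    "train_dataset": {"batch_size": 4},
}
```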
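
For the tokenizer support referenced under General Features, the following minimal sketch shows the common Hugging Face pattern of loading a tokenizer and encoding text for inference. The name `gpt2` is a placeholder, and whether a tokenizer is loaded through the `transformers` library or the suite's own interfaces depends on the model and version in use.

```python
# Minimal sketch: using a Hugging Face tokenizer for inference-time preprocessing.
# "gpt2" is a placeholder; substitute the tokenizer matching your model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
encoded = tokenizer("MindSpore Transformers makes large model development easier.")
print(encoded["input_ids"])                    # token ids fed to the model
print(tokenizer.decode(encoded["input_ids"]))  # decode back to text
```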

Advanced Development with MindSpore Transformers

Environment Variables

Contribution Guide

FAQ