MindSpore Transformers Documentation

MindSpore Transformers (also known as MindFormers) is a MindSpore-native foundation model suite that provides full-lifecycle development capabilities for foundation model training, fine-tuning, evaluation, inference, and deployment. It offers the industry's mainstream Transformer pre-trained models and SOTA downstream task applications, and covers a rich set of parallel features, with the goal of helping users easily carry out large model training and innovative research and development.

Users can refer to Overall Architecture and Model Library for a quick overview of the MindSpore Transformers system architecture and the list of supported features and foundation models. From there, refer to Installation and Quick Start to get started with MindSpore Transformers.

If you have any suggestions for MindSpore Transformers, please contact us via an issue and we will handle it promptly.

MindSpore Transformers supports one-click launching of single-card or multi-card training, fine-tuning, evaluation, and inference for any task. By simplifying operations, providing flexibility, and automating workflows, it makes deep learning tasks more efficient and user-friendly. Users can learn more from the following documents:

Code repository address: <https://gitee.com/mindspore/mindformers>

Flexible and Easy-to-Use Personalized Configuration with MindSpore Transformers

With its powerful feature set, MindSpore Transformers provides users with flexible and easy-to-use personalized configuration options. Specifically, it comes with the following key features:

  1. Weight Format Conversion

    Provides a unified weight conversion tool that converts model weights between the formats used by HuggingFace and MindSpore Transformers.

  2. Distributed Weight Slicing and Merging

    Supports flexible slicing and merging of weights across different distributed scenarios.

  5. Distributed Parallelism

    One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards.

  4. Dataset

    Supports multiple dataset types and formats.

  5. Weight Saving and Resumable Training After Breakpoint

    Supports step-level resumable training from a breakpoint, effectively reducing the time and resources wasted by unexpected interruptions during large-scale training.

  6. Training Metrics Monitoring

    Provides visualization services for the large-model training phase, enabling monitoring and analysis of various metrics and information during training.

  7. Training High Availability

    Provides high-availability capabilities for the large-model training phase, including end-of-life checkpoint (CKPT) preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery.

  8. Safetensors Weights

    Supports saving and loading weight files in the safetensors format.

  9. Fine-Grained Activations SWAP

    Supports fine-grained selection of specific activations for SWAP (offloading to host memory), reducing peak memory overhead during model training.
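The weight format conversion in item 1 can be illustrated conceptually. This is a minimal sketch, not the actual conversion tool: the parameter names in `NAME_MAP` are hypothetical examples of a HuggingFace-to-MindSpore rename table, and the real tool also handles framework-specific tensor layouts and dtypes.

```python
# Hypothetical sketch of weight-format conversion as key renaming.
# NAME_MAP entries are illustrative, not a real model's mapping.
NAME_MAP = {
    "model.embed_tokens.weight": "model.tok_embeddings.embedding_weight",
    "lm_head.weight": "lm_head.weight",
}

def convert_keys(state_dict):
    """Return a new dict with parameter names rewritten via NAME_MAP;
    names without a mapping entry are kept unchanged."""
    return {NAME_MAP.get(k, k): v for k, v in state_dict.items()}

src = {"model.embed_tokens.weight": [[0.1, 0.2]], "lm_head.weight": [[0.3]]}
dst = convert_keys(src)
```

In practice the conversion also reshapes or transposes individual tensors where the two frameworks disagree on layout; only the renaming step is shown here.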
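The slicing and merging of item 2 follows a simple principle: a full weight is cut into per-rank blocks for one distributed layout, and the blocks can be concatenated back for another. A toy sketch with plain lists (the real tool works on checkpoint tensors and supports more sharding axes):

```python
def shard(weight, num_ranks):
    """Slice a 2-D weight (a list of rows) into equal row blocks,
    one block per rank. Assumes the row count divides evenly."""
    rows_per_rank = len(weight) // num_ranks
    return [weight[r * rows_per_rank:(r + 1) * rows_per_rank]
            for r in range(num_ranks)]

def merge(shards):
    """Concatenate per-rank row blocks back into the full weight."""
    return [row for block in shards for row in block]

w = [[1, 2], [3, 4], [5, 6], [7, 8]]
pieces = shard(w, 2)        # two blocks of two rows each
assert merge(pieces) == w   # merging recovers the original weight
```

Re-slicing between two different cluster sizes is then just `shard(merge(old_shards), new_num_ranks)`.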
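For item 3, the sizing logic behind hybrid parallelism can be sketched with generic terminology (these names are illustrative, not MindSpore Transformers configuration keys): the parallel degrees multiply, so the cluster must supply their product in cards.

```python
def required_devices(data_parallel, model_parallel, pipeline_stages):
    """Hybrid parallel degrees multiply: each pipeline stage hosts a full
    data-parallel x model-parallel grid of model shards."""
    return data_parallel * model_parallel * pipeline_stages

# A hypothetical large-cluster configuration: 16 x 8 x 8 = 1024 cards.
cards = required_devices(16, 8, 8)
```

This is why one-click configuration matters at scale: changing any one degree changes the total card count, and all three must be chosen together to match the cluster.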
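The step-level resumable training of item 5 rests on a simple loop structure: periodically persist the step counter and training state, and on restart read them back instead of starting from step 0. A minimal stdlib sketch under that assumption (the function names and JSON format here are illustrative; real checkpoints store model and optimizer tensors):

```python
import json
import os

def save_checkpoint(path, step, state):
    """Persist training progress so a run can resume after an interruption.
    `state` is a JSON-serializable stand-in for model/optimizer state."""
    with open(path, "w") as f:
        json.dump({"step": step, "state": state}, f)

def resume_checkpoint(path):
    """Return (last_saved_step, state), or (0, {}) when no checkpoint exists."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]

def train(path, total_steps, save_every=10):
    step, state = resume_checkpoint(path)   # pick up where we left off
    while step < total_steps:
        step += 1
        state["loss"] = 1.0 / step          # stand-in for a real training step
        if step % save_every == 0:
            save_checkpoint(path, step, state)
    return step

import tempfile
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ckpt.json")
    train(path, total_steps=25)
    last_step, _ = resume_checkpoint(path)  # last save happened at step 20
```

An interrupted run therefore loses at most `save_every - 1` steps of work, which is the trade-off the save interval controls.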
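The safetensors format of item 8 has a deliberately simple on-disk layout: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor data. A minimal stdlib round-trip sketch of that layout (for real use, prefer the `safetensors` library; this writer stores flat float32 lists only and omits optional metadata):

```python
import json
import struct

def save_safetensors(path, tensors):
    """Write `tensors` (name -> flat list of floats, stored as F32) in the
    safetensors layout: 8-byte LE header length, JSON header, raw bytes."""
    header, buf, offset = {}, b"", 0
    for name, values in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)
        header[name] = {"dtype": "F32", "shape": [len(values)],
                        "data_offsets": [offset, offset + len(data)]}
        buf += data
        offset += len(data)
    hjson = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hjson)) + hjson + buf)

def load_safetensors(path):
    """Read the header, then slice each tensor's bytes out of the buffer."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
        buf = f.read()
    out = {}
    for name, meta in header.items():
        start, end = meta["data_offsets"]
        out[name] = list(struct.unpack(f"<{(end - start) // 4}f",
                                       buf[start:end]))
    return out

import os
import tempfile
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model.safetensors")
    tensors = {"w": [0.5, 1.5], "b": [2.0]}
    save_safetensors(path, tensors)
    loaded = load_safetensors(path)
```

Because the header is plain JSON and offsets are explicit, individual tensors can be read without loading the whole file, which is one reason the format suits large model weights.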

Deep Optimizing with MindSpore Transformers

Appendix

FAQ