MindSpore Transformers Documentation

The goal of the MindSpore Transformers (also known as MindFormers) suite is to build a full-process development suite for training, fine-tuning, evaluating, deploying, and running inference with large models. It provides the industry's mainstream Transformer-based pre-trained models and SOTA downstream task applications, and covers a rich set of parallel features, helping users easily realize large model training and innovative research and development.

Users can refer to Overall Architecture and Model Library to get an initial understanding of the MindFormers architecture and model support, and to Installation and Quick Start to get started with MindFormers.

If you have any suggestions for MindFormers, please contact us by filing an issue and we will handle it promptly.

MindFormers supports one-click launching of single-card and multi-card training, fine-tuning, evaluation, and inference for any task, making deep learning tasks more efficient and user-friendly by simplifying operations, providing flexibility, and automating the process. Users can learn more from the following explanatory documents, and a minimal launch sketch is shown below:
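For illustration, the sketch below shows what a one-click launch might look like through the high-level `Trainer` API. It is a minimal sketch only: the task name, model name, and dataset path are placeholders, and the exact arguments available depend on your MindFormers version, so consult Quick Start for the authoritative usage.

```python
# Minimal sketch of one-click task launching via the high-level Trainer API.
# Task/model names and the dataset path are placeholders; argument names may
# differ across MindFormers versions.
from mindformers import Trainer

trainer = Trainer(
    task="text_generation",           # downstream task to run
    model="gpt2",                     # a model registered in the model library
    train_dataset="path/to/dataset",  # placeholder dataset file or directory
)

trainer.train()     # one-click training
trainer.finetune()  # one-click fine-tuning
trainer.evaluate()  # one-click evaluation
print(trainer.predict(input_data="An example prompt"))  # one-click inference
```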

Flexible and Easy-to-Use Personalized Configuration with MindFormers

With its powerful feature set, MindFormers provides users with flexible and easy-to-use personalized configuration options. Specifically, it comes with the following key features:

  1. Weight Format Conversion

    Provides a unified weight conversion tool that converts model weights between the formats used by HuggingFace and MindFormers.

  2. Distributed Weight Slicing and Merging

    Supports flexible slicing and merging of weights across different distributed scenarios (a conceptual checkpoint-transformation sketch follows this list).

  3. Distributed Parallel

    One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards (see the configuration sketch after this list).

  4. Dataset

    Supports datasets in multiple formats.

  5. Weight Saving and Resumable Training After Breakpoint

    Supports step-level resumable training after an interruption, effectively reducing the time and resources wasted by unexpected interruptions during large-scale training (the configuration sketch after this list also shows the resume-related fields).
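As a conceptual illustration of feature 2, re-slicing distributed weights is driven by the parallel strategy files saved during training. The sketch below uses MindSpore's `transform_checkpoints` interface; the paths and checkpoint prefix are placeholders, and MindFormers wraps this capability in its own tooling, so treat this as a sketch of the underlying mechanism rather than the suite's exact command.

```python
# Conceptual sketch of re-slicing distributed weights between parallel
# strategies. Paths and the checkpoint prefix are placeholders.
import mindspore as ms

ms.transform_checkpoints(
    src_checkpoints_dir="output/checkpoint/",   # checkpoints saved under the source strategy
    dst_checkpoints_dir="output/transformed/",  # where re-sliced checkpoints are written
    ckpt_prefix="model",                        # placeholder checkpoint file prefix
    src_strategy_file="src_strategy.ckpt",      # strategy file of the source parallel layout
    dst_strategy_file="dst_strategy.ckpt",      # strategy file of the target parallel layout
)
```

In MindSpore's interface, omitting the target strategy file merges the sliced weights back into a single complete checkpoint, which corresponds to the merging case described above.

For features 3 and 5, multi-dimensional parallelism and resumable training are typically switched on through fields in the task YAML configuration. The dictionary below only mirrors the kind of keys involved; names such as `data_parallel`, `model_parallel`, `pipeline_stage`, `load_checkpoint`, and `resume_training` reflect common MindFormers configurations and may differ by version, so refer to the corresponding documents for the authoritative fields.

```python
# Sketch of configuration fields that typically drive hybrid parallelism and
# step-level resumable training. Key names are illustrative and may vary by version.
parallel_and_resume_settings = {
    "use_parallel": True,
    "parallel_config": {
        "data_parallel": 8,     # data-parallel group size
        "model_parallel": 4,    # tensor/model-parallel group size
        "pipeline_stage": 2,    # number of pipeline stages
        "micro_batch_num": 16,  # micro-batches fed into the pipeline
    },
    # Resumable training after an interruption: reload the latest checkpoint
    # and continue from the recorded step.
    "load_checkpoint": "path/to/checkpoint_dir",  # placeholder path
    "resume_training": True,
}
```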
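These sketches are meant only to convey how the features fit together; the per-feature documents linked above describe the supported options in full.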

Deep Optimization with MindFormers

Appendix

FAQ