MindSpore Transformers Documentation
MindSpore Transformers (also known as MindFormers) is a MindSpore-native foundation model suite that provides full-flow development capabilities for foundation models: training, fine-tuning, evaluation, inference, and deployment. It offers the industry's mainstream Transformer-based pre-trained models and SOTA downstream task applications, and covers a rich range of parallel features, with the aim of helping users carry out large model training and innovative research and development easily.
Users can refer to Overall Architecture and Model Library for a quick overview of the MindSpore Transformers system architecture and the list of supported features and foundation models. To get started with MindSpore Transformers, refer to Installation and Quick Start.
If you have any suggestions for MindSpore Transformers, please open an issue and we will handle it promptly.
MindSpore Transformers supports one-click start of single-card and multi-card training, fine-tuning, evaluation, and inference for any task. By simplifying operation, providing flexibility, and automating the process, it makes deep learning tasks more efficient and user-friendly to run. Users can learn more from the explanatory documents for each of these processes.
Code repository address: <https://gitee.com/mindspore/mindformers>
Flexible and Easy-to-Use Personalized Configuration with MindSpore Transformers
With its powerful feature set, MindSpore Transformers provides users with flexible and easy-to-use personalized configuration options. Specifically, it comes with the following key features:
Weight Format Conversion
Provides a unified weight conversion tool that converts model weights between the formats used by HuggingFace and MindSpore Transformers.
Distributed Weight Slicing and Merging
Weights can be flexibly sliced and merged across different distributed scenarios.
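As a rough illustration of what slicing and merging means, the sketch below splits a weight row-wise across cards and re-slices it for a different card count. It is not the MindSpore Transformers implementation: the helper names `slice_rows`, `merge_rows`, and `reshard` are hypothetical, and a weight is modeled as a plain list of rows rather than a tensor.

```python
def slice_rows(weight, num_shards):
    """Split a 2-D weight (a list of rows) into equal row-wise shards."""
    if len(weight) % num_shards != 0:
        raise ValueError("row count must be divisible by num_shards")
    rows_per_shard = len(weight) // num_shards
    return [
        weight[i * rows_per_shard:(i + 1) * rows_per_shard]
        for i in range(num_shards)
    ]


def merge_rows(shards):
    """Concatenate row-wise shards back into one full weight."""
    merged = []
    for shard in shards:
        merged.extend(shard)
    return merged


def reshard(shards, new_num_shards):
    """Re-slice weights saved with one shard count for another (hypothetical helper)."""
    return slice_rows(merge_rows(shards), new_num_shards)


# An 8x4 weight saved by a 4-card job, then reloaded on 2 cards.
full = [[float(r * 4 + c) for c in range(4)] for r in range(8)]
four_way = slice_rows(full, 4)
two_way = reshard(four_way, 2)
```

Merging first and slicing again is the simplest correct strategy; a real tool would also have to handle column-wise splits and per-parameter slicing strategies.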
Distributed Parallelism
One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards.
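To make "multi-dimensional hybrid parallel" concrete, here is a minimal sketch of the arithmetic behind such a configuration: the parallel degrees must multiply to the number of cards, and every card gets one coordinate per dimension. The function names, the parameter names `data_parallel`, `model_parallel`, and `pipeline_stage`, and the rank ordering are assumptions made for illustration, not the library's API.

```python
def check_parallel_layout(device_num, data_parallel, model_parallel, pipeline_stage):
    """Verify that the parallel degrees exactly cover the available cards."""
    product = data_parallel * model_parallel * pipeline_stage
    if product != device_num:
        raise ValueError(
            f"dp*mp*pp = {product}, but {device_num} cards are available"
        )
    return product


def rank_to_coords(rank, data_parallel, model_parallel, pipeline_stage):
    """Map a flat rank to (pipeline, data, model) coordinates.

    Assumed ordering: the model-parallel index varies fastest,
    then data-parallel, then pipeline stage.
    """
    if not 0 <= rank < data_parallel * model_parallel * pipeline_stage:
        raise ValueError("rank out of range")
    mp = rank % model_parallel
    dp = (rank // model_parallel) % data_parallel
    pp = rank // (model_parallel * data_parallel)
    return pp, dp, mp


# 16 cards arranged as 2 pipeline stages x 2 data-parallel groups x 4 model-parallel cards.
check_parallel_layout(16, data_parallel=2, model_parallel=4, pipeline_stage=2)
coords = [rank_to_coords(r, 2, 4, 2) for r in range(16)]
```

Keeping the model-parallel index fastest-varying places cards that exchange the most data next to each other in rank order, which is a common choice but not the only valid one.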
Datasets
Supports multiple dataset types and formats.
Weight Saving and Resumable Training After Breakpoint
Supports step-level resumable training after breakpoint, effectively reducing the waste of time and resources caused by unexpected interruptions during large-scale training.
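The idea of resuming from a saved step can be sketched with the standard library alone. This is an illustrative toy, not how MindSpore Transformers stores checkpoints: the JSON file layout, the `save_every` interval, and the `stop_at` argument (used only to simulate an interruption) are all assumptions.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, state):
    """Write step and state via a temp file so a crash never leaves a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return (step, state); start from step 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, {"loss_sum": 0.0}
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["step"], data["state"]


def train(path, total_steps, save_every, stop_at=None):
    """Run from the last saved step; stop_at simulates an unexpected interruption."""
    step, state = load_checkpoint(path)
    while step < total_steps:
        if stop_at is not None and step == stop_at:
            return step
        state["loss_sum"] += 1.0 / (step + 1)  # stand-in for a real training step
        step += 1
        if step % save_every == 0:
            save_checkpoint(path, step, state)
    return step


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
interrupted_at = train(ckpt, total_steps=10, save_every=2, stop_at=5)
resumed_from, _ = load_checkpoint(ckpt)
finished_at = train(ckpt, total_steps=10, save_every=2)
final_step, final_state = load_checkpoint(ckpt)
```

In this toy run the job is interrupted at step 5 and resumes from step 4, the last saved step, so only one step of work is repeated. Saving more often shrinks the repeated work at the cost of more I/O.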
Training Metrics Monitoring
Provides visualization services for the training phase of large models, for monitoring and analyzing metrics and other information during training.
Training High Availability
Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery.
Safetensors Weights
Supports saving and loading weight files in safetensors format.
Fine-Grained Activation SWAP
Supports fine-grained selection of specific activations to enable SWAP, reducing peak memory overhead during model training.