MindSpore Transformers Documentation
The goal of the MindSpore Transformers suite is to build a full-process development suite for large model pre-training, fine-tuning, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and aims to help users easily realize the full process of large model development.
Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:
- One-click launch of single-card or multi-card pre-training, fine-tuning, inference, and deployment processes for large models;
- Rich multi-dimensional hybrid parallel capabilities with flexible and easy-to-use personalized configuration;
- System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;
- Configurable development of task components: any module, including the model network, optimizer, and learning rate policy, can be enabled through unified configuration (see the sketch after this list);
- Real-time visualization of training accuracy and performance monitoring metrics.
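The configurable-components item above can be pictured with a small, self-contained sketch. The YAML keys and names below (`run_mode`, `model`, `optimizer`, `lr_schedule`, `ExampleLLM`) are illustrative assumptions for this example only, not the actual MindSpore Transformers configuration schema; the point is simply that every task component is selected through one configuration file rather than hard-coded.

```python
# A minimal sketch of unified, YAML-driven component configuration.
# All key names and values here are hypothetical, not the real
# MindSpore Transformers schema.
import yaml  # requires PyYAML

EXAMPLE_CONFIG = """
run_mode: train            # hypothetical: train / finetune / predict
model:
  type: ExampleLLM         # hypothetical model identifier
  hidden_size: 4096
optimizer:
  type: AdamW
  learning_rate: 1.0e-4
lr_schedule:
  type: cosine
  warmup_steps: 2000
"""

def load_config(text: str) -> dict:
    """Parse the YAML text into a plain dictionary."""
    return yaml.safe_load(text)

if __name__ == "__main__":
    cfg = load_config(EXAMPLE_CONFIG)
    # Components are looked up by name from the config instead of being hard-coded.
    print(cfg["model"]["type"], cfg["optimizer"]["type"], cfg["lr_schedule"]["type"])
```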
Users can refer to Overall Architecture and Model Library for a quick overview of the MindSpore Transformers system architecture and the list of supported foundation models.
If you have any suggestions for MindSpore Transformers, please contact us by filing an issue, and we will address it promptly.
Full-process Developing with MindSpore Transformers
MindSpore Transformers supports one-click start of single-card and multi-card training, fine-tuning, and inference for any task, making deep learning workflows more efficient and user-friendly through simplified operation, flexible configuration, and process automation. Users can learn more from the following explanatory documents:
Code repository address: <https://gitee.com/mindspore/mindformers>
Feature Descriptions of MindSpore Transformers
MindSpore Transformers provides a wealth of features throughout the full process of large model development. Users can learn about these features via the following links:
General Features:
- One-click start for single-device, single-node and multi-node tasks.
- Supports conversion, slicing, and merging of weight files in ckpt format.
- Supports saving and loading weight files in safetensors format.
- Supports the use of YAML files to centrally manage and adjust configurable items in tasks.
- Loading Hugging Face Model Configurations: Supports plug-and-play loading of Hugging Face community model configurations for seamless integration.
- Introduction to logs, including log structure, log saving, and so on.
- Introduction to tokenizers; supports the Hugging Face Tokenizer for use in inference and datasets (a minimal example follows this list).
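As a minimal illustration of the tokenizer item above, the sketch below tokenizes a prompt with a Hugging Face tokenizer. It assumes the `transformers` package is installed, and the repository id `"your-org/your-model"` is a placeholder to be replaced; this is not a MindSpore Transformers API, only the underlying Hugging Face usage.

```python
# Minimal sketch: tokenizing a prompt with a Hugging Face tokenizer, as done
# when preparing inference inputs or preprocessing datasets.
# Assumes the `transformers` package is installed; the model id is a placeholder.
from transformers import AutoTokenizer

def build_prompt_ids(model_name: str, prompt: str):
    """Load a tokenizer by name and convert a prompt into token ids."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return tokenizer(prompt)["input_ids"]

if __name__ == "__main__":
    ids = build_prompt_ids("your-org/your-model", "Hello, MindSpore Transformers!")
    print(ids[:10])
```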
Training Features:
- Supports multiple types and formats of datasets.
- Model Training Hyperparameters: Flexibly configure hyperparameter settings for large model training.
- Provides visualization services for the training phase of large models, for monitoring and analyzing various metrics and information during training.
- Resumable Training After Breakpoint: Supports step-level resumable training, effectively reducing the time and resources wasted by unexpected interruptions during large-scale training.
- Training High Availability (Beta): Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery (Beta feature).
- One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards.
- Supports fine-grained recomputation and activation swapping to reduce peak memory overhead during model training.
- Supports gradient accumulation, gradient clipping, and other training features (see the sketch below).
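The gradient accumulation and gradient clipping item above can be illustrated framework-free. The sketch below only reproduces the arithmetic of the two techniques (averaging micro-batch gradients, then clipping by global L2 norm); it is not MindSpore Transformers code, and the function names are made up for this example.

```python
# Conceptual sketch of gradient accumulation and gradient clipping by global
# norm, written without any framework; it shows only the underlying arithmetic.
import math

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm does not exceed max_norm."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= max_norm:
        return grads
    scale = max_norm / global_norm
    return [g * scale for g in grads]

def accumulate_and_clip(micro_batch_grads, max_norm):
    """Average gradients over several micro-batches, then clip the result."""
    steps = len(micro_batch_grads)
    accumulated = [sum(step[i] for step in micro_batch_grads) / steps
                   for i in range(len(micro_batch_grads[0]))]
    return clip_by_global_norm(accumulated, max_norm)

if __name__ == "__main__":
    # Three micro-batches, each producing gradients for two parameters.
    micro = [[0.9, -1.2], [1.1, -0.8], [1.0, -1.0]]
    print(accumulate_and_clip(micro, max_norm=1.0))
```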
Inference Features:
- Supports the use of third-party open-source evaluation frameworks and datasets for large model benchmark evaluations.
- Integrates the MindSpore Golden Stick toolkit to provide a unified quantization inference process (see the sketch below).
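As a conceptual companion to the quantization item above, the sketch below shows symmetric per-tensor int8 quantization and dequantization, the basic transformation a quantized inference flow applies to weights. It is not the MindSpore Golden Stick API; the functions are illustrative only.

```python
# Conceptual sketch of symmetric int8 post-training quantization of weights.
# Not the MindSpore Golden Stick API, only the underlying math.

def quantize_int8(weights):
    """Map float weights to int8 values with a single per-tensor scale."""
    scale = (max(abs(w) for w in weights) / 127.0) or 1.0  # guard against all-zero weights
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

if __name__ == "__main__":
    weights = [0.42, -1.27, 0.05, 0.9]
    q, scale = quantize_int8(weights)
    print(q, [round(w, 3) for w in dequantize(q, scale)])
```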
Advanced Developing with MindSpore Transformers
Diagnostics and Optimization
Model Development
Accuracy Comparison