Models
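The models currently supported by MindFormers are listed below. Finetune performance is given in tokens/s/p (tokens per second per device), and predict performance in tokens/s.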
Models | Parameters | Sequence Length | Pretrain | Finetune | Predict | Finetune Performance (Configuration, Hardware) | Predict Performance (Configuration, Hardware) |
---|---|---|---|---|---|---|---|
LLama2 | 7B | 4K | pretrain | finetune | predict | 4160 tokens/s/p (Configuration, Atlas 800T A2) | 332 tokens/s (Configuration, Atlas 800T A2) |
LLama2 | 13B | 4K | pretrain | finetune | predict | 1691 tokens/s/p (Configuration, Atlas 800T A2) | 420 tokens/s (Configuration, Atlas 800T A2) |
LLama2 | 70B | 4K | pretrain | finetune | predict | 337 tokens/s/p (Configuration, Atlas 800T A2) | 522 tokens/s (Configuration, Atlas 800T A2) |
LLama3 | 8B | 8K | pretrain | finetune | predict | 2581 tokens/s/p (Configuration, Atlas 800T A2) | - |
LLama3 | 70B | 8K | pretrain | finetune | predict | 337 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | - |
LLama3.1 | 8B | 8K | - | finetune | predict | 2703 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | 591 tokens/s (Configuration, Atlas 800T A2) |
LLama3.1 | 70B | 8K | - | finetune | predict | 337 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | 509 tokens/s (Configuration, Atlas 800T A2) |
Baichuan2 | 7B | 4K | - | finetune | predict | 3164 tokens/s/p (Configuration, Atlas 800T A2) | 521 tokens/s (Configuration, Atlas 800T A2) |
Baichuan2 | 13B | 4K | - | finetune | predict | 1465 tokens/s/p (Configuration, Atlas 800T A2) | 224 tokens/s (Configuration, Atlas 800T A2) |
GLM2 | 6B | 2K | - | finetune | predict | 815.2059134 tokens/s/p (Configuration, Atlas 800T A2) | 32.08 tokens/s, seq_length=512 (Configuration, Atlas 800T A2) |
GLM3 | 6B | 2K | - | finetune | predict | 3450 tokens/s/p (Configuration, Atlas 800T A2) | 627 tokens/s (Configuration, Atlas 800T A2) |
GLM3-32K | 6B | 32K | - | finetune | predict | 3450 tokens/s/p (Configuration, Atlas 800T A2) | 627 tokens/s (Configuration, Atlas 800T A2) |
GLM4 | 9B | 8K | - | finetune | predict | 2339 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | 256 tokens/s (Configuration, Atlas 800T A2) |
CogVLM2-Video | 13B | 2K | - | finetune | predict | - | - |
CogVLM2-Image | 19B | 4K | - | - | predict | - | - |
Qwen | 7B | 8K | - | finetune | predict | 2955 tokens/s/p (Configuration, Atlas 800T A2) | 23 tokens/s (Configuration, Atlas 800T A2) |
Qwen | 14B | 8K | - | finetune | predict | 1106 tokens/s/p (Configuration, Atlas 800T A2) | 35 tokens/s (Configuration, Atlas 800T A2) |
Qwen1.5 | 7B | 32K | pretrain | finetune | predict | 2684 tokens/s/p (Configuration, Atlas 800T A2) | 164 tokens/s (Configuration, Atlas 800T A2) |
Qwen1.5 | 14B | 32K | pretrain | finetune | predict | 1452 tokens/s/p (Configuration, Atlas 800T A2) | 104 tokens/s (Configuration, Atlas 800T A2) |
Qwen1.5 | 72B | 32K | pretrain | finetune | predict | - | 74 tokens/s (Configuration, Atlas 800T A2) |
Qwen2 | 0.5B | 32K | - | finetune | predict | 9555 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | 1907 tokens/s (Configuration, Atlas 800T A2) |
Qwen2 | 1.5B | 32K | - | finetune | predict | 4363 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | 1160 tokens/s (Configuration, Atlas 800T A2) |
Qwen2 | 7B | 32K | - | finetune | predict | - | 645 tokens/s (Configuration, Atlas 800T A2) |
Qwen2 | 57B-A14B | 8K | - | finetune | predict | 288 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | - |
Qwen2 | 57B | 32K | - | finetune | predict | - | - |
Qwen2 | 72B | 128K | - | finetune | predict | 2026 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | 252 tokens/s (Configuration, Atlas 800T A2) |
Qwen-VL | 9.6B | 2K | - | finetune | predict | 2587 tokens/s/p (Configuration) | 42 tokens/s (Configuration) |
InternLM | 7B | 2K | - | finetune | predict | 3250 tokens/s/p (Configuration, Atlas 800T A2) | 62 tokens/s (Configuration, Atlas 800T A2) |
InternLM | 20B | 2K | - | finetune | predict | - | 296 tokens/s (Configuration, Atlas 800T A2) |
InternLM2 | 7B | 2K | - | finetune | predict | - | - |
InternLM2 | 20B | 4K | - | finetune | predict | - | - |
Yi | 6B | 2K | pretrain | finetune | predict | 3324 tokens/s/p (Configuration, Atlas 800T A2) | 31 tokens/s (Configuration, Atlas 800T A2) |
Yi | 34B | 4K | pretrain | finetune | predict | 660 tokens/s/p (Configuration, Atlas 800T A2) | 41 tokens/s (Configuration, Atlas 800T A2) |
Mixtral | 8x7B | 32K | pretrain | finetune | predict | - | - |
DeepSeek Coder | 33B | 4K | pretrain | finetune | predict | 572 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | 292 tokens/s (Configuration, Atlas 800T A2) |
DeepSeek Coder1.5 | 7B | 4K | - | finetune | predict | 340 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | 60 tokens/s (Configuration, Atlas 800T A2) |
DeepSeekV2 | 236B | 4K | - | finetune | predict | 36 tokens/s/p (Configuration, Atlas 900 A2 PoDc) | - |
CodeLlama | 34B | 4K | pretrain | finetune | predict | 667 tokens/s/p (Configuration, Atlas 800T A2) | 139 tokens/s (Configuration, Atlas 800T A2) |
GPT2 | 13B | 2K | pretrain | finetune | predict | 1376 tokens/s/p (Configuration, Atlas 800T A2) | 21 tokens/s (Configuration, Atlas 800T A2) |
Whisper | 1.5B | - | - | finetune | - | - | - |
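Because the finetune figures are per device (tokens/s/p), turning them into a rough job-size estimate takes a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical (not part of MindFormers), the device count of 8 is an assumption rather than the benchmark setup, "4K" is assumed to mean a sequence length of 4096, and throughput is assumed to scale linearly with device count, which real clusters only approximate.

```python
# Hypothetical helpers (not MindFormers APIs) for back-of-the-envelope
# estimates from the per-device finetune figures (tokens/s/p) in the table.


def cluster_throughput(tokens_per_s_per_device: float, num_devices: int) -> float:
    """Aggregate tokens/s across num_devices, ASSUMING linear scaling."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return tokens_per_s_per_device * num_devices


def samples_per_second(tokens_per_s: float, seq_length: int) -> float:
    """Convert a token rate into full-length sequences per second."""
    return tokens_per_s / seq_length


def hours_for_tokens(total_tokens: float, tokens_per_s: float) -> float:
    """Hours needed to process total_tokens at a given aggregate token rate."""
    return total_tokens / tokens_per_s / 3600


if __name__ == "__main__":
    # LLama2 7B finetune figure from the table: 4160 tokens/s/p, 4K sequence.
    per_device = 4160
    devices = 8       # ASSUMED device count, not taken from the table
    seq_length = 4096 # ASSUMED value for "4K"
    total = cluster_throughput(per_device, devices)
    print(f"aggregate: {total:.0f} tokens/s")
    print(f"samples/s: {samples_per_second(total, seq_length):.3f}")
    print(f"hours for 1B tokens: {hours_for_tokens(1e9, total):.2f}")
```

With these assumptions, LLama2 7B on 8 devices works out to 33280 tokens/s, about 8.125 full 4096-token samples per second, and roughly 8.35 hours per billion tokens. Measured scaling efficiency on a real cluster will usually be lower than this linear estimate.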