Benchmarks

Linux Ascend Model Training Intermediae Expert

View Source On Gitee

This document describes the MindSpore benchmarks. For details about the MindSpore networks, see Model Zoo.

Training Performance

ResNet

Network

Network Type

Dataset

MindSpore Version

Resource                

Precision

Batch Size

Throughput

Speedup

ResNet-50 v1.5

CNN

ImageNet2012

0.5.0-beta

Ascend: 1 * Ascend 910
CPU: 24 Cores

Mixed

256

2115 images/sec

-

Ascend: 8 * Ascend 910
CPU: 192 Cores

Mixed

256

16600 images/sec

0.98

Ascend: 16 * Ascend 910
CPU: 384 Cores

Mixed

256

32768 images/sec

0.96

  1. The preceding performance is obtained based on ModelArts, the HUAWEI CLOUD AI development platform. It is the average performance obtained by the Ascend 910 AI processor during the overall training process.

  2. For details about other open source frameworks, see ResNet-50 v1.5 for TensorFlow.

BERT

Network

Network Type

Dataset

MindSpore Version

Resource                

Precision

Batch Size

Throughput

Speedup

BERT-Large

Attention

zhwiki

0.5.0-beta

Ascend: 1 * Ascend 910
CPU: 24 Cores

Mixed

96

269 sentences/sec

-

Ascend: 8 * Ascend 910
CPU: 192 Cores

Mixed

96

2069 sentences/sec

0.96

  1. The preceding performance is obtained based on ModelArts, the HUAWEI CLOUD AI development platform. The network contains 24 hidden layers, the sequence length is 128 tokens, and the vocabulary contains 21128 tokens.

  2. For details about other open source frameworks, see BERT For TensorFlow.

Wide & Deep (data parallel)

Network

Network Type

Dataset

MindSpore Version

Resource                

Precision

Batch Size

Throughput

Speedup

Wide & Deep

Recommend

Criteo

0.6.0-beta

Ascend: 1 * Ascend 910
CPU: 24 Cores

Mixed

16000

796892 samples/sec

-

Ascend: 8 * Ascend 910
CPU: 192 Cores

Mixed

16000*8

4872849 samples/sec

0.76

  1. The preceding performance is obtained based on Atlas 800, and the model is data parallel.

  2. For details about other open source frameworks, see Wide & Deep For TensorFlow.

Wide & Deep (Host-Device model parallel)

Network

Network Type

Dataset

MindSpore Version

Resource                

Precision

Batch Size

Throughput

Speedup

Wide & Deep

Recommend

Criteo

0.6.0-beta

Ascend: 1 * Ascend 910
CPU: 24 Cores

Mixed

1000

68715 samples/sec

-

Ascend: 8 * Ascend 910
CPU: 192 Cores

Mixed

8000*8

283830 samples/sec

0.51

Ascend: 16 * Ascend 910
CPU: 384 Cores

Mixed

8000*16

377848 samples/sec

0.34

Ascend: 32 * Ascend 910
CPU: 768 Cores

Mixed

8000*32

433423 samples/sec

0.20

  1. The preceding performance is obtained based on Atlas 800, and the model is model parallel.

  2. For details about other open source frameworks, see Wide & Deep For TensorFlow.