mindspore.train.FlopsUtilizationCollector

class mindspore.train.FlopsUtilizationCollector(data_size, computility=1, full_flops=True, enable_ma_collector=False)[source]

The FlopsUtilizationCollector interface counts the model flops utilization (MFU) and the hardware flops utilization (HFU). Currently, the API counts only the forward and backward flops of the MatMul, BatchMatMul, flash_attention_score, and Conv2D operators. It can only be used in graph mode with static shapes.
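The two metrics can be understood with a short sketch (not part of the API; it assumes the usual convention that model flops exclude recomputation while hardware flops include it):

```python
# Hedged sketch of the standard MFU/HFU definitions, assuming
# FlopsUtilizationCollector follows the common convention.

def mfu(model_flops, step_time_s, peak_flops):
    """Model flops utilization: the model's theoretical flops per step
    divided by what the hardware could deliver in that step time."""
    return model_flops / (step_time_s * peak_flops)

def hfu(hardware_flops, step_time_s, peak_flops):
    """Hardware flops utilization: flops actually executed on the device
    (including any recomputation) divided by the hardware peak."""
    return hardware_flops / (step_time_s * peak_flops)
```

When no recomputation is enabled, the counted model and hardware flops coincide and MFU equals HFU, which is why the two percentages match in the example below.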

Parameters
  • data_size (int) – Number of steps between each print of the utilization information.

  • computility (int) – The peak flops of each compute card. Default: 1.

  • full_flops (bool) – Whether to count the flops of the full model. If full_flops is set to False, FlopsUtilizationCollector counts the sharded model flops on each device. Default: True.

  • enable_ma_collector (bool) – Whether to write the flops to the log so that cloud tasks can retrieve them. Default: False.

Raises
  • TypeError – If data_size is not a positive integer.

  • TypeError – If full_flops is not a bool.

  • TypeError – If enable_ma_collector is not a bool.

  • AssertionError – If the training mode is not graph mode or the shape is not static.

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> from mindspore import nn
>>> from mindspore.train import Model, FlopsUtilizationCollector
>>> from mindspore import context
>>> context.set_context(mode=context.GRAPH_MODE)
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> net = nn.Dense(10, 5)
>>> crit = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> opt = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> flops_callback = FlopsUtilizationCollector(train_dataset.get_dataset_size(), computility=10e6)
>>> model = Model(network=net, optimizer=opt, loss_fn=crit, metrics={"recall"})
>>> model.train(2, train_dataset, callbacks=[flops_callback])
Full model flops is 6400, Full hardware flops is 6400, Shard model flops is 6400, Shard hardware flops is 6400
Train per step time: 135.572 ms, mfu:0.47% hfu:0.47%
Train per step time: 1.317 ms, mfu:48.59% hfu:48.59%
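As a back-of-the-envelope sanity check (not part of the API), the printed mfu of the fast step can be reproduced from the logged flops, the per-step time, and the computility value passed to the callback:

```python
# Reproduce the printed mfu from the example log above.
full_model_flops = 6400      # "Full model flops is 6400"
step_time_s = 1.317e-3       # "Train per step time: 1.317 ms"
peak_flops = 10e6            # computility passed to the callback

mfu = full_model_flops / (step_time_s * peak_flops)
print(f"mfu: {mfu:.2%}")     # close to the logged 48.59%
```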
step_begin(run_context)[source]

Record the time at the beginning of each step.

Parameters

run_context (RunContext) – Context of the process running. For more details, please refer to mindspore.train.RunContext.

step_end(run_context)[source]

Print MFU and HFU at the end of each step.

Parameters

run_context (RunContext) – Context of the process running. For more details, please refer to mindspore.train.RunContext.