mindspore.train.FlopsUtilizationCollector

class mindspore.train.FlopsUtilizationCollector(data_size, computility=1, full_flops=True, enable_ma_collector=False)

FlopsUtilizationCollector reports model FLOPs utilization (MFU) and hardware FLOPs utilization (HFU). Currently, the API counts only the forward and backward flops of the MatMul, BatchMatMul, FlashAttentionScore, and Conv2D operators. It can be used only in graph mode with static shapes.
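To illustrate what counting the flops of a matrix multiplication involves, here is a minimal sketch that uses the common convention of 2·m·k·n flops for an (m, k) by (k, n) product. The helper name matmul_flops is hypothetical and is not part of MindSpore; the collector's internal counting rules are not described on this page.

```python
def matmul_flops(m, k, n):
    """Flops for an (m, k) x (k, n) matrix product.

    Assumes the common convention of one multiply and one add
    per inner-product term, i.e. 2 * m * k * n in total.
    """
    return 2 * m * k * n


# Shapes taken from the example on this page: batch of 32, Dense(10, 5).
forward = matmul_flops(32, 10, 5)
print(forward)
```

Under this convention the forward product costs 3200 flops, and a forward product plus one backward product of the same size gives 6400, which is consistent with the figure printed in the example on this page. How the collector actually arrives at that number is not stated here.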

Parameters

data_size (int) – The number of steps between successive printouts of utilization information.
computility (int) – The peak flops of each compute card. Default: 1.
full_flops (bool) – Whether to count the full model flops. If full_flops is False, FlopsUtilizationCollector counts the sharded model flops on each device. Default: True.
enable_ma_collector (bool) – Whether to write flops into the log and provide them to tasks on the cloud for retrieval. Default: False.

Raises

TypeError – If data_size is not a positive int.
TypeError – If full_flops is not a bool.
TypeError – If enable_ma_collector is not a bool.
AssertionError – If training does not run in static graph mode or does not use static shapes.

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> from mindspore import nn
>>> from mindspore.train import Model, FlopsUtilizationCollector
>>> from mindspore import context
>>> context.set_context(mode=context.GRAPH_MODE)
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> net = nn.Dense(10, 5)
>>> crit = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> opt = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> flops_callback = FlopsUtilizationCollector(train_dataset.get_dataset_size(), computility=10e6)
>>> model = Model(network=net, optimizer=opt, loss_fn=crit, metrics={"recall"})
>>> model.train(2, train_dataset, callbacks=[flops_callback])
Full model flops is 6400, Full hardware flops is 6400, Shard model flops is 6400, Shard hardware flops is 6400
Train per step time: 135.572 ms, mfu:0.47% hfu:0.47%
Train per step time: 1.317 ms, mfu:48.59% hfu:48.59%
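
The percentages printed above appear consistent with dividing the counted flops per step by the product of step time and computility. The sketch below works through that arithmetic under this assumption; the function name utilization_percent is hypothetical and is not part of the MindSpore API.

```python
def utilization_percent(flops_per_step, step_time_ms, peak_flops_per_s):
    """Utilization as a percentage of peak throughput.

    Assumes utilization = flops / (step time in seconds * peak flops per second).
    """
    step_time_s = step_time_ms / 1000.0
    return 100.0 * flops_per_step / (step_time_s * peak_flops_per_s)


# Values taken from the example output: 6400 flops per step, computility=10e6.
for step_time_ms in (135.572, 1.317):
    print(f"{utilization_percent(6400, step_time_ms, 10e6):.2f}%")
```

Because the printed step times are rounded, values recomputed this way can differ from the logged ones in the last digit. In this example MFU and HFU coincide because the model flops and hardware flops are both reported as 6400.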

step_begin(run_context)

Record the time at the beginning of each step.

Parameters: run_context (RunContext) – Context of the process running. For more details, please refer to mindspore.train.RunContext.

step_end(run_context)

Print the per-step time, MFU, and HFU at the end of each step.

Parameters: run_context (RunContext) – Context of the process running. For more details, please refer to mindspore.train.RunContext.