mindspore.train.FlopsUtilizationCollector
- class mindspore.train.FlopsUtilizationCollector(data_size, computility=1, full_flops=True, enable_ma_collector=False)[source]
The FlopsUtilizationCollector interface collects the model FLOPs utilization (MFU) and the hardware FLOPs utilization (HFU). Currently, the API counts only the forward and backward FLOPs of the MatMul, BatchMatMul, flash_attention_score, and Conv2D operators. It can only be used in graph mode with static shapes.
- Parameters
data_size (int) – The number of steps between each print of the collected information.
computility (int) – The peak FLOPS (floating-point operations per second) of each compute card. Default: 1.
full_flops (bool) – Whether to count the full model flops. If full_flops is set to False, FlopsUtilizationCollector counts the shard model flops on each device instead (see the sketch after this list). Default: True.
enable_ma_collector (bool) – Whether to write the flops into the log so that cloud-side tasks can retrieve them. Default: False.
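For instance, in a model-parallel job one may want the per-device shard flops written to the log instead of the full model flops. A minimal sketch using only the constructor arguments documented above (the 100-step interval and the 280e12 peak-FLOPS value are illustrative, not defaults):

from mindspore.train import FlopsUtilizationCollector

# Report utilization every 100 steps against a card with an assumed 280 TFLOPS peak;
# count only the flops of the local shard and also write them to the log.
flops_cb = FlopsUtilizationCollector(
    data_size=100,
    computility=280e12,        # peak FLOPS of the compute card (illustrative value)
    full_flops=False,          # report per-device shard flops instead of full model flops
    enable_ma_collector=True,  # write flops to the log for cloud-side retrieval
)
# flops_cb is then passed to Model.train(..., callbacks=[flops_cb]) as in the example below.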
- Raises
TypeError – If data_size is not a positive int.
TypeError – If full_flops is not bool.
TypeError – If enable_ma_collector is not bool.
AssertionError – If the network is not running in graph mode or its shape is not static.
Examples
>>> import numpy as np
>>> import mindspore.dataset as ds
>>> from mindspore import nn
>>> from mindspore.train import Model, FlopsUtilizationCollector
>>> from mindspore import context
>>> context.set_context(mode=context.GRAPH_MODE)
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> net = nn.Dense(10, 5)
>>> crit = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> opt = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> flops_callback = FlopsUtilizationCollector(train_dataset.get_dataset_size(), computility=10e6)
>>> model = Model(network=net, optimizer=opt, loss_fn=crit, metrics={"recall"})
>>> model.train(2, train_dataset, callbacks=[flops_callback])
Full model flops is 6400, Full hardware flops is 6400, Shard model flops is 6400, Shard hardware flops is 6400
Train per step time: 135.572 ms, mfu:0.47% hfu:0.47%
Train per step time: 1.317 ms, mfu:48.59% hfu:48.59%
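The utilization figures in the example can be reproduced from the printed values. A short sketch, assuming MFU is the shard model flops divided by the product of the step time and computility (an inference from the numbers above, not a statement of the internal formula):

# Values taken from the example output above.
shard_model_flops = 6400      # "Shard model flops is 6400"
step_time_s = 1.317e-3        # "Train per step time: 1.317 ms"
computility = 10e6            # peak FLOPS passed to FlopsUtilizationCollector
mfu = shard_model_flops / (step_time_s * computility)
print(f"mfu:{mfu:.2%}")       # ~48.6%, matching the mfu:48.59% printed above

The first step is much slower (135.572 ms), typically because it includes graph compilation and warm-up, which is why its reported utilization is only 0.47%.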
- step_begin(run_context)[source]
Record the time at the beginning of the step.
- Parameters
run_context (RunContext) – Context of the process running. For more details, please refer to mindspore.train.RunContext.
- step_end(run_context)[source]
Print the MFU and HFU at the end of the step.
- Parameters
run_context (RunContext) – Context of the process running. For more details, please refer to mindspore.train.RunContext.
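step_begin and step_end are ordinary callback hooks invoked by Model.train, so the collector can be extended by subclassing. A minimal sketch; the LoggingFlopsCollector class is illustrative, not part of the API, and assumes the standard callback parameters exposed by RunContext.original_args():

from mindspore.train import FlopsUtilizationCollector

class LoggingFlopsCollector(FlopsUtilizationCollector):
    """Illustrative subclass that keeps the built-in MFU/HFU reporting and adds a per-step marker."""

    def step_begin(self, run_context):
        # Keep the original timing logic.
        super().step_begin(run_context)

    def step_end(self, run_context):
        # Let the parent compute and print MFU/HFU first, then add extra bookkeeping.
        super().step_end(run_context)
        cb_params = run_context.original_args()
        print(f"finished step {cb_params.cur_step_num}")

# Used like the base class, e.g. LoggingFlopsCollector(data_size=100, computility=280e12),
# and passed to Model.train(..., callbacks=[...]).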