mindspore.profiler._ExperimentalConfig
- class mindspore.profiler._ExperimentalConfig(profiler_level: ProfilerLevel = ProfilerLevel.Level0, aic_metrics: AicoreMetrics = AicoreMetrics.AiCoreNone, l2_cache: bool = False, mstx: bool = False, data_simplification: bool = True, export_type: list = None)
This class configures the extensible parameters used when collecting model performance data with the profiler.
- Parameters
profiler_level (ProfilerLevel, optional) – (Ascend only) The level of profiling data collection. Default: ProfilerLevel.Level0.
- ProfilerLevel.LevelNone: Takes effect only when mstx is enabled; no operator data is collected on the device side.
- ProfilerLevel.Level0: The leanest level of profiling data collection; collects the elapsed time of computational operators on the NPU and large communication operator information.
- ProfilerLevel.Level1: On top of Level0, additionally collects CANN-layer AscendCL data, AICORE performance metrics, and small communication operator information.
- ProfilerLevel.Level2: On top of Level1, additionally collects GE and Runtime information in the CANN layer.
aic_metrics (AicoreMetrics, optional) – (Ascend only) The types of AICORE performance data to collect. When this parameter is used, activities must include ProfilerActivity.NPU, and the value must be a member of AicoreMetrics. When profiler_level is Level0, the default value is AicoreMetrics.AiCoreNone; when profiler_level is Level1 or Level2, the default value is AicoreMetrics.PipeUtilization. The data items contained in each metric are as follows:
- AicoreMetrics.AiCoreNone: Does not collect AICORE data.
- AicoreMetrics.ArithmeticUtilization: Contains mac_fp16/int8_ratio, vec_fp32/fp16/int32_ratio, vec_misc_ratio, etc.
- AicoreMetrics.PipeUtilization: Contains vec_ratio, mac_ratio, scalar_ratio, mte1/mte2/mte3_ratio, icache_miss_rate, etc.
- AicoreMetrics.Memory: Contains ub_read/write_bw, l1_read/write_bw, l2_read/write_bw, main_mem_read/write_bw, etc.
- AicoreMetrics.MemoryL0: Contains l0a_read/write_bw, l0b_read/write_bw, l0c_read/write_bw, etc.
- AicoreMetrics.ResourceConflictRatio: Contains vec_bankgroup/bank/resc_cflt_ratio, etc.
- AicoreMetrics.MemoryUB: Contains ub_read/write_bw_mte, ub_read/write_bw_vector, ub_read/write_bw_scalar, etc.
- AicoreMetrics.L2Cache: Contains write_cache_hit, write_cache_miss_allocate, r0_read_cache_hit, r1_read_cache_hit, etc. This metric is only supported on Atlas A2 training series products.
- AicoreMetrics.MemoryAccess: Statistics on the storage access bandwidth and storage capacity of main memory and the L2 cache.
l2_cache (bool, optional) – (Ascend only) Whether to collect L2 cache data; collected when True. Default: False. The l2_cache.csv file is generated in the ASCEND_PROFILER_OUTPUT folder. In O2 mode, only the wait and skip_first parameters of the schedule configuration can be set to 0.
mstx (bool, optional) – (Ascend only) Whether to collect lightweight profiling data; collected when True. Default: False.
data_simplification (bool, optional) – (Ascend only) Whether to remove FRAMEWORK data and other redundant data. If set to True, only the profiler deliverables and the raw performance data under the PROF_XXX directory are kept, to save space. Default: True.
export_type (list, optional) – (Ascend only) The data format(s) to export; the db and text formats can be exported at the same time. Default: None, indicating that text data is exported. A short configuration sketch combining these parameters is shown below.
- ExportType.Text: Export text format data.
- ExportType.Db: Export db format data.
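As a quick illustration of how these parameters combine, here is a minimal sketch based only on the signature and enum members documented above; the variable names light_cfg and full_cfg are illustrative, and actual collection behavior depends on the CANN and MindSpore versions in use.
>>> import mindspore
>>> from mindspore.profiler import ProfilerLevel, AicoreMetrics, ExportType
>>> # Lightweight collection: mstx markers only, no device-side operator data
>>> light_cfg = mindspore.profiler._ExperimentalConfig(
...     profiler_level=ProfilerLevel.LevelNone,
...     mstx=True)
>>> # Heavier collection: Level1 with AICORE pipeline metrics, L2 cache data,
>>> # and both text and db deliverables exported
>>> full_cfg = mindspore.profiler._ExperimentalConfig(
...     profiler_level=ProfilerLevel.Level1,
...     aic_metrics=AicoreMetrics.PipeUtilization,
...     l2_cache=True,
...     export_type=[ExportType.Text, ExportType.Db])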
- Raises
RuntimeError – If the CANN version does not match the MindSpore version, MindSpore cannot parse the generated ascend_job_id directory structure.
- Supported Platforms:
Ascend GPU
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, context
>>> import mindspore.dataset as ds
>>> from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics, ExportType
>>>
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.fc = nn.Dense(2, 2)
...     def construct(self, x):
...         return self.fc(x)
>>>
>>> def generator():
...     for i in range(2):
...         yield np.ones([2, 2]).astype(np.float32), np.ones([2]).astype(np.int32)
>>>
>>> def train(net):
...     optimizer = nn.Momentum(net.trainable_params(), 1, 0.9)
...     loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
...     data = ds.GeneratorDataset(generator, ["data", "label"])
...     model = mindspore.train.Model(net, loss, optimizer)
...     model.train(1, data)
>>>
>>> if __name__ == '__main__':
...     # If the device_target is GPU, set the device_target to "GPU"
...     context.set_context(mode=mindspore.GRAPH_MODE)
...     mindspore.set_device("Ascend")
...
...     # Init Profiler
...     experimental_config = mindspore.profiler._ExperimentalConfig(
...         profiler_level=ProfilerLevel.Level0,
...         aic_metrics=AicoreMetrics.AiCoreNone,
...         l2_cache=False,
...         mstx=False,
...         data_simplification=False,
...         export_type=[ExportType.Text])
...     steps = 10
...     net = Net()
...     # Note that the Profiler should be initialized before model.train
...     with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
...                                     schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
...                                                                          repeat=1, skip_first=2),
...                                     on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
...                                     profile_memory=False,
...                                     experimental_config=experimental_config) as prof:
...         # Train Model
...         for step in range(steps):
...             train(net)
...             prof.step()
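A usage note on the schedule above: assuming the usual wait/warmup/active cycle semantics of mindspore.profiler.schedule, skip_first=2 skips the first two steps, then one step waits, one warms up, and two are actively profiled for a single repeat, so performance data is collected on the fifth and sixth steps and written to ./data by tensorboard_trace_handler at the end of the active phase.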