mindspore.profiler.profile

class mindspore.profiler.profile(activities: list = None, with_stack: bool = False, profile_memory: bool = False, data_process: bool = False, parallel_strategy: bool = False, start_profile: bool = True, hbm_ddr: bool = False, pcie: bool = False, sync_enable: bool = True, schedule: Schedule = None, on_trace_ready: Optional[Callable[..., Any]] = None, experimental_config: Optional[_ExperimentalConfig] = None)

MindSpore users can use this class to collect performance data of a neural network. Import the class to initialize a profile object, call profile.start() to begin profiling, and call profile.stop() to stop collection and analyze the results. The results can be visualized with the MindStudio Insight tool. Currently, profile supports analysis of AICORE operators, AICPU operators, HostCPU operators, memory, device communication, clusters, and other data.
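
As an illustrative minimal sketch of this workflow (not one of the official examples below; with the default start_profile=True, collection begins at initialization, so start() is not called explicitly):

>>> import mindspore
>>> prof = mindspore.profiler.profile()   # start_profile defaults to True, so collection begins here
>>> # ... run the neural network workload to be profiled ...
>>> prof.stop()                           # stop collection and analyze the results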

Parameters:
  • start_profile (bool, optional) - Whether data collection starts when the Profiler is initialized. Default: True.

  • activities (list, optional) - The types of performance data to collect. Default: [ProfilerActivity.CPU, ProfilerActivity.NPU].

    • ProfilerActivity.CPU: collect MindSpore framework data.

    • ProfilerActivity.NPU: collect CANN software stack and NPU data.

    • ProfilerActivity.GPU: collect GPU data.

  • schedule (schedule, optional) - Sets the action policy for collection, defined by the schedule class; it must be used together with the step interface (see the sketch after this list). Default: None.

  • on_trace_ready (Callable, optional) - Sets the callback function to execute when performance data collection completes. Default: None.

  • profile_memory (bool, optional) - (Ascend only) Whether to collect Tensor memory data; when True, this data is collected. When using this parameter, activities must be set to [ProfilerActivity.CPU, ProfilerActivity.NPU]. When the graph compilation level is O2, collecting operator memory data requires collection to start from the first step. Default: False. The operator names currently collected with this parameter are incomplete; this will be fixed in a later version, and the environment variable MS_ALLOC_CONF is recommended instead.

  • with_stack (bool, optional) - (Ascend only) Whether to collect Python-side call stack data, which is presented as a flame graph in the timeline. When using this parameter, activities must include ProfilerActivity.CPU. Default: False.

  • hbm_ddr (bool, optional) - (Ascend only) Whether to collect on-chip memory/DDR memory read/write rate data; when True, this data is collected. Default: False.

  • pcie (bool, optional) - (Ascend only) Whether to collect PCIe bandwidth data; when True, this data is collected. Default: False.

  • data_process (bool, optional) - (Ascend/GPU) Whether to collect data preparation performance data. Default: False.

  • parallel_strategy (bool, optional) - (Ascend only) Whether to collect parallel strategy performance data. Default: False.

  • sync_enable (bool, optional) - (GPU only) Whether the Profiler collects operator elapsed time synchronously. Default: True.

    • True: synchronous mode. The start timestamp is recorded on the CPU side before the operator is dispatched to the GPU, and the end timestamp is recorded after the operator finishes execution and returns to the CPU side. The operator's elapsed time is the difference between the two timestamps.

    • False: asynchronous mode. The operator's elapsed time is the time taken to dispatch from the CPU to the GPU. This mode reduces the impact of enabling the Profiler on overall training time.

  • experimental_config (_ExperimentalConfig, optional) - Extensible parameters can be configured here.
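
The following sketch illustrates how the schedule parameter and the step() interface work together, assuming the wait/warmup/active/repeat/skip_first cycle used in the examples below (the step numbering is an illustration, not normative):

>>> import mindspore
>>> from mindspore.profiler import ProfilerActivity
>>> # Assumed behavior: with skip_first=2, wait=1, warmup=1, active=2, repeat=1,
>>> # steps 0-1 are skipped, step 2 waits, step 3 warms up, and data is collected on steps 4-5.
>>> with mindspore.profiler.profile(
...         activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
...         schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2, repeat=1, skip_first=2),
...         on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data")) as prof:
...     for step in range(6):
...         # (run one training step here)
...         prof.step()  # advance the schedule; on_trace_ready parses the data once an active window completes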

Raises:
  • RuntimeError - When the CANN version does not match the MindSpore version, MindSpore cannot parse the generated ascend_job_id directory structure.

Supported Platforms:

Ascend GPU

Examples:

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, context
>>> import mindspore.dataset as ds
>>> from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics, ExportType
>>>
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.fc = nn.Dense(2,2)
...     def construct(self, x):
...         return self.fc(x)
>>>
>>> def generator():
...     for i in range(2):
...         yield np.ones([2, 2]).astype(np.float32), np.ones([2]).astype(np.int32)
>>>
>>> def train(net):
...     optimizer = nn.Momentum(net.trainable_params(), 1, 0.9)
...     loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
...     data = ds.GeneratorDataset(generator, ["data", "label"])
...     model = mindspore.train.Model(net, loss, optimizer)
...     model.train(1, data)
>>>
>>> if __name__ == '__main__':
...     # If the device_target is GPU, set the device_target to "GPU"
...     context.set_context(mode=mindspore.GRAPH_MODE)
...     mindspore.set_device("Ascend")
...
...     # Init Profiler
...     experimental_config = mindspore.profiler._ExperimentalConfig(
...                                 profiler_level=ProfilerLevel.Level0,
...                                 aic_metrics=AicoreMetrics.AiCoreNone,
...                                 l2_cache=False,
...                                 mstx=False,
...                                 data_simplification=False,
...                                 export_type=[ExportType.Text])
...     steps = 10
...     net = Net()
...     # Note that the Profiler should be initialized before model.train
...     with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
...                                     schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
...                                           repeat=1, skip_first=2),
...                                     on_trace_ready=mindspore.profiler.
...                                           tensorboard_trace_handler("./data"),
...                                     profile_memory=False,
...                                     experimental_config=experimental_config) as prof:
...
...         # Train Model
...         for step in range(steps):
...             train(net)
...             prof.step()
add_metadata(key: str, value: str)

Report custom metadata key-value pair data.

Parameters:
  • key (str) - The key of the metadata key-value pair.

  • value (str) - The value of the metadata key-value pair.

Examples:

>>> import mindspore
>>> # Profiler init.
>>> with mindspore.profiler.profile() as prof:
...     # Call Profiler add_metadata
...     prof.add_metadata("test_key", "test_value")
add_metadata_json(key: str, value: str)

Report custom metadata key-value pair data whose value is a JSON string.

Parameters:
  • key (str) - The key of the metadata key-value pair.

  • value (str) - The value of the metadata key-value pair, formatted as a JSON string.

Examples:

>>> import json
>>> import mindspore
>>> # Profiler init.
>>> with mindspore.profiler.profile() as prof:
...     # Call Profiler add_metadata_json
...     prof.add_metadata_json("test_key", json.dumps({"key1": 1, "key2": 2}))
start()

Start profile data collection. The profile can be started conditionally.

Raises:
  • RuntimeError - The profile has already been started.

  • RuntimeError - If the start_profile parameter is not set or is set to True (illustrated in the sketch below).
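
A hedged sketch of conditional startup based on the exception conditions above (start() applies only when start_profile=False, i.e. collection was not already started at initialization):

>>> import mindspore
>>> prof = mindspore.profiler.profile(start_profile=False)  # defer collection at initialization
>>> prof.start()   # begin collection; calling start() again would raise RuntimeError
>>> # ... run the steps to be profiled ...
>>> prof.stop()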

Examples:

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, context
>>> import mindspore.dataset as ds
>>> from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics, ExportType
>>>
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.fc = nn.Dense(2,2)
...     def construct(self, x):
...         return self.fc(x)
>>>
>>> def generator():
...     for i in range(2):
...         yield np.ones([2, 2]).astype(np.float32), np.ones([2]).astype(np.int32)
>>>
>>> def train(net):
...     optimizer = nn.Momentum(net.trainable_params(), 1, 0.9)
...     loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
...     data = ds.GeneratorDataset(generator, ["data", "label"])
...     model = mindspore.train.Model(net, loss, optimizer)
...     model.train(1, data)
>>>
>>> if __name__ == '__main__':
...     # If the device_target is GPU, set the device_target to "GPU"
...     context.set_context(mode=mindspore.GRAPH_MODE)
...     mindspore.set_device("Ascend")
...
...     # Init Profiler
...     experimental_config = mindspore.profiler._ExperimentalConfig(
...                                 profiler_level=ProfilerLevel.Level0,
...                                 aic_metrics=AicoreMetrics.AiCoreNone,
...                                 l2_cache=False,
...                                 mstx=False,
...                                 data_simplification=False,
...                                 export_type=[ExportType.Text])
...     steps = 10
...     net = Net()
...     # Note that the Profiler should be initialized before model.train
...     prof = mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
...                                       schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
...                                           repeat=1, skip_first=2),
...                                       on_trace_ready=mindspore.profiler.
...                                           tensorboard_trace_handler("./data"),
...                                       profile_memory=False,
...                                       experimental_config=experimental_config)
...     prof.start()
...     # Train Model
...     for step in range(steps):
...         train(net)
...         prof.step()
...     prof.stop()
step()

Used on Ascend devices to collect and parse performance data step by step via schedule and on_trace_ready.

Raises:
  • RuntimeError - If the start_profile parameter is not set or the profile is not started.

  • RuntimeError - If the schedule parameter is not set.

Examples:

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, context
>>> import mindspore.dataset as ds
>>> from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics, ExportType
>>>
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.fc = nn.Dense(2,2)
...     def construct(self, x):
...         return self.fc(x)
>>>
>>> def generator():
...     for i in range(2):
...         yield np.ones([2, 2]).astype(np.float32), np.ones([2]).astype(np.int32)
>>>
>>> def train(net):
...     optimizer = nn.Momentum(net.trainable_params(), 1, 0.9)
...     loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
...     data = ds.GeneratorDataset(generator, ["data", "label"])
...     model = mindspore.train.Model(net, loss, optimizer)
...     model.train(1, data)
>>>
>>> if __name__ == '__main__':
...     # If the device_target is GPU, set the device_target to "GPU"
...     context.set_context(mode=mindspore.GRAPH_MODE)
...     mindspore.set_device("Ascend")
...
...     # Init Profiler
...     experimental_config = mindspore.profiler._ExperimentalConfig(
...                                 profiler_level=ProfilerLevel.Level0,
...                                 aic_metrics=AicoreMetrics.AiCoreNone,
...                                 l2_cache=False,
...                                 mstx=False,
...                                 data_simplification=False,
...                                 export_type=[ExportType.Text])
...     steps = 10
...     net = Net()
...     # Note that the Profiler should be initialized before model.train
...     with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
...                                     schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
...                                           repeat=1, skip_first=2),
...                                     on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
...                                     profile_memory=False,
...                                     experimental_config=experimental_config) as prof:
...
...         # Train Model
...         for step in range(steps):
...             train(net)
...             prof.step()
stop()

Stop the profile. The profile can be stopped conditionally.

Raises:
  • RuntimeError - The profile has not been started.

Examples:

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, context
>>> import mindspore.dataset as ds
>>> from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics, ExportType
>>>
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.fc = nn.Dense(2,2)
...     def construct(self, x):
...         return self.fc(x)
>>>
>>> def generator():
...     for i in range(2):
...         yield np.ones([2, 2]).astype(np.float32), np.ones([2]).astype(np.int32)
>>>
>>> def train(net):
...     optimizer = nn.Momentum(net.trainable_params(), 1, 0.9)
...     loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
...     data = ds.GeneratorDataset(generator, ["data", "label"])
...     model = mindspore.train.Model(net, loss, optimizer)
...     model.train(1, data)
>>>
>>> if __name__ == '__main__':
...     # If the device_target is GPU, set the device_target to "GPU"
...     context.set_context(mode=mindspore.GRAPH_MODE)
...     mindspore.set_device("Ascend")
...
...     # Init Profiler
...     experimental_config = mindspore.profiler._ExperimentalConfig(
...                                 profiler_level=ProfilerLevel.Level0,
...                                 aic_metrics=AicoreMetrics.AiCoreNone,
...                                 l2_cache=False,
...                                 mstx=False,
...                                 data_simplification=False,
...                                 export_type=[ExportType.Text])
...     steps = 10
...     net = Net()
...     # Note that the Profiler should be initialized before model.train
...     prof = mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
...                                       schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
...                                           repeat=1, skip_first=2),
...                                       on_trace_ready=mindspore.profiler.
...                                           tensorboard_trace_handler("./data"),
...                                       profile_memory=False,
...                                       experimental_config=experimental_config)
...     prof.start()
...     # Train Model
...     for step in range(steps):
...         train(net)
...         prof.step()
...     prof.stop()