mindspore.dataset.config.set_enable_autotune

mindspore.dataset.config.set_enable_autotune(enable, filepath_prefix=None)

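Set whether to enable automatic tuning of data processing parameters.

During training, AutoTune adjusts the parameter configuration of each operation in the data processing pipeline, such as the degree of parallelism and the buffer queue size, according to the load on the available resources, in order to improve overall processing speed.

This feature is disabled by default.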

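Parameters:

enable (bool) - Whether to enable automatic tuning.
filepath_prefix (str, optional) - Path prefix under which the optimized parameter configuration is saved. Takes effect only when enable is True. The configuration for each device is saved to a separate file, and the final file name is filepath_prefix + RANK_ID + ".json", where RANK_ID is the number of the device the file corresponds to. Default: None, in which case no configuration file is saved.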

Raises:

TypeError - If enable is not of type bool.
TypeError - If filepath_prefix is not of type str.
RuntimeError - If filepath_prefix is an empty string.
RuntimeError - If filepath_prefix is a directory.
RuntimeError - If the path given by filepath_prefix does not exist.
RuntimeError - If there is no write permission for filepath_prefix.

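Note

The saved parameter configuration file can be loaded with the mindspore.dataset.deserialize interface, which directly returns a data processing pipeline object configured with the optimal parameters.
The tuning process can be inspected by enabling INFO-level logging.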
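The two notes above can be sketched as follows. This is a minimal illustration, not an official recipe: it assumes that MindSpore takes its log level from the GLOG_v environment variable with "1" meaning INFO (check this against your MindSpore version), and load_tuned_pipeline and config_path are hypothetical names introduced only for this example.

```python
import os

# Assumption: MindSpore reads its log level from the GLOG_v environment
# variable, with "1" meaning INFO. Set it before importing mindspore.
os.environ["GLOG_v"] = "1"


def load_tuned_pipeline(config_path):
    """Rebuild a data pipeline from a configuration file saved by AutoTune.

    config_path is a hypothetical placeholder for one of the generated
    JSON files.
    """
    # Imported lazily so that the log level set above is already in place.
    import mindspore.dataset as ds

    return ds.deserialize(json_filepath=config_path)
```

The returned object is a regular dataset, so it can be iterated in the usual way, for example with create_dict_iterator().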

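An example of the generated configuration file is shown below. The "remark" field states whether data processing parameter tuning was performed, the "summary" field briefly lists each operation in the data processing pipeline together with its optimal configuration, and the "tree" field holds the complete structure of the data processing pipeline.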

{
    "remark": "The following file has been auto-generated by the Dataset AutoTune.",
    "summary": [
        "CifarOp(ID:5)       (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:4)         (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:3)         (num_parallel_workers: 2, prefetch_size:64)",
        "BatchOp(ID:2)       (num_parallel_workers: 8, prefetch_size:64)"
    ],
    "tree": {
        ...
    }
}
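To compare tuned values across operations, the "summary" entries can be turned into structured records. The sketch below uses only the Python standard library and assumes every summary line follows the layout of the sample above; parse_summary and SUMMARY_LINE are hypothetical helpers, and the "tree" field is left empty for brevity.

```python
import json
import re

# Sample content in the shape shown above; "tree" is left empty here.
config_text = """
{
    "remark": "The following file has been auto-generated by the Dataset AutoTune.",
    "summary": [
        "CifarOp(ID:5)       (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:4)         (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:3)         (num_parallel_workers: 2, prefetch_size:64)",
        "BatchOp(ID:2)       (num_parallel_workers: 8, prefetch_size:64)"
    ],
    "tree": {}
}
"""

# Matches lines such as "MapOp(ID:4)   (num_parallel_workers: 2, prefetch_size:64)".
SUMMARY_LINE = re.compile(
    r"(?P<op>\w+)\(ID:(?P<id>\d+)\)\s+"
    r"\(num_parallel_workers:\s*(?P<workers>\d+),\s*"
    r"prefetch_size:\s*(?P<prefetch>\d+)\)"
)


def parse_summary(text):
    """Return one dict per operation listed in the "summary" field."""
    config = json.loads(text)
    rows = []
    for line in config["summary"]:
        match = SUMMARY_LINE.fullmatch(line.strip())
        if match is None:
            # Skip lines that do not follow the assumed layout.
            continue
        rows.append({
            "op": match.group("op"),
            "id": int(match.group("id")),
            "num_parallel_workers": int(match.group("workers")),
            "prefetch_size": int(match.group("prefetch")),
        })
    return rows


tuned_ops = parse_summary(config_text)
for row in tuned_ops:
    print(row)
```

For a real file, read its contents with open(...).read() and pass the text to parse_summary.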

Examples:

>>> import mindspore.dataset as ds
>>>
>>> # enable AutoTune and save the optimized data pipeline configuration;
>>> # the rank ID and ".json" are appended to the given prefix
>>> ds.config.set_enable_autotune(True, "/path/to/autotune_out")
>>>
>>> # enable AutoTune
>>> ds.config.set_enable_autotune(True)