mindspore.dataset.config.set_enable_autotune

mindspore.dataset.config.set_enable_autotune(enable, filepath_prefix=None)

Set whether to enable AutoTune for data pipeline parameters.

AutoTune automatically adjusts the parameter configuration of each operation in the data processing pipeline, such as parallelism and buffer queue size, according to the load on environment resources during training, in order to improve overall processing speed.

AutoTune is not enabled by default.

Parameters
  • enable (bool) – Whether to enable AutoTune.

  • filepath_prefix (str, optional) – The path prefix under which the optimized parameter configuration will be saved. Effective only if enable is True. The parameter configuration file for each Device is saved separately, and the final file name is filepath_prefix + RANK_ID + ".json", where RANK_ID is the Device ID corresponding to the file. Default: None, in which case no configuration file is saved.
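The naming rule for the saved file can be sketched in plain Python. This is only an illustration: `autotune_config_path` is a hypothetical helper, not part of MindSpore, and it assumes the concatenation is literal, exactly as the parameter description states.

```python
def autotune_config_path(filepath_prefix, rank_id):
    """Hypothetical helper: build the expected file name for one Device.

    Assumes the literal rule filepath_prefix + RANK_ID + ".json"
    given in the parameter description.
    """
    return f"{filepath_prefix}{rank_id}.json"


# One configuration file is expected per Device (rank).
paths = [autotune_config_path("/path/to/autotune_out", rank) for rank in range(2)]
```

Because the extension is appended to the prefix, the prefix itself would normally be passed without a ".json" suffix.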

Raises
  • TypeError – If enable is not of type bool.

  • TypeError – If filepath_prefix is not of type str.

  • RuntimeError – If filepath_prefix is an empty string.

  • RuntimeError – If filepath_prefix is a directory.

  • RuntimeError – If filepath_prefix does not exist.

  • RuntimeError – If the path given by filepath_prefix is not writable.
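A pre-flight check along the lines of the conditions above might look like the following sketch. `check_filepath_prefix` is a hypothetical helper, not MindSpore's own validation, and it makes two assumptions: "does not exist" is read as "the parent directory does not exist", and write permission is checked on that parent directory.

```python
import os


def check_filepath_prefix(filepath_prefix):
    """Hypothetical pre-flight check approximating the listed error conditions."""
    if not isinstance(filepath_prefix, str):
        raise TypeError("filepath_prefix must be of type str")
    if filepath_prefix == "":
        raise RuntimeError("filepath_prefix must not be an empty string")
    if os.path.isdir(filepath_prefix):
        raise RuntimeError("filepath_prefix must not be a directory")
    # Assumption: the directory that will hold the file must already exist.
    parent = os.path.dirname(os.path.abspath(filepath_prefix))
    if not os.path.isdir(parent):
        raise RuntimeError("parent directory of filepath_prefix does not exist")
    # Assumption: write permission is required on that directory.
    if not os.access(parent, os.W_OK):
        raise RuntimeError("no write permission for " + parent)
    return filepath_prefix
```

Running such a check before starting a long training job can surface a bad path early, although the library's own checks remain authoritative.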

Note

  • Saved parameter configuration files can be loaded via the mindspore.dataset.deserialize interface to directly obtain a data processing pipeline object configured with the optimal parameters.

  • The parameter tuning process can be viewed by turning on INFO level logging.
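As a minimal sketch of turning on INFO level logging: MindSpore reads its log level from the GLOG_v environment variable, where 1 corresponds to INFO. The variable has to be set before MindSpore is imported, so the snippet below only prepares the environment.

```python
import os

# Set the log level before importing MindSpore; 1 corresponds to INFO.
os.environ["GLOG_v"] = "1"

# After this point, `import mindspore.dataset as ds` followed by
# `ds.config.set_enable_autotune(True)` should log tuning steps at INFO level.
```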

An example of the generated configuration file is shown below. The "remark" field describes whether data processing parameter tuning has been performed, the "summary" field briefly lists each operation in the data processing pipeline with its corresponding optimal configuration, and the "tree" field provides complete information about the structure of the data processing pipeline.

{
    "remark": "The following file has been auto-generated by the Dataset AutoTune.",
    "summary": [
        "CifarOp(ID:5)       (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:4)         (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:3)         (num_parallel_workers: 2, prefetch_size:64)",
        "BatchOp(ID:2)       (num_parallel_workers: 8, prefetch_size:64)"
    ],
    "tree": {
        ...
    }
}
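The "summary" entries are plain strings, so a small parser can turn them into structured values for comparison across runs. The sketch below is an assumption-laden illustration: `parse_summary` is a hypothetical helper, the regular expression is inferred only from the example entries shown above, and the "tree" field is left empty in the sample because its content is elided in the example.

```python
import json
import re

# Pattern inferred from the example summary lines; other formats are skipped.
SUMMARY_LINE = re.compile(
    r"^(?P<op>\w+)\(ID:(?P<op_id>\d+)\)\s+"
    r"\(num_parallel_workers:\s*(?P<workers>\d+),\s*prefetch_size:\s*(?P<prefetch>\d+)\)$"
)


def parse_summary(config):
    """Hypothetical helper: extract per-operation settings from "summary"."""
    rows = []
    for line in config.get("summary", []):
        match = SUMMARY_LINE.match(line.strip())
        if match is None:
            continue  # unknown line format, ignore rather than guess
        rows.append({
            "op": match["op"],
            "id": int(match["op_id"]),
            "num_parallel_workers": int(match["workers"]),
            "prefetch_size": int(match["prefetch"]),
        })
    return rows


sample = json.loads("""
{
    "remark": "The following file has been auto-generated by the Dataset AutoTune.",
    "summary": [
        "CifarOp(ID:5)       (num_parallel_workers: 2, prefetch_size:64)",
        "BatchOp(ID:2)       (num_parallel_workers: 8, prefetch_size:64)"
    ],
    "tree": {}
}
""")
rows = parse_summary(sample)
```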

Examples

>>> import mindspore.dataset as ds
>>>
>>> # enable AutoTune and save optimized data pipeline configuration;
>>> # the second argument is a prefix, to which RANK_ID and ".json" are appended
>>> ds.config.set_enable_autotune(True, "/path/to/autotune_out")
>>>
>>> # enable AutoTune without saving a configuration file
>>> ds.config.set_enable_autotune(True)
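Loading a saved configuration, as mentioned in the Note, could look like the sketch below. `load_tuned_pipeline` is a hypothetical wrapper and `config_path` stands for whichever file AutoTune actually wrote; the only library call used is mindspore.dataset.deserialize with its json_filepath argument. The import is deferred so the helper can be defined even where MindSpore is not installed.

```python
def load_tuned_pipeline(config_path):
    """Hypothetical wrapper: rebuild a data pipeline from a saved AutoTune file."""
    # Imported lazily; calling this function requires MindSpore to be installed.
    import mindspore.dataset as ds

    # Returns a dataset object configured with the saved parameters.
    return ds.deserialize(json_filepath=config_path)


# Usage (not executed here):
# dataset = load_tuned_pipeline(config_path)
# for item in dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
#     ...
```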