mindspore.dataset.config

The configuration module provides various functions to set and get the supported configuration parameters, and read a configuration file.

mindspore.dataset.config.get_auto_num_workers()

Get whether the automatic number of workers feature is turned on.

Returns

bool, whether the auto num worker feature is turned on.

Examples

>>> num_workers = ds.config.get_auto_num_workers()
mindspore.dataset.config.get_callback_timeout()

Get the default timeout for DSWaitedCallback. In case of a deadlock, the wait function will exit after the timeout period.

Returns

int, the duration in seconds.
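
Examples

A minimal usage sketch (assuming mindspore.dataset is imported as ds):

>>> # Get the current callback timeout, in seconds.
>>> timeout = ds.config.get_callback_timeout()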

mindspore.dataset.config.get_monitor_sampling_interval()

Get the default interval of performance monitor sampling.

Returns

int, interval (in milliseconds) for performance monitor sampling.
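
Examples

A minimal usage sketch (assuming mindspore.dataset is imported as ds):

>>> # Get the current performance monitor sampling interval, in milliseconds.
>>> sampling_interval = ds.config.get_monitor_sampling_interval()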

mindspore.dataset.config.get_num_parallel_workers()

Get the default number of parallel workers. This is the DEFAULT num_parallel_workers value used for each operation; it is not related to the AutoNumWorker feature.

Returns

int, number of parallel workers to be used as a default for each operation.
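
Examples

A minimal usage sketch (assuming mindspore.dataset is imported as ds):

>>> # Get the default number of parallel workers used for each operation.
>>> num_parallel_workers = ds.config.get_num_parallel_workers()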

mindspore.dataset.config.get_numa_enable()

Get the default state of numa enabled. This is the DEFAULT numa enabled value used for all processes.

Returns

bool, the default state of numa enabled.
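
Examples

A minimal usage sketch (assuming mindspore.dataset is imported as ds):

>>> # Check whether the numa bind feature is enabled by default.
>>> numa_enabled = ds.config.get_numa_enable()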

mindspore.dataset.config.get_prefetch_size()

Get the prefetch size in number of rows.

Returns

int, total number of rows to be prefetched.
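
Examples

A minimal usage sketch (assuming mindspore.dataset is imported as ds):

>>> # Get the number of rows to be prefetched.
>>> prefetch_size = ds.config.get_prefetch_size()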

mindspore.dataset.config.get_seed()

Get the seed.

Returns

int, seed.
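
Examples

A minimal usage sketch (assuming mindspore.dataset is imported as ds):

>>> # Get the global seed used by random generators in the pipeline.
>>> seed = ds.config.get_seed()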

mindspore.dataset.config.load(file)

Load configurations from a file.

Parameters

file (str) – Path of the configuration file to be loaded.

Raises

RuntimeError – If file is invalid and parsing fails.

Examples

>>> # Set new default configuration according to values in the configuration file.
>>> # example config file:
>>> # {
>>> #     "logFilePath": "/tmp",
>>> #     "numParallelWorkers": 4,
>>> #     "seed": 5489,
>>> #     "monitorSamplingInterval": 30
>>> # }
>>> config_file = "/path/to/config/file"
>>> ds.config.load(config_file)
mindspore.dataset.config.set_auto_num_workers(enable)

Set num_parallel_workers for each op automatically (this feature is turned off by default). If turned on, the num_parallel_workers in each op will be adjusted automatically, possibly overwriting the num_parallel_workers passed in by the user or the default value (if the user doesn't pass anything) set by ds.config.set_num_parallel_workers(). For now, this function is only optimized for YoloV3 datasets with per_batch_map (running map in batch). This feature aims to provide a baseline for optimized num_workers assignment for each op. Any op whose num_parallel_workers is adjusted to a new value will be logged.

Parameters

enable (bool) – Whether to enable auto num_workers feature or not.

Raises

TypeError – If enable is not of boolean type.

Examples

>>> # Enable the auto_num_worker feature; this might override the num_parallel_workers passed in by the user.
>>> ds.config.set_auto_num_workers(True)
mindspore.dataset.config.set_monitor_sampling_interval(interval)

Set the default interval (in milliseconds) for monitor sampling.

Parameters

interval (int) – Interval (in milliseconds) to be used for performance monitor sampling.

Raises

ValueError – If interval is invalid (<= 0 or > MAX_INT_32).

Examples

>>> # Set a new global configuration value for the monitor sampling interval.
>>> ds.config.set_monitor_sampling_interval(100)
mindspore.dataset.config.set_num_parallel_workers(num)

Set the default number of parallel workers.

Parameters

num (int) – Number of parallel workers to be used as a default for each operation.

Raises

ValueError – If num_parallel_workers is invalid (<= 0 or > MAX_INT_32).

Examples

>>> # Set a new global configuration value for the number of parallel workers.
>>> # Now parallel dataset operators will run with 8 workers.
>>> ds.config.set_num_parallel_workers(8)
mindspore.dataset.config.set_numa_enable(numa_enable)

Set the default state of numa enabled. If numa_enable is True, you need to ensure that the numa library is installed.

Parameters

numa_enable (bool) – Whether to use numa bind feature.

Raises

TypeError – If numa_enable is not a boolean data type.

Examples

>>> # Set a new global configuration value for the state of numa enabled.
>>> # Now parallel dataset operators will run with the numa bind feature.
>>> ds.config.set_numa_enable(True)
mindspore.dataset.config.set_prefetch_size(size)

Set the number of rows to be prefetched.

Parameters

size (int) – Total number of rows to be prefetched per operator per parallel worker.

Raises

ValueError – If prefetch_size is invalid (<= 0 or > MAX_INT_32).

Note

Since the total memory used for prefetching can grow very large with a high number of workers, the per-worker prefetch size is reduced when the number of workers is greater than 4. The actual prefetch size per worker at runtime will be prefetch_size * (4 / num_parallel_workers).

Examples

>>> # Set a new global configuration value for the prefetch size.
>>> ds.config.set_prefetch_size(1000)
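
For illustration, combining this value with the formula in the Note above: if 8 parallel workers are configured (as in the set_num_parallel_workers example), each worker would prefetch 1000 * (4 / 8) = 500 rows at runtime.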
mindspore.dataset.config.set_seed(seed)

Set the seed to be used in any random generator. This is used to produce deterministic results.

Note

This set_seed function sets the seed in the Python random library and the numpy.random library so that Python augmentations that use randomness are deterministic. It should be called every time an iterator is created in order to reset the random seed. Within the pipeline, this does not guarantee deterministic results when num_parallel_workers > 1.

Parameters

seed (int) – Seed to be set.

Raises

ValueError – If seed is invalid (< 0 or > MAX_UINT_32).

Examples

>>> # Set a new global configuration value for the seed value.
>>> # Operations with randomness will use the seed value to generate random values.
>>> ds.config.set_seed(1000)
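
A sketch of the re-seeding pattern described in the Note above (pipeline construction omitted):

>>> ds.config.set_seed(1000)
>>> # ... build the dataset pipeline and create its first iterator here ...
>>> ds.config.set_seed(1000)
>>> # ... reset the seed before creating the next iterator so random values repeat ...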