mindspore.dataset.config

The configuration module provides various functions to set and get the supported configuration parameters, and read a configuration file.

Common imported modules in corresponding API examples are as follows:

import mindspore.dataset as ds
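Every setting in this module is a process-wide global with a matching get_ and set_ function. A change made for one pipeline therefore stays in effect for all later ones unless it is undone. The following is a minimal save-and-restore sketch. It assumes only that the getter takes no arguments and the setter takes the new value. config_override is a hypothetical helper, not part of mindspore.dataset.config.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def config_override(getter: Callable[[], Any],
                    setter: Callable[[Any], None],
                    value: Any) -> Iterator[None]:
    """Temporarily apply a global setting, then restore the previous value.

    Hypothetical helper for illustration; not a MindSpore API.
    """
    previous = getter()   # remember the current global value
    setter(value)         # apply the temporary value
    try:
        yield
    finally:
        setter(previous)  # restore it, even if the body raised
```

For example, it can wrap the seed functions documented below. Restoring through set_seed also reseeds the Python random and numpy.random generators.

>>> with config_override(ds.config.get_seed, ds.config.set_seed, 1000):
...     pass  # build and iterate a pipeline here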
mindspore.dataset.config.get_auto_num_workers()

Get whether the automatic num_workers feature is turned on or off.

Returns

bool, whether the auto num_workers feature is turned on.

Examples

>>> # Get the global configuration of the auto num_workers feature.
>>> num_workers = ds.config.get_auto_num_workers()
mindspore.dataset.config.get_autotune_interval()

Get the global configuration of pipeline autotuning step interval.

Returns

int, interval (in steps) for data pipeline autotuning.

Examples

>>> # Get the global configuration of the autotuning interval.
>>> # If set_autotune_interval() has never been called, the default value (0) will be returned.
>>> autotune_interval = ds.config.get_autotune_interval()
mindspore.dataset.config.get_callback_timeout()

Get the default timeout for DSWaitedCallback. In case of a deadlock, the wait function will exit after the timeout period.

Returns

int, timeout (in seconds) used to end the wait in DSWaitedCallback in case of a deadlock.

Examples

>>> # Get the global configuration of callback timeout.
>>> # If set_callback_timeout() has never been called, the default value (60) will be returned.
>>> callback_timeout = ds.config.get_callback_timeout()
mindspore.dataset.config.get_enable_autotune()

Get the state of the AutoTune flag, i.e. whether AutoTune is enabled.

Returns

bool, whether AutoTune is enabled (default=True).

Examples

>>> # Get the flag of AutoTune feature.
>>> autotune_flag = ds.config.get_enable_autotune()
mindspore.dataset.config.get_enable_shared_mem()

Get the state of the shared memory flag, i.e. whether shared memory is enabled.

Note

get_enable_shared_mem is not yet supported on Windows and macOS.

Returns

bool, whether shared memory is enabled (default=True).

Examples

>>> # Get the flag of shared memory feature.
>>> shared_mem_flag = ds.config.get_enable_shared_mem()
mindspore.dataset.config.get_monitor_sampling_interval()

Get the global configuration of the sampling interval of the performance monitor.

Returns

int, interval (in milliseconds) for performance monitor sampling.

Examples

>>> # Get the global configuration of monitor sampling interval.
>>> # If set_monitor_sampling_interval() has never been called, the default value (1000) will be returned.
>>> sampling_interval = ds.config.get_monitor_sampling_interval()
mindspore.dataset.config.get_num_parallel_workers()

Get the global configuration of the number of parallel workers. This is the default num_parallel_workers value used for each operation; it is not related to the AutoNumWorker feature.

Returns

int, number of parallel workers to be used as a default for each operation.

Examples

>>> # Get the global configuration of parallel workers.
>>> # If set_num_parallel_workers() has never been called, the default value (8) will be returned.
>>> num_parallel_workers = ds.config.get_num_parallel_workers()
mindspore.dataset.config.get_numa_enable()

Get whether NUMA binding is enabled or disabled. This is the default NUMA setting used for all processes.

Returns

bool, whether NUMA binding is enabled by default.

Examples

>>> # Get the global configuration of numa.
>>> numa_state = ds.config.get_numa_enable()
mindspore.dataset.config.get_prefetch_size()

Get the prefetch size in number of rows.

Returns

int, total number of rows to be prefetched.

Examples

>>> # Get the global configuration of prefetch size.
>>> # If set_prefetch_size() has never been called, the default value (16) will be returned.
>>> prefetch_size = ds.config.get_prefetch_size()
mindspore.dataset.config.get_seed()

Get the random number seed. If the seed has been set, the set value is returned; otherwise, the default seed value std::mt19937::default_seed (5489) is returned.

Returns

int, random number seed.

Examples

>>> # Get the global configuration of seed.
>>> # If set_seed() has never been called, the default value (std::mt19937::default_seed) will be returned.
>>> seed = ds.config.get_seed()
mindspore.dataset.config.load(file)

Load the project configuration from a configuration file.

Parameters

file (str) – Path of the configuration file to be loaded.

Raises

RuntimeError – If file is invalid or parsing fails.

Examples

>>> # Set new default configuration according to values in the configuration file.
>>> # example config file:
>>> # {
>>> #     "logFilePath": "/tmp",
>>> #     "numParallelWorkers": 4,
>>> #     "seed": 5489,
>>> #     "monitorSamplingInterval": 30
>>> # }
>>> config_file = "/path/to/config/file"
>>> ds.config.load(config_file)
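The example configuration file above uses JSON syntax, so it can be generated with the standard json module instead of being written by hand. The following is a minimal sketch. It assumes the file only needs the four keys shown in that example; this document does not list the full set of accepted keys. The resulting path is what gets passed to ds.config.load(config_file).

```python
import json
import os
import tempfile

# Keys and values copied from the example configuration file above.
config = {
    "logFilePath": "/tmp",
    "numParallelWorkers": 4,
    "seed": 5489,
    "monitorSamplingInterval": 30,
}

# Write the configuration to a temporary .json file.
fd, config_file = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(config, f, indent=4)
```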
mindspore.dataset.config.set_auto_num_workers(enable)

Set num_parallel_workers for each operation automatically (this feature is turned off by default).

If turned on, the num_parallel_workers of each operation is adjusted automatically. This may override the num_parallel_workers passed in by the user or, if the user passes nothing, the default value set by ds.config.set_num_parallel_workers().

For now, this function is only optimized for the YoloV3 dataset with per_batch_map (running map in batch). This feature aims to provide a baseline for optimized num_workers assignment for each operation. Operations whose num_parallel_workers is adjusted to a new value are logged.

Parameters

enable (bool) – Whether to enable auto num_workers feature or not.

Raises

TypeError – If enable is not of boolean type.

Examples

>>> # Enable the auto num_workers feature. This might override the num_parallel_workers passed in by the user.
>>> ds.config.set_auto_num_workers(True)
mindspore.dataset.config.set_autotune_interval(interval)

Set the interval (in steps) for data pipeline autotuning. Setting interval to 0 makes AutoTune run after every epoch instead of after a fixed number of steps. The default value is 0, meaning epoch-based autotuning.
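The following sketch makes the difference between step-based and epoch-based tuning concrete by listing when AutoTune would fire under each setting. It is a simplified model that assumes a fixed number of steps per epoch and a global 1-based step counter. autotune_steps is hypothetical and does not reproduce the runtime's actual scheduler.

```python
from typing import List


def autotune_steps(total_steps: int, steps_per_epoch: int, interval: int) -> List[int]:
    """Return the 1-based global steps after which AutoTune would run.

    Hypothetical model of the documented behavior:
      interval == 0 -> once at the end of every epoch
      interval > 0  -> every `interval` steps
    """
    if interval < 0:
        raise ValueError("interval must be >= 0")
    period = steps_per_epoch if interval == 0 else interval
    return list(range(period, total_steps + 1, period))
```

With 10 steps split into epochs of 5, autotune_steps(10, 5, 0) gives [5, 10], while autotune_steps(10, 5, 3) gives [3, 6, 9].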

Parameters

interval (int) – Interval (in steps) between consecutive AutoTune runs.

Raises

ValueError – If interval < 0 or interval > MAX_INT_32.

Examples

>>> # Set a new global configuration value for the autotuning interval.
>>> ds.config.set_autotune_interval(30)
mindspore.dataset.config.set_callback_timeout(timeout)

Set the default timeout (in seconds) for DSWaitedCallback. In case of a deadlock, the wait function will exit after the timeout period.
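The timeout acts as a safety valve. A wait that would otherwise block forever on a signal that never arrives gives up after timeout seconds. The following sketch shows that general pattern with threading.Event. It is an illustration only and does not reproduce how DSWaitedCallback is implemented. wait_with_timeout and run_demo are hypothetical names, and the demo uses fractional seconds for speed.

```python
import threading
from typing import Optional


def wait_with_timeout(done: threading.Event, timeout: float) -> bool:
    # Event.wait returns True once the event is set, or False if the
    # timeout expires first, so the caller never blocks indefinitely.
    return done.wait(timeout)


def run_demo(signal_after: Optional[float], timeout: float) -> bool:
    """Set the event from a timer thread after `signal_after` seconds
    (never, if None) and report whether the wait succeeded."""
    done = threading.Event()
    if signal_after is not None:
        threading.Timer(signal_after, done.set).start()
    return wait_with_timeout(done, timeout)
```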

Parameters

timeout (int) – Timeout (in seconds) to be used to end the wait in DSWaitedCallback in case of a deadlock.

Raises

ValueError – If timeout <= 0 or timeout > MAX_INT_32.

Examples

>>> # Set a new global configuration value for the timeout value.
>>> ds.config.set_callback_timeout(100)
mindspore.dataset.config.set_enable_autotune(enable)

Set the default state of the AutoTune flag. If it is True, AutoTune helps users improve performance for a given workload by automatically finding better settings for the data pipeline.

Parameters

enable (bool) – Whether to use AutoTune feature when running data pipeline.

Raises

TypeError – If enable is not a boolean data type.

Examples

>>> # Enable AutoTune
>>> ds.config.set_enable_autotune(True)
mindspore.dataset.config.set_enable_shared_mem(enable)

Set the default state of the shared memory flag. If enable is True, shared memory queues are used to pass data to the processes created for operations that set python_multiprocessing=True.

Note

set_enable_shared_mem is not yet supported on Windows and macOS.

Parameters

enable (bool) – Whether to use shared memory in operators when python_multiprocessing=True.

Raises

TypeError – If enable is not a boolean data type.

Examples

>>> # Enable shared memory feature to improve the performance of Python multiprocessing.
>>> ds.config.set_enable_shared_mem(True)
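The note above marks shared memory as unsupported on Windows and macOS. Portable scripts may therefore want to guard the call. The following is a minimal sketch based only on that note. shared_mem_supported is a hypothetical helper, and it relies on the standard sys.platform values "win32" (Windows) and "darwin" (macOS).

```python
import sys


def shared_mem_supported(platform: str = sys.platform) -> bool:
    """Return False on the platforms the note lists as unsupported."""
    return platform not in ("win32", "darwin")
```

Typical usage:

>>> if shared_mem_supported():
...     ds.config.set_enable_shared_mem(True)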
mindspore.dataset.config.set_monitor_sampling_interval(interval)

Set the default interval (in milliseconds) for monitor sampling.

Parameters

interval (int) – Interval (in milliseconds) to be used for performance monitor sampling.

Raises

ValueError – If interval <= 0 or interval > MAX_INT_32.

Examples

>>> # Set a new global configuration value for the monitor sampling interval.
>>> ds.config.set_monitor_sampling_interval(100)
mindspore.dataset.config.set_num_parallel_workers(num)

Set a new global default value for the number of parallel workers. This setting affects the parallelism of all dataset operations.

Parameters

num (int) – Number of parallel workers to be used as a default for each operation.

Raises

ValueError – If num <= 0 or num > MAX_INT_32.

Examples

>>> # Set a new global configuration value for the number of parallel workers.
>>> # Now parallel dataset operators will run with 8 workers.
>>> ds.config.set_num_parallel_workers(8)
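The ValueError conditions in this module differ slightly from function to function:

- set_autotune_interval accepts 0.
- Timeouts, sampling intervals, worker counts, and prefetch sizes must be positive.
- Seeds may go up to MAX_UINT_32.

The table-driven sketch below collects those bounds so that values can be checked before they reach the library. It assumes MAX_INT_32 and MAX_UINT_32 are the usual 2**31 - 1 and 2**32 - 1. check_config_value is a hypothetical helper, not MindSpore's own validator.

```python
MAX_INT_32 = 2**31 - 1   # assumed value of MAX_INT_32
MAX_UINT_32 = 2**32 - 1  # assumed value of MAX_UINT_32

# (smallest allowed, largest allowed), taken from the Raises sections.
DOCUMENTED_RANGES = {
    "set_autotune_interval": (0, MAX_INT_32),
    "set_callback_timeout": (1, MAX_INT_32),
    "set_monitor_sampling_interval": (1, MAX_INT_32),
    "set_num_parallel_workers": (1, MAX_INT_32),
    "set_prefetch_size": (1, MAX_INT_32),
    "set_seed": (0, MAX_UINT_32),
}


def check_config_value(func_name: str, value: int) -> int:
    """Raise ValueError if `value` is outside the documented range."""
    low, high = DOCUMENTED_RANGES[func_name]
    if not low <= value <= high:
        raise ValueError(f"{func_name}: {value} is outside [{low}, {high}]")
    return value
```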
mindspore.dataset.config.set_numa_enable(numa_enable)

Set the default state of NUMA binding. If numa_enable is True, make sure the NUMA library is installed.

Parameters

numa_enable (bool) – Whether to use numa bind feature.

Raises

TypeError – If numa_enable is not a boolean data type.

Examples

>>> # Set a new global configuration value for the state of numa enabled.
>>> # Parallel dataset operations will now run with NUMA binding.
>>> ds.config.set_numa_enable(True)
mindspore.dataset.config.set_prefetch_size(size)

Set the queue capacity of the threads in the pipeline (the prefetch size, in rows).

Parameters

size (int) – The length of the cache queue.

Raises

ValueError – If size <= 0 or size > MAX_INT_32.

Note

The total memory used for prefetching can grow very large with a high number of workers. For this reason, the per-worker prefetch size is reduced when the number of workers is greater than 4. The actual per-worker prefetch size at runtime is size * (4 / num_parallel_workers).
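Plugging numbers into the note's formula shows how quickly the per-worker queue shrinks. With the default prefetch size of 16 and 8 workers, each worker prefetches 16 * (4 / 8) = 8 rows. per_worker_prefetch is a hypothetical helper. Rounding down is an assumption, because the note does not say how fractional results are handled.

```python
def per_worker_prefetch(prefetch_size: int, num_parallel_workers: int) -> int:
    """Approximate per-worker prefetch size, following the note's formula.

    Up to 4 workers, each worker keeps the full prefetch_size; above that,
    it is scaled by 4 / num_parallel_workers. Floor division is an
    assumption, not documented behavior.
    """
    if num_parallel_workers <= 4:
        return prefetch_size
    return prefetch_size * 4 // num_parallel_workers
```

For the example value below, per_worker_prefetch(1000, 8) returns 500.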

Examples

>>> # Set a new global configuration value for the prefetch size.
>>> ds.config.set_prefetch_size(1000)
mindspore.dataset.config.set_seed(seed)

Set the random number seed. If the seed is set, the generated random numbers will be fixed, which helps produce deterministic results.

Note

set_seed sets the seed in the Python random library and the numpy.random library so that Python augmentations that use randomness are deterministic. Call set_seed again for every iterator created to reset the random seed. Within the pipeline, deterministic results are not guaranteed when num_parallel_workers > 1.
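The note asks for one call per iterator because reseeding resets the generator state, so the same sequence of random draws repeats. The following sketch mimics only the Python-side effect described in the note (seeding random and numpy.random). reseed and draw are hypothetical names. The sketch says nothing about C++ operations or about determinism with num_parallel_workers > 1.

```python
import random

import numpy as np


def reseed(seed: int) -> None:
    """Seed Python's random module and numpy.random (hypothetical helper
    mirroring the Python-side effect the note attributes to set_seed)."""
    random.seed(seed)
    np.random.seed(seed)


def draw() -> tuple:
    # One value from each generator, standing in for a random augmentation.
    return random.random(), float(np.random.rand())


reseed(1000)
first_epoch = [draw() for _ in range(3)]
reseed(1000)  # reseed before creating the next iterator
second_epoch = [draw() for _ in range(3)]
```

Without the second reseed(1000), second_epoch would continue the sequence instead of repeating it.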

Parameters

seed (int) – Random number seed. It is used to generate deterministic random numbers.

Raises

ValueError – If seed < 0 or seed > MAX_UINT_32.

Examples

>>> # Set a new global configuration value for the seed value.
>>> # Operations with randomness will use the seed value to generate random values.
>>> ds.config.set_seed(1000)
mindspore.dataset.config.set_sending_batches(batch_num)

Set the default number of batches to send when training with sink_mode=True on an Ascend device.

Parameters

batch_num (int) – The total number of batches to send. When batch_num is set, sending pauses after that many batches and waits until the number of sending batches is increased. Default: 0, which means all batches in the dataset are sent.

Raises

TypeError – If batch_num is not of type int.

Examples

>>> # Set a new global configuration value for the number of sending batches.
>>> ds.config.set_sending_batches(10)
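In effect, batch_num acts as a cap on how many batches are sent before sending pauses. The following sketch is a simplified model under stated assumptions. Batches are counted from the start of training, and the 0 default means "no cap". batches_sent is hypothetical and does not describe the device-side implementation.

```python
def batches_sent(dataset_batches: int, batch_num: int) -> int:
    """Number of batches sent before sending pauses (hypothetical model).

    batch_num == 0 -> send every batch in the dataset (documented default)
    batch_num > 0  -> stop after batch_num batches until the limit is increased
    """
    if batch_num == 0:
        return dataset_batches
    return min(dataset_batches, batch_num)
```

With the example above and a 100-batch dataset, batches_sent(100, 10) returns 10.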