mindspore.dataset.config
The configuration module provides various functions to set and get the supported configuration parameters, and read a configuration file.
Common imported modules in corresponding API examples are as follows:
import mindspore.dataset as ds
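Before the per-function reference, here is a minimal sketch of the typical usage pattern: a set/get round trip on the global configuration, using only functions documented below.
>>> # Set a global default, then read it back.
>>> ds.config.set_num_parallel_workers(4)
>>> num_parallel_workers = ds.config.get_num_parallel_workers()  # 4
>>> ds.config.set_seed(1234)
>>> seed = ds.config.get_seed()  # 1234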
- mindspore.dataset.config.get_auto_num_workers()
Get whether the automatic number of workers feature is enabled.
- Returns
bool, whether the auto number workers feature is turned on.
Examples
>>> # Get the global configuration of auto number worker feature.
>>> num_workers = ds.config.get_auto_num_workers()
- mindspore.dataset.config.get_autotune_interval()
Get the global configuration of pipeline autotuning step interval.
- Returns
int, interval (in steps) for data pipeline autotuning.
Examples
>>> # Get the global configuration of the autotuning interval.
>>> # If set_autotune_interval() is never called before, the default value (0) will be returned.
>>> autotune_interval = ds.config.get_autotune_interval()
- mindspore.dataset.config.get_callback_timeout()
Get the default timeout for WaitedDSCallback. In case of a deadlock, the wait function will exit after the timeout period.
- Returns
int, timeout (in seconds) to be used to end the wait in WaitedDSCallback in case of a deadlock.
Examples
>>> # Get the global configuration of callback timeout.
>>> # If set_callback_timeout() is never called before, the default value (60) will be returned.
>>> callback_timeout = ds.config.get_callback_timeout()
- mindspore.dataset.config.get_enable_autotune()
Get the default state of the AutoTune enabled variable.
- Returns
bool, the state of the AutoTune enabled variable (default=False).
Examples
>>> # Get the flag of AutoTune feature.
>>> autotune_flag = ds.config.get_enable_autotune()
- mindspore.dataset.config.get_enable_shared_mem()
Get the default state of the shared memory enabled variable.
Note
get_enable_shared_mem is not supported on Windows and macOS platforms yet.
- Returns
bool, the state of the shared memory enabled variable (default=True).
Examples
>>> # Get the flag of shared memory feature.
>>> shared_mem_flag = ds.config.get_enable_shared_mem()
- mindspore.dataset.config.get_monitor_sampling_interval()
Get the global configuration of sampling interval of performance monitor.
- Returns
int, interval (in milliseconds) for performance monitor sampling.
Examples
>>> # Get the global configuration of monitor sampling interval.
>>> # If set_monitor_sampling_interval() is never called before, the default value (1000) will be returned.
>>> sampling_interval = ds.config.get_monitor_sampling_interval()
- mindspore.dataset.config.get_num_parallel_workers()
Get the global configuration of the number of parallel workers. This is the DEFAULT num_parallel_workers value used for each operation; it is not related to the AutoNumWorker feature.
- Returns
int, number of parallel workers to be used as a default for each operation.
Examples
>>> # Get the global configuration of parallel workers.
>>> # If set_num_parallel_workers() is never called before, the default value (8) will be returned.
>>> num_parallel_workers = ds.config.get_num_parallel_workers()
- mindspore.dataset.config.get_numa_enable()
Get the state of numa to indicate whether it is enabled or disabled. This is the DEFAULT numa enabled value used for all processes.
- Returns
bool, the default state of numa enabled.
Examples
>>> # Get the global configuration of numa.
>>> numa_state = ds.config.get_numa_enable()
- mindspore.dataset.config.get_prefetch_size()
Get the prefetch size in number of rows.
- Returns
int, total number of rows to be prefetched.
Examples
>>> # Get the global configuration of prefetch size.
>>> # If set_prefetch_size() is never called before, the default value (16) will be returned.
>>> prefetch_size = ds.config.get_prefetch_size()
- mindspore.dataset.config.get_seed()
Get the random number seed. If the seed has been set, return the set value; otherwise, return the default seed value, which equals std::mt19937::default_seed.
- Returns
int, random number seed.
Examples
>>> # Get the global configuration of seed.
>>> # If set_seed() is never called before, the default value (std::mt19937::default_seed) will be returned.
>>> seed = ds.config.get_seed()
- mindspore.dataset.config.load(file)
Load the project configuration from the file.
- Parameters
file (str) – Path of the configuration file to be loaded.
- Raises
RuntimeError – If file is invalid and parsing fails.
Examples
>>> # Set new default configuration according to values in the configuration file.
>>> # example config file:
>>> # {
>>> #     "logFilePath": "/tmp",
>>> #     "numParallelWorkers": 4,
>>> #     "seed": 5489,
>>> #     "monitorSamplingInterval": 30
>>> # }
>>> config_file = "/path/to/config/file"
>>> ds.config.load(config_file)
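For a fully self-contained variant of the example above, the configuration file can first be generated with the standard json module; this is a sketch, and the /tmp path is illustrative only.
>>> import json
>>> config = {
...     "logFilePath": "/tmp",
...     "numParallelWorkers": 4,
...     "seed": 5489,
...     "monitorSamplingInterval": 30
... }
>>> with open("/tmp/dataset_config.json", "w") as f:
...     json.dump(config, f)
>>> ds.config.load("/tmp/dataset_config.json")
>>> ds.config.get_num_parallel_workers()
4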
- mindspore.dataset.config.set_auto_num_workers(enable)
Automatically set num_parallel_workers for each op (this feature is turned off by default).
If turned on, the num_parallel_workers in each op will be adjusted automatically, possibly overriding the num_parallel_workers passed in by the user or the default value (if the user does not pass anything) set by ds.config.set_num_parallel_workers().
For now, this function is only optimized for the YoloV3 dataset with per_batch_map (running map in batch). This feature aims to provide a baseline for optimized num_workers assignment for each operation. Operations whose num_parallel_workers is adjusted to a new value will be logged.
- Parameters
enable (bool) – Whether to enable auto num_workers feature or not.
- Raises
TypeError – If enable is not of boolean type.
Examples
>>> # Enable the auto_num_worker feature; this might override the num_parallel_workers passed in by the user.
>>> ds.config.set_auto_num_workers(True)
- mindspore.dataset.config.set_autotune_interval(interval)
Set the interval (in steps) for data pipeline autotuning. Setting interval to 0 configures autotuning to run after every epoch instead of after a certain number of steps. The default value is 0, meaning epoch-based autotuning.
- Parameters
interval (int) – Interval (in steps) to serve as the gap between consecutive AutoTune runs.
- Raises
ValueError – If interval is invalid when interval < 0 or interval > MAX_INT_32.
Examples
>>> # Set a new global configuration value for the autotuning interval.
>>> ds.config.set_autotune_interval(30)
- mindspore.dataset.config.set_callback_timeout(timeout)
Set the default timeout (in seconds) for WaitedDSCallback. In case of a deadlock, the wait function will exit after the timeout period.
- Parameters
timeout (int) – Timeout (in seconds) to be used to end the wait in WaitedDSCallback in case of a deadlock.
- Raises
ValueError – If timeout is invalid when timeout <= 0 or timeout > MAX_INT_32.
Examples
>>> # Set a new global configuration value for the timeout value.
>>> ds.config.set_callback_timeout(100)
- mindspore.dataset.config.set_enable_autotune(enable)
Set the default state of the AutoTune flag. If True, AutoTune will automatically find better settings for the data pipeline to improve performance for a given workload; a combined sketch follows the example below.
- Parameters
enable (bool) – Whether to use AutoTune feature when running data pipeline.
- Raises
TypeError – If enable is not a boolean data type.
Examples
>>> # Enable AutoTune
>>> ds.config.set_enable_autotune(True)
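AutoTune is commonly enabled together with a step interval; a minimal sketch combining set_enable_autotune() and set_autotune_interval(), both documented in this module:
>>> # Enable AutoTune and run it every 100 steps instead of once per epoch.
>>> ds.config.set_enable_autotune(True)
>>> ds.config.set_autotune_interval(100)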
- mindspore.dataset.config.set_enable_shared_mem(enable)
Set the default state of the shared memory flag. If enable is True, shared memory queues will be used to pass data to processes that are created for operators that set python_multiprocessing=True (see the sketch after the example below).
Note
set_enable_shared_mem is not supported on Windows and macOS platforms yet.
- Parameters
enable (bool) – Whether to use shared memory in operators when python_multiprocessing=True.
- Raises
TypeError – If enable is not a boolean data type.
Examples
>>> # Enable shared memory feature to improve the performance of Python multiprocessing.
>>> ds.config.set_enable_shared_mem(True)
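Shared memory only takes effect for operators launched with python_multiprocessing=True. Below is a sketch with a hypothetical generator source and transform, used purely for illustration:
>>> import numpy as np
>>> ds.config.set_enable_shared_mem(True)
>>> # Hypothetical one-column dataset; the Python transform runs in worker
>>> # processes and receives its input through shared memory queues.
>>> data = ds.GeneratorDataset([(np.array(i),) for i in range(8)], column_names=["x"])
>>> data = data.map(operations=[lambda x: x + 1],
...                 python_multiprocessing=True, num_parallel_workers=2)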
- mindspore.dataset.config.set_monitor_sampling_interval(interval)
Set the default interval (in milliseconds) for monitor sampling.
- Parameters
interval (int) – Interval (in milliseconds) to be used for performance monitor sampling.
- Raises
ValueError – If interval is invalid when interval <= 0 or interval > MAX_INT_32.
Examples
>>> # Set a new global configuration value for the monitor sampling interval.
>>> ds.config.set_monitor_sampling_interval(100)
- mindspore.dataset.config.set_num_parallel_workers(num)
Set a new global configuration default value for the number of parallel workers. This setting will affect the parallelism of all dataset operations; a per-operation override is sketched after the example below.
- Parameters
num (int) – Number of parallel workers to be used as a default for each operation.
- Raises
ValueError – If num is invalid when num <= 0 or num > MAX_INT_32.
Examples
>>> # Set a new global configuration value for the number of parallel workers.
>>> # Now parallel dataset operators will run with 8 workers.
>>> ds.config.set_num_parallel_workers(8)
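The global default can be overridden per operation; a sketch with a hypothetical dataset, shown only to illustrate the precedence:
>>> import numpy as np
>>> # The global default applies unless an operation overrides it explicitly.
>>> ds.config.set_num_parallel_workers(8)
>>> data = ds.GeneratorDataset([(np.array(i),) for i in range(4)], column_names=["x"])
>>> # This map runs with 2 workers, not the global default of 8.
>>> data = data.map(operations=[lambda x: x * 2], num_parallel_workers=2)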
- mindspore.dataset.config.set_numa_enable(numa_enable)
Set the default state of numa enabled. If numa_enable is True, ensure that the numa library is installed.
- Parameters
numa_enable (bool) – Whether to use numa bind feature.
- Raises
TypeError – If numa_enable is not a boolean data type.
Examples
>>> # Set a new global configuration value for the state of numa enabled.
>>> # Now parallel dataset operators will run with the numa bind function.
>>> ds.config.set_numa_enable(True)
- mindspore.dataset.config.set_prefetch_size(size)
Set the queue capacity of the threads in the pipeline.
- Parameters
size (int) – The length of the cache queue.
- Raises
ValueError – If size is invalid when size <= 0 or size > MAX_INT_32.
Note
Since the total memory used for prefetching can grow very large with a high number of workers, the per-worker prefetch size will be reduced when the number of workers is greater than 4. The actual per-worker prefetch size at runtime will be prefetch_size * (4 / num_parallel_workers); see the sketch after the example below.
Examples
>>> # Set a new global configuration value for the prefetch size.
>>> ds.config.set_prefetch_size(1000)
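To make the note above concrete, the effective per-worker prefetch size can be estimated with plain arithmetic; this is a sketch of the documented formula, not an API call:
>>> prefetch_size = 1000
>>> num_parallel_workers = 8
>>> # With more than 4 workers, each worker prefetches
>>> # prefetch_size * (4 / num_parallel_workers) rows.
>>> prefetch_size * (4 / num_parallel_workers)
500.0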
- mindspore.dataset.config.set_seed(seed)
Set the random number seed. If the seed is set, the generated random numbers will be fixed, which helps to produce deterministic results.
Note
set_seed sets the seed in the Python random library and the numpy.random library for deterministic Python augmentations that use randomness. set_seed should be called with every iterator created to reset the random seed. Within the pipeline, this does not guarantee deterministic results when num_parallel_workers > 1; see the sketch after the example below.
- Parameters
seed (int) – Random number seed. It is used to generate deterministic random numbers.
- Raises
ValueError – If seed is invalid when seed < 0 or seed > MAX_UINT_32.
Examples
>>> # Set a new global configuration value for the seed value.
>>> # Operations with randomness will use the seed value to generate random values.
>>> ds.config.set_seed(1000)
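Per the note above, deterministic pipeline results additionally require a single parallel worker; a minimal sketch combining the two settings:
>>> # Fix the seed and force single-worker execution for reproducible runs.
>>> ds.config.set_seed(1000)
>>> ds.config.set_num_parallel_workers(1)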
- mindspore.dataset.config.set_sending_batches(batch_num)
Set the default number of sending batches when training with sink_mode=True on Ascend devices.
- Parameters
batch_num (int) – The total number of sending batches. When batch_num is set, the sender will wait until the number of sending batches increases. The default is 0, which means all batches in the dataset will be sent.
- Raises
TypeError – If batch_num is not of type int.
Examples
>>> # Set a new global configuration value for the sending batches.
>>> ds.config.set_sending_batches(10)