Environment Variable Descriptions

The following environment variables are supported by MindSpore Transformers.
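
These variables are read from the process environment, so they are usually exported in the shell that launches the job, as the `export` examples in the tables below do. The POSIX sh sketch below exports an illustrative debugging profile assembled from the Debugging Variables table: deterministic reductions, synchronous operator launch, single-threaded operator compilation, and INFO-level logs. It then prints the effective value of each variable, substituting the documented default when a variable is unset. The profile is only an example, and the helper `show_effective` is hypothetical. Deterministic computation lowers performance, so a profile like this suits debugging rather than production runs.

```shell
# Illustrative debugging profile assembled from the Debugging Variables table:
# deterministic reductions, synchronous operator launch, single-threaded
# operator compilation, and INFO-level MindSpore and CANN logs.
export HCCL_DETERMINISTIC=true
export LCCL_DETERMINISTIC=1
export ASCEND_LAUNCH_BLOCKING=1
export TE_PARALLEL_COMPILER=1
export GLOG_v=1
export ASCEND_GLOBAL_LOG_LEVEL=1

# Hypothetical helper: print the value each variable will take, substituting
# the documented default when the variable is unset (${VAR-default}).
show_effective() {
  echo "HCCL_DETERMINISTIC=${HCCL_DETERMINISTIC-false}"
  echo "LCCL_DETERMINISTIC=${LCCL_DETERMINISTIC-0}"
  echo "ASCEND_LAUNCH_BLOCKING=${ASCEND_LAUNCH_BLOCKING-0}"
  echo "TE_PARALLEL_COMPILER=${TE_PARALLEL_COMPILER-8}"
  echo "GLOG_v=${GLOG_v-3}"
  echo "ASCEND_GLOBAL_LOG_LEVEL=${ASCEND_GLOBAL_LOG_LEVEL-3}"
  echo "RUN_MODE=${RUN_MODE-predict}"
}

show_effective
```

The job is then started from the same shell so that it inherits these values.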

Debugging Variables

Variable Name	Default Value	Description	Values	Application Scenarios
HCCL_DETERMINISTIC	false	Whether to enable deterministic computation for reduction communication operators (AllReduce, ReduceScatter, and Reduce).	`true`: enable HCCL deterministic computation; `false`: disable HCCL deterministic computation.	Enabling deterministic computation eliminates the randomness caused by inconsistent computation order across multiple cards, but lowers performance compared with leaving it disabled. Enable it in scenarios that require consistent results.
LCCL_DETERMINISTIC	0	Whether to enable the LCCL deterministic AllReduce operator (order-preserving summation).	`1`: enable LCCL deterministic computation; `0`: disable LCCL deterministic computation.	Enabling deterministic computation eliminates the randomness caused by inconsistent computation order across multiple cards, but lowers performance compared with leaving it disabled. Enable it in scenarios that require consistent results. Takes effect only when rankSize <= 8.
CUSTOM_MATMUL_SHUFFLE	on	Whether to enable shuffle operations for custom matrix multiplication.	`on`: enable matrix shuffle; `off`: disable matrix shuffle.	The shuffle operation is optimized for specific matrix sizes and memory access patterns. If the matrix size does not match the sizes the shuffle is optimized for, disabling shuffle may give better performance. Set it according to your actual workload.
ASCEND_LAUNCH_BLOCKING	0	In training or online inference scenarios, controls whether operators are executed in synchronous mode.	`1`: force synchronous mode; `0`: do not force synchronous mode.	Because operators execute asynchronously by default during NPU model training, the error stack printed when an operator reports an error is not the actual call stack. Setting it to `1` forces synchronous execution, which prints the correct call stack and makes it easier to debug and locate problems in the code. Setting it to `0` gives higher computational efficiency.
TE_PARALLEL_COMPILER	8	Number of threads used to compile operators in parallel. Parallel compilation is enabled when the value is greater than 1.	A positive integer. The maximum is (number of CPU cores × 80%) / number of Ascend AI processors, within the range 1 to 32; the default is 8.	When the network model is large, enable parallel operator compilation by setting this variable. Setting it to `1` (single-threaded compilation) makes debugging simpler.
CPU_AFFINITY	0	Whether to enable CPU affinity, which binds each process or thread to a single CPU core to improve performance.	`1`: enable CPU affinity; `0`: disable CPU affinity.	CPU affinity is disabled by default to optimize resource utilization and save energy.
MS_MEMORY_STATISTIC	0	Whether to enable memory statistics.	`1`: enable memory statistics; `0`: disable memory statistics.	Collects basic memory usage during memory analysis. See the Optimization Guide for details.
MINDSPORE_DUMP_CONFIG	NA	Path to the configuration file required by the cloud-side or device-side Dump function.	A file path; both relative and absolute paths are supported.
GLOG_v	3	Controls the MindSpore log level.	`0`: DEBUG; `1`: INFO; `2`: WARNING; `3`: ERROR, meaning an error occurred during execution and an error log is output, but the program may not terminate; `4`: CRITICAL, meaning an exception occurred during execution and the program terminates.
ASCEND_GLOBAL_LOG_LEVEL	3	Controls the CANN log level.	`0`: DEBUG; `1`: INFO; `2`: WARNING; `3`: ERROR; `4`: NULL, no log output.
ASCEND_SLOG_PRINT_TO_STDOUT	0	Whether to print logs to the screen. When enabled, logs are not saved to log files; the generated logs are printed directly to the screen instead.	`1`: print to the screen; `0`: do not print to the screen.
ASCEND_GLOBAL_EVENT_ENABLE	0	Whether to enable event logging.	`1`: enable event logging; `0`: disable event logging.
HCCL_EXEC_TIMEOUT	1836	Controls how long devices wait for synchronization during execution: each device process waits up to the configured time for the other devices to complete communication synchronization.	Range (0, 17340], in seconds; default 1836.
HCCL_CONNECT_TIMEOUT	120	In distributed training or inference scenarios, limits the timeout for establishing socket connections between devices.	An integer in the range [120, 7200], in seconds; default 120.
MS_NODE_ID	NA	Specifies the rank ID of the process in dynamic cluster scenarios.	The rank_id of the process, unique within the cluster.
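
The `TE_PARALLEL_COMPILER` row gives a sizing rule (number of CPU cores × 80% / number of Ascend AI processors, within 1 to 32), and the two HCCL timeout rows give valid ranges. The POSIX sh sketch below turns these rules into arithmetic checks. The helper names are hypothetical, rounding down is an assumption because the rule does not say how to round, and the processor count is passed in by hand rather than detected.

```shell
# Hypothetical helper: upper bound for TE_PARALLEL_COMPILER.
# $1 = number of CPU cores, $2 = number of Ascend AI processors on the host.
# Computes floor(cores * 80 / (100 * processors)), i.e. cores x 80% / processors
# rounded down (an assumption), then clamps it to the documented range 1..32.
te_parallel_compiler_max() {
  tpc_value=$(( $1 * 80 / (100 * $2) ))
  if [ "$tpc_value" -lt 1 ]; then tpc_value=1; fi
  if [ "$tpc_value" -gt 32 ]; then tpc_value=32; fi
  echo "$tpc_value"
}

# Hypothetical helper: succeed only when both HCCL timeouts (in seconds) are
# inside the documented ranges: HCCL_EXEC_TIMEOUT in (0, 17340] and
# HCCL_CONNECT_TIMEOUT in [120, 7200].
hccl_timeouts_valid() {
  [ "$1" -gt 0 ] && [ "$1" -le 17340 ] && [ "$2" -ge 120 ] && [ "$2" -le 7200 ]
}

# Illustrative host: 192 CPU cores and 8 Ascend AI processors.
export TE_PARALLEL_COMPILER="$(te_parallel_compiler_max 192 8)"
echo "TE_PARALLEL_COMPILER=$TE_PARALLEL_COMPILER"

# Check the documented default timeouts before exporting them.
if hccl_timeouts_valid 1836 120; then
  export HCCL_EXEC_TIMEOUT=1836 HCCL_CONNECT_TIMEOUT=120
fi
```

On that illustrative host the bound works out to 192 × 0.8 / 8 = 19.2, which the sketch rounds down to 19.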

Other Variables

Variable Name	Default Value	Description	Values	Application Scenarios
RUN_MODE	predict	Sets the running mode.	`predict`: inference; `finetune`: fine-tuning; `train`: training; `eval`: evaluation.
USE_ROPE_SELF_DEFINE	true	Whether to enable the ROPE fusion operator.	`true`: enable the ROPE fusion operator; `false`: disable the ROPE fusion operator.	The ROPE fusion operator is enabled by default to improve computational efficiency. Apart from turning it off when needed for debugging, no special setting is usually required.
MS_ENABLE_INTERNAL_BOOST	on	Whether to enable internal acceleration in the MindSpore framework.	`on`: enable MindSpore internal acceleration; `off`: disable MindSpore internal acceleration.	Enabled by default to achieve high-performance inference. When debugging or comparing different acceleration strategies, turn it off to observe its impact on performance.
MS_GE_ATOMIC_CLEAN_POLICY	1	Policy for cleaning up the memory occupied by atomic operators in the network.	`0`: clean up the memory of all atomic operators in the network centrally; `1`: no centralized cleanup; each atomic operator in the network is zeroed individually.	The default `1` lets each operator be handled individually, enabling operations such as operator memory reuse. Setting it to `0` cleans up the memory occupied by the operators centrally.
ENABLE_LAZY_INLINE	1	Whether to enable lazy inline.	`0`: disable lazy inline; `1`: enable lazy inline.	Available in MindSpore 2.2.0 and later. Typically used with pipeline parallelism to improve compilation performance. Enabled by default and can be disabled.
ENABLE_LAZY_INLINE_NO_PIPELINE	0	Whether to enable lazy inline in non-pipeline parallel modes.	`0`: disable lazy inline; `1`: enable lazy inline.	By default, lazy inline is enabled only in pipeline parallel mode. To enable it in other parallel modes, set this variable to `1`.
MS_ASCEND_CHECK_OVERFLOW_MODE	INFNAN_MODE	Sets the overflow detection mode.	`SATURATION_MODE`: saturation mode; results that overflow saturate to the floating-point extremes (±MAX); `INFNAN_MODE`: INF/NAN mode; follows the IEEE 754 standard and outputs INF/NAN as it defines.	For large model tuning, INFNAN_MODE is recommended because its overflow behavior is aligned with PyTorch, i.e. `export MS_ASCEND_CHECK_OVERFLOW_MODE=INFNAN_MODE`. If you encounter persistent overflow problems, try setting this variable to INFNAN_MODE.
MF_LOG_SUFFIX	NA	Sets a custom suffix for all log folders.	Suffix for the log folders. Default: no suffix.	Giving all log folders of a task the same suffix isolates its logs from those of other tasks so they are not overwritten.
PLOG_REDIRECT_TO_OUTPUT	False	Controls whether plog logs are stored in a different path.	`True`: store the logs in the ./output directory; `False`: store them in the default location.	Makes plog logs easier to find.
MS_ENABLE_FA_FLATTEN	on	Controls whether FlashAttention flatten optimization is enabled.	`on`: enable FlashAttention flatten optimization; `off`: disable FlashAttention flatten optimization.	Provides a fallback for models that have not yet been adapted to FlashAttention flatten optimization.
EXPERIMENTAL_KERNEL_LAUNCH_GROUP	NA	Controls whether operators are submitted in parallel batches and, if so, configures the degree of parallelism.	`thread_num`: number of concurrent threads; increasing it is not recommended; default 2. `kernel_group_num`: total number of operator groups, with kernel_group_num/thread_num groups per thread; default 8.	This feature is still evolving, and its behavior may change. Currently only the `deepseek` inference scenario is supported, where it gives some performance improvement; other models that use this feature may see performance degrade, so use it with caution. Example: `export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8"`.
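
The EXPERIMENTAL_KERNEL_LAUNCH_GROUP value is a comma-separated list of key:value pairs, and each thread handles kernel_group_num/thread_num operator groups. The POSIX sh sketch below parses a value written in the format of the example in the table and reports that per-thread count. The helper `parse_launch_group` is hypothetical and is not how MindSpore itself reads the variable; filling missing keys with the documented defaults and rounding the per-thread count down are assumptions.

```shell
# Hypothetical helper: split an EXPERIMENTAL_KERNEL_LAUNCH_GROUP value such as
# "thread_num:2,kernel_group_num:8" into its fields and print
# "thread_num kernel_group_num groups_per_thread".
# Keys that are missing keep the documented defaults (an assumption), and the
# per-thread count uses integer division (also an assumption).
parse_launch_group() {
  lg_threads=2
  lg_groups=8
  lg_saved_ifs=$IFS
  IFS=,
  for lg_pair in $1; do
    case ${lg_pair%%:*} in
      thread_num) lg_threads=${lg_pair#*:} ;;
      kernel_group_num) lg_groups=${lg_pair#*:} ;;
    esac
  done
  IFS=$lg_saved_ifs
  echo "$lg_threads $lg_groups $((lg_groups / lg_threads))"
}

export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8"
parse_launch_group "$EXPERIMENTAL_KERNEL_LAUNCH_GROUP"
```

With the example value from the table, the sketch prints `2 8 4`: 2 threads, 8 operator groups, and 4 groups per thread.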