Environment Variable Descriptions
The following environment variables are supported by MindFormers.
Debugging Variables
Variable Name | Default | Interpretation | Description | Application Scenario |
---|---|---|---|---|
HCCL_DETERMINISTIC | false | Whether to enable deterministic computation for reduction communication operators (AllReduce, ReduceScatter, and Reduce). | | Deterministic computation eliminates the randomness caused by inconsistent ordering of multi-card computations, but performance is lower than with it disabled. Enable it in scenarios that require consistent results. |
LCCL_DETERMINISTIC | 0 | Whether to enable the LCCL deterministic AllReduce operator (order-preserving addition). | | Deterministic computation eliminates the randomness caused by inconsistent ordering of multi-card computations, but performance is lower than with it disabled. Enable it in scenarios that require consistent results. |
CUSTOM_MATMUL_SHUFFLE | on | Whether to enable shuffle operations for custom matrix multiplication. | | The shuffle operation is optimized for specific matrix sizes and memory access patterns. If the matrix size does not match the sizes that shuffling is optimized for, turning shuffling off may give better performance. Set it according to your actual workload. |
ASCEND_LAUNCH_BLOCKING | 0 | In training or online inference scenarios, controls whether operators execute in synchronous mode. | | By default, operators execute asynchronously during NPU model training, so when an operator reports an error, the printed error stack is not the actual call stack. When set to `1`, operators execute synchronously, so the printed stack shows where the error actually occurred. |
TE_PARALLEL_COMPILER | 8 | Number of threads used to compile operators in parallel. Parallel compilation is enabled when the value is greater than 1. | A positive integer. Maximum: number of CPU cores × 80% / number of Ascend AI processors. Range: 1 to 32. Default: 8. | When the network model is large, enable parallel operator compilation by configuring this environment variable. |
CPU_AFFINITY | 0 | Whether to enable CPU affinity, which binds each process or thread to a single CPU core to improve performance. | | CPU affinity is disabled by default to optimize resource utilization and save energy. |
MS_MEMORY_STATISTIC | 0 | Whether to enable memory statistics. | | Collects basic memory usage statistics during memory analysis. See the Optimization Guide for details. |
MINDSPORE_DUMP_CONFIG | NA | Path to the configuration file required by the cloud-side or device-side Dump function. | A file path; both relative and absolute paths are supported. | |
GLOG_v | 3 | Controls the MindSpore log level. | | |
ASCEND_GLOBAL_LOG_LEVEL | 3 | Controls the CANN log level. | | |
ASCEND_SLOG_PRINT_TO_STDOUT | 0 | Whether to print logs to the screen. When enabled, logs are displayed directly on the screen instead of being saved to log files. | | |
ASCEND_GLOBAL_EVENT_ENABLE | 0 | Whether to enable event logging. | | |
HCCL_EXEC_TIMEOUT | 1836 | Controls how long each device process waits for the other devices to complete communication synchronization during cross-device execution. | Range: (0, 17340], in seconds. Default: 1836. | |
HCCL_CONNECT_TIMEOUT | 120 | In distributed training or inference, limits how long to wait while socket connections are established between devices. | An integer in the range [120, 7200], in seconds. Default: 120. | |
MS_NODE_ID | NA | Specifies the process rank ID in dynamic cluster scenarios. | The rank_id of the process, unique within the cluster. | |
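The debugging switches above are ordinary environment variables, so a launcher script can assemble them in one place. The following is a minimal sketch, not a MindFormers API: the helpers `suggest_te_parallel_compiler`, `check_hccl_timeouts`, and `debug_env` are hypothetical. The TE_PARALLEL_COMPILER value uses the documented maximum (CPU cores × 80% / number of Ascend AI processors, clamped to 1 to 32) as the suggestion, with rounding down as an assumption. The timeout checks use the documented ranges, and the enabling values (`true`, `1`) are assumed from the defaults listed in the table.

```python
import os
from typing import Dict, Optional

# Documented ranges (seconds) for the HCCL timeouts listed in the table.
HCCL_EXEC_TIMEOUT_MAX = 17340              # valid range: (0, 17340]
HCCL_CONNECT_TIMEOUT_RANGE = (120, 7200)   # valid range: [120, 7200]


def suggest_te_parallel_compiler(cpu_cores: int, num_npus: int) -> int:
    """Hypothetical helper: documented maximum for TE_PARALLEL_COMPILER,
    CPU cores * 80% / number of Ascend AI processors, clamped to 1..32.

    Rounding down is an assumption; the table does not specify rounding.
    """
    if cpu_cores < 1 or num_npus < 1:
        raise ValueError("cpu_cores and num_npus must be positive")
    # cores * 4 // (5 * npus) equals floor(cores * 0.8 / npus) without float error.
    value = (cpu_cores * 4) // (5 * num_npus)
    return max(1, min(32, value))


def check_hccl_timeouts(exec_timeout: int, connect_timeout: int) -> None:
    """Hypothetical helper: raise ValueError if a timeout is out of range."""
    if not 0 < exec_timeout <= HCCL_EXEC_TIMEOUT_MAX:
        raise ValueError(f"HCCL_EXEC_TIMEOUT must be in (0, {HCCL_EXEC_TIMEOUT_MAX}]")
    low, high = HCCL_CONNECT_TIMEOUT_RANGE
    if not low <= connect_timeout <= high:
        raise ValueError(f"HCCL_CONNECT_TIMEOUT must be in [{low}, {high}]")


def debug_env(num_npus: int, exec_timeout: int = 1836, connect_timeout: int = 120,
              cpu_cores: Optional[int] = None) -> Dict[str, str]:
    """Hypothetical helper: env overrides for a reproducible, debuggable run."""
    check_hccl_timeouts(exec_timeout, connect_timeout)
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return {
        # Deterministic reductions: reproducible results, lower performance.
        "HCCL_DETERMINISTIC": "true",
        "LCCL_DETERMINISTIC": "1",
        # Synchronous operator execution so error stacks show the real call site.
        "ASCEND_LAUNCH_BLOCKING": "1",
        "HCCL_EXEC_TIMEOUT": str(exec_timeout),
        "HCCL_CONNECT_TIMEOUT": str(connect_timeout),
        "TE_PARALLEL_COMPILER": str(suggest_te_parallel_compiler(cores, num_npus)),
    }


if __name__ == "__main__":
    overrides = debug_env(num_npus=8, cpu_cores=192)
    # Merge into a copy of the current environment; the result can be passed
    # as env= to subprocess.run() when launching a training script.
    child_env = {**os.environ, **overrides}
    for key in sorted(overrides):
        print(f"{key}={child_env[key]}")
```

Because deterministic reductions and synchronous launching both cost performance, a profile like this is best reserved for reproducibility checks and error localization rather than regular runs.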
Other Variables
Variable Name | Default | Interpretation | Description | Application Scenario |
---|---|---|---|---|
RUN_MODE | predict | Sets the running mode. | | |
USE_ROPE_SELF_DEFINE | true | Whether to enable the ROPE fusion operator. | | The ROPE fusion operator is enabled by default to improve computational efficiency. Turn it off only as needed for debugging; otherwise, no special setting is required. |
MS_ENABLE_INTERNAL_BOOST | on | Whether to enable internal acceleration in the MindSpore framework. | | Enabled by default for high-performance inference. When debugging or comparing different acceleration strategies, turn it off to observe its impact on performance. |
MS_GE_ATOMIC_CLEAN_POLICY | 1 | Whether to clean up the memory occupied by atomic operators in the network. | | The switch is set to `1` by default. |
ENABLE_LAZY_INLINE | 1 | Whether to enable lazy inline. | | Available in MindSpore 2.2.0 and later. Usually used with pipeline parallelism to improve compilation performance. Enabled by default; it can be disabled through configuration. |
ENABLE_LAZY_INLINE_NO_PIPELINE | 0 | Whether to enable lazy inline outside pipeline parallelism. | | By default, lazy inline is enabled only in pipeline parallel mode. To enable it in other parallel modes, set this environment variable to `1`. |
MS_ASCEND_CHECK_OVERFLOW_MODE | INFNAN_MODE | Sets the overflow detection mode. | | For large-model tuning, INFNAN_MODE is recommended because it aligns overflow behavior with PyTorch: `export MS_ASCEND_CHECK_OVERFLOW_MODE=INFNAN_MODE`. |
MF_LOG_SUFFIX | NA | Sets a custom suffix for all log folders. | The log folder suffix. Default: no suffix. | Applying the same suffix to all of a task's log folders isolates them from other tasks, so logs are not overwritten. |
PLOG_REDIRECT_TO_OUTPUT | False | Whether to change the storage path of plog logs. | | Redirecting plog logs makes them easier to find. |
MS_ENABLE_FA_FLATTEN | on | Whether to enable the FlashAttention flatten optimization. | | Provides a fallback for models that have not yet been adapted to the FlashAttention flatten optimization. |
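Several switches in this table exist mainly so an optimization can be turned off for debugging or for comparing acceleration strategies (ROPE fusion, internal boost, FlashAttention flatten, lazy inline). The sketch below uses hypothetical helpers `ablation_env` and `ablation_plan` to generate one baseline run plus one run per disabled feature, giving each run its own MF_LOG_SUFFIX so their logs do not overwrite each other. The "off" values (`false`, `off`, `0`) are assumed to be the opposites of the defaults listed in the table; verify them against your MindFormers version.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical feature names mapped to (environment variable, assumed "off" value).
# The "off" values are assumed opposites of the defaults shown in the table.
DISABLE_SWITCHES: Dict[str, Tuple[str, str]] = {
    "rope_fusion": ("USE_ROPE_SELF_DEFINE", "false"),
    "internal_boost": ("MS_ENABLE_INTERNAL_BOOST", "off"),
    "fa_flatten": ("MS_ENABLE_FA_FLATTEN", "off"),
    "lazy_inline": ("ENABLE_LAZY_INLINE", "0"),
}


def ablation_env(disabled: Iterable[str], run_name: str) -> Dict[str, str]:
    """Hypothetical helper: env overrides for one run with features switched off."""
    env = {
        # Recommended (and default) overflow mode for large-model tuning.
        "MS_ASCEND_CHECK_OVERFLOW_MODE": "INFNAN_MODE",
        # Per-run log folder suffix so runs do not overwrite each other's logs.
        "MF_LOG_SUFFIX": run_name,
    }
    for feature in disabled:
        if feature not in DISABLE_SWITCHES:
            raise ValueError(f"unknown feature: {feature}")
        var_name, off_value = DISABLE_SWITCHES[feature]
        env[var_name] = off_value
    return env


def ablation_plan(prefix: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: a baseline plus one run per disabled feature."""
    baseline = f"{prefix}_baseline"
    plan = {baseline: ablation_env([], baseline)}
    for feature in DISABLE_SWITCHES:
        run_name = f"{prefix}_no_{feature}"
        plan[run_name] = ablation_env([feature], run_name)
    return plan


if __name__ == "__main__":
    for run_name, overrides in ablation_plan("exp1").items():
        print(run_name, overrides)
```

Changing one switch per run keeps each performance difference attributable to a single optimization, which fits the table's advice to turn these switches off only for debugging or strategy comparison.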