Custom Fusion Pass
Overview
Operator fusion combines multiple independent operators into a larger, more complex operator to reduce runtime memory accesses and improve computational efficiency. This approach minimizes the storage and transmission of intermediate results, effectively reducing memory access overhead. In addition, fusing multiple operators reduces the number of operator launches, which can significantly improve computational efficiency on parallel computing devices such as NPUs.
By default, MindSpore fuses operators automatically, combining consecutive small operators that match certain patterns into a single fused operator. Each fused operator corresponds to a fusion pass: MindSpore runs these passes on MindIR to fuse and replace operators, and it provides numerous fusion passes that cover the requirements of most users.
However, during actual network debugging, users may wish to control the switches for operator fusion passes manually. For example:

- When debugging a network, users can toggle fusion passes according to their scenario, excluding fused operators that perform poorly in that scenario or adopting more aggressive fusion strategies to speed up network computation.
- When encountering accuracy issues, users can disable some operator fusions to narrow down the problem and determine which operator is responsible for the accuracy issue.
Therefore, MindSpore provides interfaces related to fusion operator optimization passes, allowing users to toggle fusion passes for debugging purposes.
Debugging Interfaces
Currently, operator fusion-related optimization passes are included in the graph kernel optimization module. The `set_context` function provides the `graph_kernel_flags` option to control the switches of the related graph optimization passes:

- Disable Pass: use `set_context(graph_kernel_flags="--disable_pass=xxx")`, where `xxx` is the name of the pass to disable.
- Enable Pass: use `set_context(graph_kernel_flags="--enable_pass=xxx")`, where `xxx` is the name of the pass to enable.
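For instance, the flag can be set once at the start of a script. The sketch below assumes an Ascend environment running in graph mode and uses `rms_norm_quant_fusion` (one of the passes listed in the appendix) purely as an illustration:

```python
import mindspore as ms

# Assumed environment for this sketch: graph mode on Ascend,
# where the fusion passes described in this document apply.
ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")

# Disable a single fusion pass while debugging. Swapping
# "--disable_pass" for "--enable_pass" turns a pass on instead.
ms.set_context(graph_kernel_flags="--disable_pass=rms_norm_quant_fusion")
```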
Obtaining Pass Names
During debugging, users can obtain the relevant pass names in either of two ways, or refer to the list of supported passes in the appendix.
Through IR Names
If users have dumped the relevant IR, they can obtain the fusion pass name from the IR file name. For example, if the IR file name is `hwopt_ge_unify_mindir_pm_44_add_layer_norm_fusion_0559.ir`, the pass name `add_layer_norm_fusion` can be extracted from it.
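The extracted name can then be passed straight to the flag described above. A minimal sketch, where `add_layer_norm_fusion` is simply the name read out of the IR file name:

```python
import mindspore as ms

# Disable the pass whose name was extracted from the dumped IR file name.
ms.set_context(graph_kernel_flags="--disable_pass=add_layer_norm_fusion")
```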
Through INFO Messages
In [INFO] messages, we provide a list of all passes that support custom switches. Users can generate [INFO] messages by setting `export GLOG_v=1`. In the [INFO] messages, users can search for `graph kernel pass` to obtain the list of these passes. For example, in the following message, the names of all passes that can be customized are listed after `graph kernel passes:`.
[INFO] PRE_ACT(631369,ffffb5450af0,python):2024-08-22-15:34:16.978.158 [mindspore/ccsrc/plugin/device/ascend/optimizer/backend_common_unify_mindir.cc:191] GetBackendFusionGroupPassManager] graph kernel passes: FlashAttentionFusionV1,FlashAttentionFusionV2,add_layer_norm_fusion,add_layer_norm_v3_fusion,add_layer_norm_ext_fusion,inference_swiglu_fusion,inference_matmul_split_fusion,shape_reshape,shape_reshape_2,add_rms_norm_quant_fusion,rms_norm_quant_fusion,add_rms_norm_fusion,add_cast_rms_norm_cast_fusion,MatMulAllReduce,split_concat_fusion,matmul_elem_biasadd_fusion,matmul_elem_add_fusion,matmul_elem_relu_fusion,matmul_elem_gelu_fusion,inference_qbmm_add_fusion,inference_qbmm_allreduce_add_fusion.
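The same log level can also be set from a Python script rather than the shell. A minimal sketch is shown below; it assumes `GLOG_v` takes effect only if it is set before `mindspore` is imported, since the logging level is read when the module loads:

```python
import os

# Request INFO-level logs (GLOG_v=1) before MindSpore is imported.
os.environ["GLOG_v"] = "1"

import mindspore as ms

ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
# After building and running the network, search the emitted [INFO]
# messages for "graph kernel pass" to find the switchable passes.
```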
For individual passes, users can also confirm whether they are enabled through log messages. For example:
Enabled Pass: the following message indicates that `rms_norm_quant_fusion` is enabled and can be disabled using `disable_pass`.

[INFO] GRAPH_KERNEL(631369,ffffb5450af0,python):2024-08-22-15:34:17.640.739 [mindspore/ccsrc/backend/common/graph_kernel/core/graph_kernel_pass_manager.cc:84] RunPass] Run graph kernel pass fusion_group_10_rms_norm_quant_fusion in 74.64 us

Disabled Pass: the following message indicates that `add_rms_norm_fusion` is disabled and can be enabled using `enable_pass`.

[INFO] GRAPH_KERNEL(631369,ffffb5450af0,python):2024-08-22-15:34:17.640.771 [mindspore/ccsrc/backend/common/graph_kernel/core/graph_kernel_pass_manager.cc:73] Run] graph kernel pass fusion_group_11_add_rms_norm_fusion is disabled.
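Based on the second message, the reported pass could be switched back on for comparison. A sketch reusing the flag described earlier:

```python
import mindspore as ms

# Re-enable the pass that the INFO message above reports as disabled.
ms.set_context(graph_kernel_flags="--enable_pass=add_rms_norm_fusion")
```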
Appendix: List of Enabled Passes for Relevant Backends
Note: This list is provided for reference only and is subject to change. The actual set of enabled passes should be determined using the methods described above.
| Pass Name | Backend |
|---|---|
| FlashAttentionFusionV1 | Ascend |
| FlashAttentionFusionV2 | Ascend |
| add_layer_norm_fusion | Ascend |
| add_layer_norm_v3_fusion | Ascend |
| add_layer_norm_ext_fusion | Ascend |
| inference_swiglu_fusion | Ascend |
| inference_matmul_split_fusion | Ascend |
| shape_reshape | Ascend |
| shape_reshape_2 | Ascend |
| add_rms_norm_quant_fusion | Ascend |
| rms_norm_quant_fusion | Ascend |
| add_rms_norm_fusion | Ascend |
| add_cast_rms_norm_cast_fusion | Ascend |
| MatMulAllReduce | Ascend |
| split_concat_fusion | Ascend |
| matmul_elem_biasadd_fusion | Ascend |
| matmul_elem_add_fusion | Ascend |
| matmul_elem_relu_fusion | Ascend |
| matmul_elem_gelu_fusion | Ascend |
| inference_qbmm_add_fusion | Ascend |
| inference_qbmm_allreduce_add_fusion | Ascend |