Custom Fusion Pass

Overview

Operator fusion combines multiple independent operators into a larger, more complex operator to reduce runtime memory accesses and improve computational efficiency. It minimizes the storage and transfer of intermediate results, effectively reducing memory access overhead. In addition, fusing multiple operators reduces the number of separate operator launches, which can significantly improve computational efficiency on parallel computing devices such as NPUs.

MindSpore fuses operators automatically by default, combining consecutive small operators that match certain patterns into a single fused operator. Each fused operator corresponds to a fusion pass; MindSpore runs these fusion passes on MindIR to match and replace the operators. MindSpore provides numerous operator fusion passes, which satisfy the requirements of most users.

However, during actual network debugging, users may wish to manually control the switches for operator fusion passes. For example:

  • When debugging a network, users can toggle fusion passes to suit their scenario, for example disabling fused operators that perform poorly there or adopting more aggressive fusion strategies, in order to improve network computation speed.

  • When encountering accuracy issues, users can disable individual operator fusions to narrow down the problem and determine which fused operator is responsible for the accuracy issue.

Therefore, MindSpore provides interfaces related to fusion operator optimization passes, allowing users to toggle fusion passes for debugging purposes.

Debugging Interfaces

Currently, the operator fusion-related optimization passes are part of the graph kernel optimization module. The set_context function provides the graph_kernel_flags option to control the switches for the related graph optimization passes, including the following (a usage sketch follows the list):

  • Disable Pass: Use set_context(graph_kernel_flags="--disable_pass=xxx"), where xxx is the name of the pass to disable.

  • Enable Pass: Use set_context(graph_kernel_flags="--enable_pass=xxx"), where xxx is the name of the pass to enable.
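
For example, a minimal sketch of disabling a pass through set_context is shown below. This is only an illustration: add_layer_norm_fusion is one of the pass names listed later on this page, not a setting every network needs, and --enable_pass is used in exactly the same way for passes that are off by default.

    # A minimal sketch of toggling fusion passes via graph_kernel_flags.
    # add_layer_norm_fusion is only an example pass name; substitute the
    # passes relevant to your own network. --enable_pass works the same way,
    # and multiple flags can typically be combined in one space-separated string.
    import mindspore as ms

    ms.set_context(graph_kernel_flags="--disable_pass=add_layer_norm_fusion")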

Obtaining Pass Names

During debugging, users can obtain the relevant pass names in either of the two ways described below, or they can refer to the list of supported passes in the appendix.

Through IR Names

If users have dumped the relevant IR, the fusion pass name can be read from the IR file name. For example, from the IR file name hwopt_ge_unify_mindir_pm_44_add_layer_norm_fusion_0559.ir, the pass name add_layer_norm_fusion can be extracted.
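
As an illustration, the pass name can also be pulled out of such a file name programmatically. The helper below is hypothetical and only assumes the naming convention shown above (a pm_<index>_ segment followed by the pass name and a numeric counter); adjust the pattern if your IR file names differ.

    # Hypothetical helper: extract the fusion pass name from a dumped IR file name,
    # assuming the "<prefix>_pm_<index>_<pass_name>_<counter>.ir" pattern shown above.
    import re

    def pass_name_from_ir(ir_file_name):
        match = re.search(r"pm_\d+_(.+)_\d+\.ir$", ir_file_name)
        return match.group(1) if match else None

    print(pass_name_from_ir("hwopt_ge_unify_mindir_pm_44_add_layer_norm_fusion_0559.ir"))
    # -> add_layer_norm_fusion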

Through INFO Messages

The INFO-level log provides a list of all passes that support custom switches. Users can enable INFO logging by setting the environment variable GLOG_v=1 (for example, export GLOG_v=1) and then search the log for graph kernel passes to obtain the list. In the following message, the names of all passes that can be customized are listed after graph kernel passes:

[INFO] PRE_ACT(631369,ffffb5450af0,python):2024-08-22-15:34:16.978.158 [mindspore/ccsrc/plugin/device/ascend/optimizer/backend_common_unify_mindir.cc:191] GetBackendFusionGroupPassManager] graph kernel passes: FlashAttentionFusionV1,FlashAttentionFusionV2,add_layer_norm_fusion,add_layer_norm_v3_fusion,add_layer_norm_ext_fusion,inference_swiglu_fusion,inference_matmul_split_fusion,shape_reshape,shape_reshape_2,add_rms_norm_quant_fusion,rms_norm_quant_fusion,add_rms_norm_fusion,add_cast_rms_norm_cast_fusion,MatMulAllReduce,split_concat_fusion,matmul_elem_biasadd_fusion,matmul_elem_add_fusion,matmul_elem_relu_fusion,matmul_elem_gelu_fusion,inference_qbmm_add_fusion,inference_qbmm_allreduce_add_fusion.
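
If it helps during debugging, the comma-separated pass list can also be extracted from such a log line with a short script. The sketch below is only an illustration and assumes the "graph kernel passes: name1,name2,..." format shown in the message above.

    # Illustrative sketch: extract the customizable pass names from an INFO log
    # line in the "graph kernel passes: name1,name2,..." format shown above.
    import re

    def extract_pass_list(log_line):
        match = re.search(r"graph kernel passes:\s*([\w,]+)", log_line)
        return match.group(1).split(",") if match else []

    line = "... GetBackendFusionGroupPassManager] graph kernel passes: add_layer_norm_fusion,add_rms_norm_fusion."
    print(extract_pass_list(line))  # ['add_layer_norm_fusion', 'add_rms_norm_fusion']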

Users can also confirm from the log messages whether an individual pass is enabled. For example:

  • Enabled Pass: The following message indicates that rms_norm_quant_fusion is enabled and can be disabled using disable_pass.

    [INFO] GRAPH_KERNEL(631369,ffffb5450af0,python):2024-08-22-15:34:17.640.739 [mindspore/ccsrc/backend/common/graph_kernel/core/graph_kernel_pass_manager.cc:84] RunPass] Run graph kernel pass fusion_group_10_rms_norm_quant_fusion in 74.64 us
    
  • Disabled Pass: The following message indicates that add_rms_norm_fusion is disabled and can be enabled using enable_pass.

    [INFO] GRAPH_KERNEL(631369,ffffb5450af0,python):2024-08-22-15:34:17.640.771 [mindspore/ccsrc/backend/common/graph_kernel/core/graph_kernel_pass_manager.cc:73] Run] graph kernel pass fusion_group_11_add_rms_norm_fusion is disabled.
    

Appendix: List of Enabled Passes for Relevant Backends

Note: This list is provided for reference only and subject to change. The actual enabled passes should be determined using the methods described above.

Pass Name                            Backend
-----------------------------------  -------
FlashAttentionFusionV1               Ascend
FlashAttentionFusionV2               Ascend
add_layer_norm_fusion                Ascend
add_layer_norm_v3_fusion             Ascend
add_layer_norm_ext_fusion            Ascend
inference_swiglu_fusion              Ascend
inference_matmul_split_fusion        Ascend
shape_reshape                        Ascend
shape_reshape_2                      Ascend
add_rms_norm_quant_fusion            Ascend
rms_norm_quant_fusion                Ascend
add_rms_norm_fusion                  Ascend
add_cast_rms_norm_cast_fusion        Ascend
MatMulAllReduce                      Ascend
split_concat_fusion                  Ascend
matmul_elem_biasadd_fusion           Ascend
matmul_elem_add_fusion               Ascend
matmul_elem_relu_fusion              Ascend
matmul_elem_gelu_fusion              Ascend
inference_qbmm_add_fusion            Ascend
inference_qbmm_allreduce_add_fusion  Ascend