Call Third-Party Operators by Customized Operators
Overview
When built-in operators cannot meet your requirements during network development, you can use the Python API Custom primitive defined in MindSpore to quickly create different types of customized operators.
You can choose among different customized operator development methods based on your needs. See: custom_operator_custom.
Among these methods, there is a defining method called aot which has a special use: the aot mode can call a corresponding cpp/cuda function by loading a precompiled so. Therefore, when a third-party library provides a cpp/cuda function API, you can try to call its function interface from the so.
Here is an example of how to do this with the Aten library of PyTorch.
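On the Python side, such a call always has the same general shape (a minimal sketch; ./my_op.so and MyFunc are hypothetical names, and the arguments are explained in the steps below):

import mindspore.ops as ops

# func is "<path to the so>:<function name inside the so>" (hypothetical names here)
op = ops.Custom("./my_op.so:MyFunc",
                out_shape=lambda x: x,   # output shape follows the input shape
                out_dtype=lambda x: x,   # output dtype follows the input dtype
                func_type="aot")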
Docking with PyTorch Aten Operators
When migrating a network that uses PyTorch Aten operators and a built-in operator is missing, we can use the aot development method of the Custom operator to call the PyTorch Aten operator for fast verification.
PyTorch provides a way to write cpp/cuda code that includes PyTorch's header files and uses its associated data structures, and to compile that code into an so. See: https://pytorch.org/docs/stable/_modules/torch/utils/cpp_extension.html#CppExtension.
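For orientation, a cpp_extension compile script follows the pattern sketched below (illustrative only; the setup.py bundled with this project wraps the same mechanism but takes the source file and output name as command-line arguments, as shown in step 3):

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

# Sketch: compile leaky_relu.cpp into a loadable so.
# For cuda sources, CUDAExtension is used instead of CppExtension.
setup(
    name="leaky_relu_cpu",
    ext_modules=[CppExtension(name="leaky_relu_cpu",
                              sources=["leaky_relu.cpp"])],
    cmdclass={"build_ext": BuildExtension},
)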
Combining these two mechanisms, a customized operator can call a PyTorch Aten operator as follows:
1. Downloading the Project Files
Users can download the project files from here.
Use the following command to extract the files and find the folder test_custom_pytorch:
tar xvf test_custom_pytorch.tar
The folder includes the following files:
test_custom_pytorch
├── env.sh # sets PyTorch/lib into LD_LIBRARY_PATH
├── leaky_relu.cpp # an example of using an Aten CPU operator
├── leaky_relu.cu # an example of using an Aten GPU operator
├── ms_ext.cpp # converts Tensors between MindSpore and PyTorch
├── ms_ext.h # conversion APIs
├── README.md
├── run_cpu.sh # a script to run the cpu case
├── run_gpu.sh # a script to run the gpu case
├── setup.py # a script to compile cpp/cu into so
├── test_cpu_op_in_gpu_device.py # a test file to run the Aten CPU operator on a GPU device
├── test_cpu_op.py # a test file to run the Aten CPU operator on a CPU device
└── test_gpu_op.py # a test file to run the Aten GPU operator on a GPU device
Using the PyTorch Aten operator mainly involves env.sh, setup.py, leaky_relu.cpp/cu, and test_*.py.
Among them, env.sh sets the environment variables, setup.py compiles the so, leaky_relu.cpp/cu is the source code that calls the PyTorch Aten operator, and test_*.py shows how to call the Custom operator.
2. Writing and Calling the Source Code File of PyTorch Aten Operators
Refer to leaky_relu.cpp/cu to write a source code file that calls the PyTorch Aten operator.
A customized operator of type aot adopts the AOT compilation method, which requires network developers to hand-write the source code of the operator implementation against a specific interface and compile the source code file into a dynamic link library in advance; the framework then automatically calls the function defined in the dynamic link library. In terms of the development language of the operator implementation, the GPU platform supports CUDA, and the CPU platform supports C and C++. The interface of an operator implemented in the source code file is specified as follows:
extern "C" int func_name(int nparam, void **params, int *ndims, int64_t **shapes, const char **dtypes, void *stream, void *extra);
Here nparam is the total number of inputs and outputs, params holds their data pointers, ndims and shapes describe the dimension count and shape of each tensor, dtypes gives each tensor's data type as a string, stream is the CUDA stream on the GPU platform, and extra is reserved for extension.
If a cpu operator is called, take leaky_relu.cpp as an example: the file provides the function LeakyRelu required by AOT, which calls the torch::leaky_relu_out function of PyTorch Aten:
#include <string.h>
#include <torch/extension.h> // Header file reference section
#include "ms_ext.h"

extern "C" int LeakyRelu(
    int nparam,
    void** params,
    int* ndims,
    int64_t** shapes,
    const char** dtypes,
    void* stream,
    void* extra) {
  // Wrap MindSpore's inputs/outputs as PyTorch Aten CPU tensors
  auto tensors = get_torch_tensors(nparam, params, ndims, shapes, dtypes, c10::kCPU);
  auto at_input = tensors[0];
  auto at_output = tensors[1];
  // The _out version writes the result directly into at_output
  torch::leaky_relu_out(at_output, at_input);
  // If you are using a version without output, the code is as follows:
  // torch::Tensor output = torch::leaky_relu(at_input);
  // at_output.copy_(output);
  return 0;
}
If a gpu operator is called, take leaky_relu.cu as an example:
#include <string.h>
#include <torch/extension.h>
#include "ms_ext.h"

extern "C" int LeakyRelu(
    int nparam,
    void** params,
    int* ndims,
    int64_t** shapes,
    const char** dtypes,
    void* stream,
    void* extra) {
  // Wait for MindSpore's stream to finish before PyTorch,
  // which manages its own streams, touches the data
  cudaStream_t custream = static_cast<cudaStream_t>(stream);
  cudaStreamSynchronize(custream);
  // Wrap MindSpore's inputs/outputs as PyTorch Aten GPU tensors
  auto tensors = get_torch_tensors(nparam, params, ndims, shapes, dtypes, c10::kCUDA);
  auto at_input = tensors[0];
  auto at_output = tensors[1];
  torch::leaky_relu_out(at_output, at_input);
  return 0;
}
PyTorch Aten provides operator function versions both with and without output; versions with output carry the _out suffix, and PyTorch Aten provides 300+ such APIs for common operators.
When torch::*_out is called, no output copy is needed. When a version without the _out suffix is called, the API torch.Tensor.copy_ needs to be called to copy the result.
To see which functions can be called from PyTorch Aten, refer to the PyTorch installation path: python*/site-packages/torch/include/ATen/CPUFunctions_inl.h for the CPU version and python*/site-packages/torch/include/ATen/CUDAFunctions_inl.h for the corresponding GPU version.
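For example, the following snippet prints where that CPU header lives on the current machine (assuming PyTorch is installed):

import os
import torch

# Locate the header that lists the callable Aten CPU functions
# (use CUDAFunctions_inl.h for the GPU version).
print(os.path.join(os.path.dirname(torch.__file__),
                   "include/ATen/CPUFunctions_inl.h"))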
The APIs provided by ms_ext.h are used in the above use case; they are briefly described here:
// Convert MindSpore kernel's inputs/outputs to PyTorch Aten's Tensor
std::vector<at::Tensor> get_torch_tensors(int nparam, void** params, int* ndims, int64_t** shapes, const char** dtypes, c10::Device device) ;
3. Using the Compilation Script setup.py to Generate the so
setup.py uses the cpp_extension utilities provided by PyTorch to compile the above c++/cuda source code into an so file.
Before execution, you need to make sure that PyTorch is installed.
pip install torch
Then add PyTorch's lib directory to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$(python3 -c 'import torch, os; print(os.path.dirname(torch.__file__))')/lib:$LD_LIBRARY_PATH
Run:
cpu: python setup.py leaky_relu.cpp leaky_relu_cpu.so
gpu: python setup.py leaky_relu.cu leaky_relu_gpu.so
This produces the so files we need.
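As an optional sanity check (not part of the original project), you can verify that the so loads and exports the expected symbol; this assumes PyTorch's lib directory is already in LD_LIBRARY_PATH as set above:

import ctypes

# Loading fails if dependencies such as libtorch cannot be found;
# the attribute lookup fails if the LeakyRelu symbol is not exported.
lib = ctypes.CDLL("./leaky_relu_cpu.so")
print(lib.LeakyRelu is not None)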
4. Using the Customized Operator
Taking CPU as an example, use the Custom operator to call the above PyTorch Aten operator; see the code in test_cpu_op.py:
import numpy as np
import mindspore as ms
from mindspore.nn import Cell
import mindspore.ops as ops

ms.set_context(device_target="CPU")

def LeakyRelu():
    # func_type="aot" loads the precompiled so; out_shape/out_dtype declare
    # that the output inherits the input's shape and dtype
    return ops.Custom("./leaky_relu_cpu.so:LeakyRelu", out_shape=lambda x: x, out_dtype=lambda x: x, func_type="aot")

class Net(Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.leaky_relu = LeakyRelu()

    def construct(self, x):
        return self.leaky_relu(x)

if __name__ == "__main__":
    x0 = np.array([[0.0, -0.1], [-0.2, 1.0]]).astype(np.float32)
    net = Net()
    output = net(ms.Tensor(x0))
    print(output)
Run:
python test_cpu_op.py
Result:
[[ 0. -0.001]
[-0.002 1. ]]
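As a quick cross-check (optional, and assuming PyTorch is installed in the same environment), the same values can be reproduced with PyTorch directly; leaky_relu's default negative slope of 0.01 matches the output above:

import numpy as np
import torch

x0 = np.array([[0.0, -0.1], [-0.2, 1.0]]).astype(np.float32)
# Default negative_slope=0.01 gives [[0, -0.001], [-0.002, 1]]
print(torch.nn.functional.leaky_relu(torch.from_numpy(x0)))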
Attention:
When using a PyTorch Aten GPU operator, set device_target to "GPU".
ms.set_context(device_target="GPU")
op = ops.Custom("./leaky_relu_gpu.so:LeakyRelu", out_shape=lambda x : x, out_dtype=lambda x : x, func_type="aot")
When using a PyTorch Aten CPU operator while device_target is "GPU", add the following settings:
ms.set_context(device_target="GPU")
op = ops.Custom("./leaky_relu_cpu.so:LeakyRelu", out_shape=lambda x : x, out_dtype=lambda x : x, func_type="aot")
op.add_prim_attr("primitive_target", "CPU")
Compiling the so with cpp_extension requires a compiler version that meets the tool's needs; check for the presence of gcc/clang/nvcc.
Compiling the so with cpp_extension generates a build folder in the script path, which stores the so. The script copies the so to outside of build, but cpp_extension skips compilation if it finds an so already present in build, so when compiling a new so, remember to empty the so under build.
The above tests are based on PyTorch 1.9.1, cuda 11.1, and python 3.7. Download link: https://download.pytorch.org/whl/cu111/torch-1.9.1%2Bcu111-cp37-cp37m-linux_x86_64.whl. The cuda version supported by PyTorch Aten needs to be consistent with the local cuda version; whether other versions are supported needs to be explored by the user.