TensorRT Integration Information
Steps
Environment Preparation
In addition to the basic Environment Preparation, CUDA and TensorRT are required. The current version supports two combinations: CUDA 10.1 with TensorRT 6.0.1.5, and CUDA 11.1 with TensorRT 8.5.1.
Install the appropriate version of CUDA and set the installation directory as the environment variable ${CUDA_HOME}. The build script uses this environment variable to locate CUDA.
Install the TensorRT version that corresponds to the installed CUDA version, and set the installation directory as the environment variable ${TENSORRT_PATH}. The build script uses this environment variable to locate TensorRT.
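Because the build script relies on both variables, it can help to check them before starting a long build. The following is a minimal sketch of such a pre-flight check. Only the variable names CUDA_HOME and TENSORRT_PATH come from the steps above. The helper missing_build_vars is hypothetical and is not part of MindSpore.

```python
import os
from typing import List, Mapping

# Variable names taken from the steps above; everything else is illustrative.
REQUIRED_VARS = ("CUDA_HOME", "TENSORRT_PATH")


def missing_build_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or do not name a directory."""
    return [
        name
        for name in REQUIRED_VARS
        if not env.get(name) or not os.path.isdir(env[name])
    ]


if __name__ == "__main__":
    problems = missing_build_vars(os.environ)
    if problems:
        print("Set these to valid installation directories:", ", ".join(problems))
    else:
        print("CUDA_HOME and TENSORRT_PATH both point to existing directories.")
```

The check only confirms that each variable names an existing directory. It does not verify the installed CUDA or TensorRT version.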
Build
In a Linux environment, use the build.sh script in the root directory of the MindSpore source code to build the MindSpore Lite package integrated with TensorRT. First set the environment variable MSLITE_GPU_BACKEND=tensorrt, and then run the build command as follows.
bash build.sh -I x86_64
For more information about compilation, see Linux Environment Compilation.
Integration
Integration instructions
When integrating TensorRT features, note the following:
Configure the TensorRT backend. For more information about using Runtime to perform inference, see Using Runtime to Perform Inference (C++).
Compile and execute the binary. If you use dynamic linking, refer to Compilation Output for the compilation option -I x86_64. Set the following environment variables so that the related libraries can be linked dynamically.
export LD_LIBRARY_PATH=mindspore-lite-{version}-{os}-{arch}/runtime/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=user-installed-tensorrt-path/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=user-installed-cuda-path/lib/:$LD_LIBRARY_PATH
Using Benchmark to test TensorRT inference
Copy the built package to a device with a TensorRT environment (TensorRT 6.0.1.5) and use the Benchmark tool to test TensorRT inference. Examples are as follows:
Test performance
./benchmark --device=GPU --modelFile=./models/test_benchmark.ms --timeProfiling=true
Test precision
./benchmark --device=GPU --modelFile=./models/test_benchmark.ms --inDataFile=./input/test_benchmark.bin --inputShapes=1,32,32,1 --accuracyThreshold=3 --benchmarkDataFile=./output/test_benchmark.out
For more information about the use of Benchmark, see Benchmark Use.
For environment variable settings, add the directories that contain libmindspore-lite.so (under the directory mindspore-lite-{version}-{os}-{arch}/runtime/lib) and the TensorRT and CUDA so libraries to ${LD_LIBRARY_PATH}.
Using TensorRT engine serialization
TensorRT backend inference supports serializing the built TensorRT model (Engine) into a binary file and saving it locally. The next time the model is used, the engine can be deserialized and loaded from the local file, which avoids rebuilding and reduces overhead. To enable this function, use the LoadConfig interface to load a configuration file in the code, and specify the save path of the serialized file in that configuration file:
[ms_cache]
serialize_path=/path/to/config
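A configuration file of this form can be written by hand or generated by a small script. The following is a minimal sketch of the second option. The section name ms_cache and the key serialize_path come from the example above. The helper write_cache_config and the file names in the usage are hypothetical.

```python
def write_cache_config(config_path: str, serialize_path: str) -> str:
    """Write an [ms_cache] section to config_path and return the written text.

    serialize_path is where the serialized engine should be saved.
    """
    text = f"[ms_cache]\nserialize_path={serialize_path}\n"
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text


if __name__ == "__main__":
    import os
    import tempfile

    # Illustrative locations only; use paths that suit the deployment.
    cfg = os.path.join(tempfile.mkdtemp(), "trt_cache.cfg")
    print(write_cache_config(cfg, "/path/to/config"), end="")
```

The application then passes the path of the generated file to the LoadConfig interface, as described above.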
Using TensorRT dynamic shapes
By default, TensorRT optimizes the model based on the input shapes (batch size, image size, and so on) at which it was defined. However, the input dimensions can be adjusted at runtime by configuring a profile. In the profile, the minimum, maximum, and optimal shape of each input can be set.
TensorRT creates an optimized engine for each profile, choosing CUDA kernels that work for all shapes within the [minimum ~ maximum] range. Multiple profiles can be configured for a single input. To enable this function, use the LoadConfig interface to load the configuration file in the code.
If min, opt, and max are the minimum, optimal, and maximum dimensions, and real_shape is the shape of the input tensor, the following conditions must hold:
len(min) == len(opt) == len(max) == len(real_shape)
0 < min[i] <= opt[i] <= max[i] for all i
if real_shape[i] != -1, then min[i] == opt[i] == max[i] == real_shape[i]
When using tensor input without dynamic dimensions, all shapes must be equal to real_shape.
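The conditions above can be checked mechanically before a profile is written to a configuration file. The following is a minimal sketch of such a check. The function name check_profile is hypothetical and is not part of the MindSpore Lite API. It only restates the listed conditions in code.

```python
from typing import Sequence


def check_profile(
    min_shape: Sequence[int],
    opt_shape: Sequence[int],
    max_shape: Sequence[int],
    real_shape: Sequence[int],
) -> None:
    """Raise ValueError if the shapes violate the conditions listed above."""
    # All four shapes must have the same number of dimensions.
    if not (len(min_shape) == len(opt_shape) == len(max_shape) == len(real_shape)):
        raise ValueError("min, opt, max and real_shape must have the same length")

    for i, (lo, opt, hi, real) in enumerate(
        zip(min_shape, opt_shape, max_shape, real_shape)
    ):
        # Every dimension needs 0 < min <= opt <= max.
        if not (0 < lo <= opt <= hi):
            raise ValueError(
                f"dimension {i}: need 0 < min <= opt <= max, got {lo}, {opt}, {hi}"
            )
        # A static dimension (not -1) pins min, opt and max to the same value.
        if real != -1 and not (lo == opt == hi == real):
            raise ValueError(
                f"dimension {i}: static size {real} requires min == opt == max == {real}"
            )


if __name__ == "__main__":
    # Passes silently: dimension 0 is static, dimensions 1 and 2 are dynamic.
    check_profile([3, 100, 200], [3, 150, 250], [3, 200, 300], [3, -1, -1])
```

A tensor without dynamic dimensions passes only when min, opt, and max all equal real_shape, which matches the last condition above.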
For example, suppose the first model input is named “input_name1” and has input shape [3,-1,-1] (-1 means that the dimension supports dynamic shapes), with minimum dimensions [3,100,200], maximum dimensions [3,200,300], and optimal dimensions [3,150,250]. The second model input is named “input_name2” and has input shape [-1,-1,1], with minimum dimensions [700,800,1], maximum dimensions [800,900,1], and optimal dimensions [750,850,1]. The configuration file is then as follows:
[gpu_context]
input_shape=input_name1:[3,-1,-1];input_name2:[-1,-1,1]
dynamic_dims=[100~200,200~300];[700~800,800~900]
opt_dims=[150,250];[750,850]
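The mapping from full shapes to this configuration can be sketched in a few lines. The sketch assumes, based only on the example above, that dynamic_dims and opt_dims list just the dimensions marked -1 in input_shape, in order. The helper render_gpu_context and its tuple layout are hypothetical.

```python
def render_gpu_context(inputs):
    """Render a [gpu_context] section with one profile per input.

    inputs is a list of (name, real_shape, min_shape, opt_shape, max_shape).
    Only dimensions equal to -1 in real_shape appear in dynamic_dims/opt_dims
    (an assumption inferred from the example in this section).
    """
    shape_parts, dyn_parts, opt_parts = [], [], []
    for name, real, lo, opt, hi in inputs:
        dynamic = [i for i, dim in enumerate(real) if dim == -1]
        shape_parts.append(f"{name}:[{','.join(str(dim) for dim in real)}]")
        dyn_parts.append("[" + ",".join(f"{lo[i]}~{hi[i]}" for i in dynamic) + "]")
        opt_parts.append("[" + ",".join(str(opt[i]) for i in dynamic) + "]")
    lines = [
        "[gpu_context]",
        "input_shape=" + ";".join(shape_parts),
        "dynamic_dims=" + ";".join(dyn_parts),
        "opt_dims=" + ";".join(opt_parts),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        ("input_name1", [3, -1, -1], [3, 100, 200], [3, 150, 250], [3, 200, 300]),
        ("input_name2", [-1, -1, 1], [700, 800, 1], [750, 850, 1], [800, 900, 1]),
    ]
    print(render_gpu_context(example), end="")
```

Run on the two inputs from the example, the sketch reproduces the configuration shown above.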
Multiple profiles can also be configured at the same time. Extending the example above with one more profile per model input: for input1, the added profile has minimum size [3,201,200], maximum size [3,250,300], and optimal size [3,220,250]; for input2, the added profile has minimum size [801,800,1], maximum size [850,900,1], and optimal size [810,850,1]. The configuration file is then as follows:
[gpu_context]
input_shape=input_name1:[3,-1,-1];input_name2:[-1,-1,1]
dynamic_dims=[100~200,200~300],[201~250,200~300];[700~800,800~900],[801~850,800~900]
opt_dims=[150,250],[220,250];[750,850],[810,850]
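With several profiles per input, the nesting of separators is easy to get wrong. In the example above, ";" separates inputs, "," separates the bracketed profiles of one input, and "~" separates the minimum and maximum of one dimension. The following is a minimal sketch that parses the two values and checks that each optimal size lies within its range. The parser is hypothetical and is not the one used by MindSpore Lite.

```python
import re


def _groups(text):
    """Split '[a,b],[c,d]' into [['a', 'b'], ['c', 'd']]."""
    return [group.split(",") for group in re.findall(r"\[([^\]]*)\]", text)]


def parse_profiles(dynamic_dims, opt_dims):
    """Return, per input, a list of profiles; each profile is a list of
    (min, opt, max) tuples, one per dynamic dimension."""
    inputs = []
    for dyn_text, opt_text in zip(dynamic_dims.split(";"), opt_dims.split(";")):
        profiles = []
        for ranges, opts in zip(_groups(dyn_text), _groups(opt_text)):
            dims = []
            for rng, opt in zip(ranges, opts):
                lo, hi = (int(value) for value in rng.split("~"))
                if not (0 < lo <= int(opt) <= hi):
                    raise ValueError(f"optimal size {opt} is outside the range {rng}")
                dims.append((lo, int(opt), hi))
            profiles.append(dims)
        inputs.append(profiles)
    return inputs


if __name__ == "__main__":
    parsed = parse_profiles(
        "[100~200,200~300],[201~250,200~300];[700~800,800~900],[801~850,800~900]",
        "[150,250],[220,250];[750,850],[810,850]",
    )
    for index, profiles in enumerate(parsed, start=1):
        print(f"input {index}: {profiles}")
```

A range written with its bounds reversed, such as 201~150, fails the check and raises ValueError.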
Supported Operators
For supported TensorRT operators, see Lite Operator List.