Experience the C++ Minimalist Concurrent Inference Demo
Overview
This tutorial provides a MindSpore Lite parallel (concurrent) inference demo. It demonstrates the basic inference process in C++: randomly generated data is fed into the MobileNetV2 model, inference is executed, and the output is printed, so that you can quickly learn how to use the parallel-inference APIs of MindSpore Lite. The code is stored in the mindspore/lite/examples/quick_start_server_inference_cpp directory.
The MindSpore Lite parallel inference steps are as follows:
1. Read the model: Read the .ms model file converted by the model conversion tool from the file system.
2. Create configuration options: Create and configure the context to save the basic configuration parameters required to build and execute the model.
3. Init a ModelParallelRunner: Before executing concurrent inference, call the Init interface of ModelParallelRunner to initialize concurrent inference, which covers model reading, creation of the concurrent workers, subgraph partition, and operator selection and scheduling. This step takes a considerable amount of time, so it is recommended to initialize once and then perform concurrent inference many times.
4. Input data: Before the model is executed, fill the input tensors with data.
5. Parallel inference: Call Predict of ModelParallelRunner to perform model inference.
6. Obtain the output: After the model execution is complete, read the inference result from the output tensors.
7. Release the memory: When the MindSpore Lite inference framework is no longer required, release the created ModelParallelRunner.
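The following is a minimal end-to-end sketch of these steps, condensed from the code walked through in the rest of this tutorial. It is a sketch rather than the demo source itself: the header paths are assumed from the layout of the MindSpore Lite runtime package, kNumWorkers is an illustrative value, and error handling is trimmed.

// Minimal sketch of the whole flow, condensed from the sections below.
// Assumptions: header paths follow the runtime package layout, kNumWorkers
// is an illustrative value, and error handling is trimmed for brevity.
#include <cstdlib>
#include <iostream>
#include <memory>
#include <vector>
#include "include/api/context.h"
#include "include/api/model_parallel_runner.h"
#include "include/api/types.h"

constexpr int kNumWorkers = 2;  // illustrative worker count

int main(int argc, const char **argv) {
  if (argc < 2) {
    std::cerr << "Usage: " << argv[0] << " <model.ms>" << std::endl;
    return -1;
  }
  // Steps 1-2: create configuration options (context with CPU device info).
  auto context = std::make_shared<mindspore::Context>();
  context->MutableDeviceInfo().push_back(std::make_shared<mindspore::CPUDeviceInfo>());
  auto runner_config = std::make_shared<mindspore::RunnerConfig>();
  runner_config->SetContext(context);
  runner_config->SetWorkersNum(kNumWorkers);
  // Step 3: init a ModelParallelRunner (reads the model, creates the workers).
  mindspore::ModelParallelRunner runner;
  if (runner.Init(argv[1], runner_config) != mindspore::kSuccess) {
    std::cerr << "Init ModelParallelRunner failed." << std::endl;
    return -1;
  }
  // Step 4: fill the input tensors (the full demo uses random data).
  auto inputs = runner.GetInputs();
  for (auto &input : inputs) {
    auto *data = malloc(input.DataSize());
    // ... fill `data` with real or random values here ...
    input.SetData(data);
  }
  // Steps 5-6: parallel inference and obtaining the output. Leaving `outputs`
  // empty assumes the runner allocates the output data itself; the full demo
  // below pre-allocates the buffers instead.
  std::vector<mindspore::MSTensor> outputs;
  if (runner.Predict(inputs, &outputs) != mindspore::kSuccess) {
    std::cerr << "Predict failed." << std::endl;
  }
  // Step 7: release the memory owned by the caller (the input buffers here).
  for (auto &input : inputs) {
    free(input.MutableData());
    input.SetData(nullptr);
  }
  return 0;
}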
Building and Running
Linux X86
Environment requirements
Build
Run the build script in the mindspore/lite/examples/quick_start_server_inference_cpp directory to automatically download the MindSpore Lite inference framework library and the model file and build the demo:

bash build.sh
If the MindSpore Lite inference framework fails to be downloaded by the build script, manually download the MindSpore Lite model inference framework mindspore-lite-{version}-linux-x64.tar.gz whose hardware platform is CPU and operating system is Ubuntu-x64. Copy the libmindspore-lite.a file in the decompressed lib directory and the libmindspore_glog.so.0 file in the decompressed glog directory to the mindspore/lite/examples/quick_start_server_inference_cpp/lib directory, and copy the files from runtime/include to the mindspore/lite/examples/quick_start_server_inference_cpp/include directory.

If the MobileNetV2 model fails to be downloaded, manually download the model file mobilenetv2.ms and copy it to the mindspore/lite/examples/quick_start_server_inference_cpp/model directory.

After manually downloading the files and placing them in the specified locations, run the build.sh script again to complete the build.
Inference
After the build, go to the mindspore/lite/examples/quick_start_server_inference_cpp/build directory and run the following command to experience MindSpore Lite parallel inference on the MobileNetV2 model:

./mindspore_quick_start_cpp ../model/mobilenetv2.ms
After the execution is complete, the following information is displayed, including the output tensor name, the tensor size, the number of tensor elements, and the first 50 output values.

tensor name is:Softmax-65 tensor size is:4004 tensor elements num is:1001
output data is:1.74225e-05 1.15919e-05 2.02728e-05 0.000106485 0.000124295 0.00140576 0.000185107 0.000762011 1.50996e-05 5.91942e-06 6.61469e-06 3.72883e-06 4.30761e-06 2.38897e-06 1.5163e-05 0.000192663 1.03767e-05 1.31953e-05 6.69638e-06 3.17411e-05 4.00895e-06 9.9641e-06 3.85127e-06 6.25101e-06 9.08853e-06 1.25043e-05 1.71761e-05 4.92751e-06 2.87637e-05 7.46446e-06 1.39375e-05 2.18824e-05 1.08861e-05 2.5007e-06 3.49876e-05 0.000384547 5.70778e-06 1.28909e-05 1.11038e-05 3.53906e-06 5.478e-06 9.76608e-06 5.32172e-06 1.10386e-05 5.35474e-06 1.35796e-05 7.12652e-06 3.10017e-05 4.34154e-06 7.89482e-05 1.79441e-05
Init
Concurrent inference initialization creates a Context with CPU device info, wraps it together with the number of workers in a RunnerConfig, and then calls the Init interface of ModelParallelRunner to build the model.
// Create and init context, add CPU device info
auto context = std::make_shared<mindspore::Context>();
if (context == nullptr) {
  std::cerr << "New context failed." << std::endl;
  return -1;
}
auto &device_list = context->MutableDeviceInfo();
auto device_info = std::make_shared<mindspore::CPUDeviceInfo>();
if (device_info == nullptr) {
  std::cerr << "New CPUDeviceInfo failed." << std::endl;
  return -1;
}
device_list.push_back(device_info);

// Create ModelParallelRunner
auto model_runner = new (std::nothrow) mindspore::ModelParallelRunner();
if (model_runner == nullptr) {
  std::cerr << "New Model failed." << std::endl;
  return -1;
}

// Create the runner config: bind the context and set the number of parallel
// workers (kNumWorkers is a constant defined in the demo source).
auto runner_config = std::make_shared<mindspore::RunnerConfig>();
if (runner_config == nullptr) {
  std::cerr << "runner config is nullptr." << std::endl;
  return -1;
}
runner_config->SetContext(context);
runner_config->SetWorkersNum(kNumWorkers);

// Build the model: Init reads the model, creates the workers, and schedules operators
auto build_ret = model_runner->Init(model_path, runner_config);
if (build_ret != mindspore::kSuccess) {
  delete model_runner;
  std::cerr << "Build model error " << build_ret << std::endl;
  return -1;
}
Parallel predict
A predict call on ModelParallelRunner consists of filling the input data, executing inference, and obtaining the output. In this example, the input data is randomly generated, and the output is printed after inference.
// Get Input
auto inputs = model_runner->GetInputs();
if (inputs.empty()) {
  delete model_runner;
  std::cerr << "model input is empty." << std::endl;
  return -1;
}
// SetInputDataWithRandom is a helper defined in the demo that fills each
// input tensor with random values.
auto ret = SetInputDataWithRandom(inputs);
if (ret != 0) {
  delete model_runner;
  std::cerr << "set input data failed." << std::endl;
  return -1;
}
// Get Output: pre-allocate a caller-owned buffer for each output tensor
auto outputs = model_runner->GetOutputs();
for (auto &output : outputs) {
  size_t size = output.DataSize();
  if (size == 0 || size > MAX_MALLOC_SIZE) {
    std::cerr << "malloc size is wrong" << std::endl;
    return -1;
  }
  auto out_data = malloc(size);
  output.SetData(out_data);
}
// Model Predict
auto predict_ret = model_runner->Predict(inputs, &outputs);
if (predict_ret != mindspore::kSuccess) {
  delete model_runner;
  std::cerr << "Predict error " << predict_ret << std::endl;
  return -1;
}
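ModelParallelRunner is intended for serving scenarios in which several request threads call Predict on the same runner and the runner dispatches each call to one of its workers. The code above issues a single call; the following is a minimal sketch of driving the runner from multiple threads and printing a short summary of each result. The thread count, the helper names, the assumption that the outputs are float32, and the header path are illustrative and not part of the demo code.

// Sketch: several request threads sharing one ModelParallelRunner.
// Assumptions (not from the demo): kNumRequestThreads, the helper names,
// float32 outputs, and the header path of the runtime package.
#include <iostream>
#include <thread>
#include <vector>
#include "include/api/model_parallel_runner.h"

constexpr int kNumRequestThreads = 4;  // illustrative

void PredictWorker(mindspore::ModelParallelRunner *runner,
                   const std::vector<mindspore::MSTensor> &inputs) {
  // Each thread keeps its own outputs vector. Leaving it empty assumes the
  // runner allocates the output data; the demo above pre-allocates instead.
  std::vector<mindspore::MSTensor> outputs;
  if (runner->Predict(inputs, &outputs) != mindspore::kSuccess) {
    std::cerr << "Predict error in worker thread" << std::endl;
    return;
  }
  for (auto &output : outputs) {
    std::cout << "tensor name is:" << output.Name()
              << " tensor size is:" << output.DataSize()
              << " tensor elements num is:" << output.ElementNum() << std::endl;
    auto *data = static_cast<const float *>(output.MutableData());
    for (int i = 0; i < 10 && i < output.ElementNum(); ++i) {
      std::cout << data[i] << " ";  // print the first few values, as above
    }
    std::cout << std::endl;
  }
}

void RunConcurrentPredict(mindspore::ModelParallelRunner *runner,
                          const std::vector<mindspore::MSTensor> &inputs) {
  // The inputs prepared above are reused read-only by every thread.
  std::vector<std::thread> threads;
  for (int i = 0; i < kNumRequestThreads; ++i) {
    threads.emplace_back([runner, &inputs]() { PredictWorker(runner, inputs); });
  }
  for (auto &t : threads) {
    t.join();
  }
}

In general, up to the number of workers configured through SetWorkersNum run inference at the same time, and additional requests wait for a free worker.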
Memory Release
When MindSpore Lite inference is complete and the framework is no longer required, release the created ModelParallelRunner together with the input and output data allocated by the caller.
// The user needs to free the input data and output data.
for (auto &input : inputs) {
  free(input.MutableData());
  input.SetData(nullptr);
}
for (auto &output : outputs) {
  free(output.MutableData());
  output.SetData(nullptr);
}
// Delete model runner.
delete model_runner;
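Because the demo allocates the input and output buffers itself (with malloc above), it also frees them here; calling SetData(nullptr) afterwards detaches the freed buffers from the tensors so that they are not touched again when the tensors are destroyed.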