Experience C++ Minimalist Concurrent Inference Demo

Overview

This tutorial provides a MindSpore Lite parallel inference demo. It demonstrates the basic inference process in C++: randomly generated data is used as the input to the MobileNetV2 model, inference is executed, and the output data is printed, so that you can quickly get familiar with the MindSpore Lite inference-related APIs. The code is stored in the mindspore/lite/examples/quick_start_server_inference_cpp directory.

The MindSpore Lite parallel inference steps are as follows; a compact skeleton that ties these steps to the C++ APIs is shown after the list:

  1. Read the model: Read the .ms model file converted by the model conversion tool from the file system.

  2. Create configuration options: Create a configuration context that holds the basic parameters required to build and execute the model.

  3. Initialize a ModelParallelRunner: Before executing concurrent inference, call the Init interface of ModelParallelRunner to initialize concurrent inference. This step mainly performs model reading, creation of the concurrent workers, subgraph segmentation, and operator selection and scheduling. Because it is time-consuming, it is recommended to initialize once and then perform concurrent inference multiple times.

  4. Input data: Before the model is executed, the input tensors need to be filled with data.

  5. Parallel inference: Use the Predict interface of ModelParallelRunner to perform model inference.

  6. Obtain the output: After the model execution is complete, you can obtain the inference result from the output tensors.

  7. Release the memory: When the MindSpore Lite inference framework is no longer required, release the created ModelParallelRunner.
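The following minimal sketch shows how these steps map onto the C++ APIs used in the rest of this tutorial. It is an illustrative outline only: the header paths and the worker count of 2 are assumptions, and the input-filling and output-reading steps are left as comments (they are detailed in the sections below).

#include <iostream>
#include <memory>
#include <string>
#include "include/api/context.h"               // assumed header paths from the release package
#include "include/api/model_parallel_runner.h"

int main(int argc, char **argv) {
  if (argc < 2) {
    std::cerr << "Usage: demo <path-to-ms-model>" << std::endl;
    return -1;
  }
  std::string model_path = argv[1];                                   // 1. path of the converted .ms model
  auto context = std::make_shared<mindspore::Context>();              // 2. basic configuration
  context->MutableDeviceInfo().push_back(std::make_shared<mindspore::CPUDeviceInfo>());
  auto runner_config = std::make_shared<mindspore::RunnerConfig>();
  runner_config->SetContext(context);
  runner_config->SetWorkersNum(2);                                    // illustrative worker count
  auto runner = new (std::nothrow) mindspore::ModelParallelRunner();  // 3. init the parallel runner
  if (runner == nullptr || runner->Init(model_path, runner_config) != mindspore::kSuccess) {
    std::cerr << "Init ModelParallelRunner failed." << std::endl;
    delete runner;
    return -1;
  }
  auto inputs = runner->GetInputs();                                  // 4. fill the input tensors here
  auto outputs = runner->GetOutputs();
  if (runner->Predict(inputs, &outputs) != mindspore::kSuccess) {     // 5. concurrent inference
    std::cerr << "Predict failed." << std::endl;
    delete runner;
    return -1;
  }
  // 6. read the inference result from the output tensors here
  delete runner;                                                      // 7. release the runner
  return 0;
}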


Building and Running

Linux X86

  • Environment requirements

    • System environment: Linux x86_64 (Ubuntu 18.04.02 LTS is recommended)

    • Build dependency:

  • Build

    Run the build script in the mindspore/lite/examples/quick_start_server_inference_cpp directory to automatically download the MindSpore Lite inference framework library and model files and build the demo.

    bash build.sh
    

    If the MindSpore Lite inference framework fails to be downloaded by the build script, manually download the MindSpore Lite model inference framework mindspore-lite-{version}-linux-x64.tar.gz whose hardware platform is CPU and operating system is Ubuntu-x64. Copy the libmindspore-lite.a file from the decompressed lib directory and the libmindspore_glog.so.0 file from the decompressed glog directory to the mindspore/lite/examples/quick_start_server_inference_cpp/lib directory, and copy the files from runtime/include to the mindspore/lite/examples/quick_start_server_inference_cpp/include directory.

    If the MobileNetV2 model fails to be downloaded, manually download the model file mobilenetv2.ms and copy it to the mindspore/lite/examples/quick_start_server_inference_cpp/model directory.

    After manually downloading the files and placing them in the specified locations, run the build.sh script again to complete the compilation.

  • Inference

    After the build, go to the mindspore/lite/examples/quick_start_server_inference_cpp/build directory and run the following command to experience MindSpore Lite inference on the MobileNetV2 model:

    ./mindspore_quick_start_cpp ../model/mobilenetv2.ms
    

    After the execution, the following information is displayed, including the output tensor name, tensor size, number of tensor elements, and the first 50 output values.

    tensor name is:Softmax-65 tensor size is:4004 tensor elements num is:1001
    output data is:1.74225e-05 1.15919e-05 2.02728e-05 0.000106485 0.000124295 0.00140576 0.000185107 0.000762011 1.50996e-05 5.91942e-06 6.61469e-06 3.72883e-06 4.30761e-06 2.38897e-06 1.5163e-05 0.000192663 1.03767e-05 1.31953e-05 6.69638e-06 3.17411e-05 4.00895e-06 9.9641e-06 3.85127e-06 6.25101e-06 9.08853e-06 1.25043e-05 1.71761e-05 4.92751e-06 2.87637e-05 7.46446e-06 1.39375e-05 2.18824e-05 1.08861e-05 2.5007e-06 3.49876e-05 0.000384547 5.70778e-06 1.28909e-05 1.11038e-05 3.53906e-06 5.478e-06 9.76608e-06 5.32172e-06 1.10386e-05 5.35474e-06 1.35796e-05 7.12652e-06 3.10017e-05 4.34154e-06 7.89482e-05 1.79441e-05
    

Init

// Create and init context, add CPU device info
auto context = std::make_shared<mindspore::Context>();
if (context == nullptr) {
  std::cerr << "New context failed." << std::endl;
  return -1;
}
auto &device_list = context->MutableDeviceInfo();
auto device_info = std::make_shared<mindspore::CPUDeviceInfo>();
if (device_info == nullptr) {
  std::cerr << "New CPUDeviceInfo failed." << std::endl;
  return -1;
}
device_list.push_back(device_info);

// Create model
auto model_runner = new (std::nothrow) mindspore::ModelParallelRunner();
if (model_runner == nullptr) {
  std::cerr << "New Model failed." << std::endl;
  return -1;
}
auto runner_config = std::make_shared<mindspore::RunnerConfig>();
if (runner_config == nullptr) {
  std::cerr << "runner config is nullptr." << std::endl;
  return -1;
}
runner_config->SetContext(context);
runner_config->SetWorkersNum(kNumWorkers);
// Build model
auto build_ret = model_runner->Init(model_path, runner_config);
if (build_ret != mindspore::kSuccess) {
  delete model_runner;
  std::cerr << "Build model error " << build_ret << std::endl;
  return -1;
}
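
The snippet above refers to model_path and kNumWorkers, which are defined elsewhere in the demo source. For illustration only, they could be declared as follows (the values shown are hypothetical, not prescribed by the demo):

#include <string>

constexpr int kNumWorkers = 2;  // hypothetical number of concurrent workers
const std::string model_path = "../model/mobilenetv2.ms";  // illustrative path; the demo takes it from the command line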

Parallel predict

ModelParallelRunner prediction includes filling the input data, executing inference, and obtaining the output. In this example, the input data is randomly generated, and the output result is printed after inference.
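
SetInputDataWithRandom is a helper defined in the demo source. A minimal sketch of such a helper is shown below, assuming float input tensors, the MSTensor MutableData() and ElementNum() accessors, and the include/api/types.h header; the demo's actual implementation may differ.

#include <iostream>
#include <random>
#include <vector>
#include "include/api/types.h"

// Fill every input tensor with uniformly distributed random floats (illustrative sketch).
int SetInputDataWithRandom(std::vector<mindspore::MSTensor> &inputs) {
  std::mt19937 gen(std::random_device{}());
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);
  for (auto &tensor : inputs) {
    auto data = reinterpret_cast<float *>(tensor.MutableData());  // obtains (and allocates if needed) the tensor buffer
    if (data == nullptr) {
      std::cerr << "MutableData is nullptr." << std::endl;
      return -1;
    }
    for (int64_t i = 0; i < tensor.ElementNum(); ++i) {
      data[i] = dist(gen);
    }
  }
  return 0;
}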

// Get Input
auto inputs = model_runner->GetInputs();
if (inputs.empty()) {
  delete model_runner;
  std::cerr << "model input is empty." << std::endl;
  return -1;
}
// set random data to input data.
auto ret = SetInputDataWithRandom(inputs);
if (ret != 0) {
  delete model_runner;
  std::cerr << "set input data failed." << std::endl;
  return -1;
}
// Get Output
auto outputs = model_runner->GetOutputs();
for (auto &output : outputs) {
  size_t size = output.DataSize();
  if (size == 0 || size > MAX_MALLOC_SIZE) {
    std::cerr << "malloc size is wrong" << std::endl;
    return -1;
  }
  // Allocate the output buffer and bind it to the output tensor.
  auto out_data = malloc(size);
  if (out_data == nullptr) {
    std::cerr << "malloc output data failed." << std::endl;
    return -1;
  }
  output.SetData(out_data);
}

// Model Predict
auto predict_ret = model_runner->Predict(inputs, &outputs);
if (predict_ret != mindspore::kSuccess) {
  delete model_runner;
  std::cerr << "Predict error " << predict_ret << std::endl;
  return -1;
}
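
The inference result can then be read from the output tensors. A minimal sketch of printing the information shown earlier (tensor name, size, element count, and the first 50 values) is given below, assuming the MSTensor Name(), DataSize(), ElementNum() and MutableData() accessors and float output data:

// Print each output tensor's metadata and its first 50 values (illustrative sketch).
for (auto &tensor : outputs) {
  std::cout << "tensor name is:" << tensor.Name() << " tensor size is:" << tensor.DataSize()
            << " tensor elements num is:" << tensor.ElementNum() << std::endl;
  auto out_data = reinterpret_cast<const float *>(tensor.MutableData());
  std::cout << "output data is:";
  for (int64_t i = 0; i < tensor.ElementNum() && i < 50; ++i) {
    std::cout << out_data[i] << " ";
  }
  std::cout << std::endl;
}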

Memory Release

When the MindSpore Lite inference process is complete, release the created ModelParallelRunner together with the allocated input and output data.

// The user needs to free the input and output data.
for (auto &input : inputs) {
  free(input.MutableData());
  input.SetData(nullptr);
}
for (auto &output : outputs) {
  free(output.MutableData());
  output.SetData(nullptr);
}
// Delete model runner.
delete model_runner;