Model Inference (Java)


Overview

After the model is converted into a .ms model by using the MindSpore Lite model conversion tool, the inference process can be performed in Runtime. For details, see Converting Models for Inference. This tutorial describes how to use the Java API to perform inference.

If MindSpore Lite is used in an Android project, you can use either the C++ API or the Java API to run the inference framework. Compared with the C++ API, the Java API can be called directly from Java classes, so you do not need to implement any code at the JNI layer, which is more convenient. To run the MindSpore Lite inference framework, perform the following steps:

  1. Load the model (optional): Read the .ms model, converted by the model conversion tool described in Converting Models for Inference, from the file system.

  2. Create a configuration context: Create a configuration context MSContext to save the basic configuration parameters required by a model to guide graph build and execution, including deviceType (device type), threadNum (number of threads), cpuBindMode (CPU core binding mode), and enable_float16 (whether to preferentially use the float16 operator).

  3. Build a graph: Before execution, call the build API of Model to build the graph, which includes graph partition as well as operator selection and scheduling. This step takes a long time, so it is recommended that you build the graph only once for each Model you create and then run inference multiple times.

  4. Input data: Before the graph is executed, fill the input tensors with data.

  5. Perform inference: Call the predict method of Model to run model inference.

  6. Obtain the output: After graph execution is complete, you can obtain the inference result from the output tensors.

  7. Release the memory: When the MindSpore Lite inference framework is no longer required, release the created Model.


For details about the calling process of MindSpore Lite inference, see Experience Java Simple Inference Demo.

Referencing the MindSpore Lite Java Library

Linux X86 Project Referencing the JAR Library

When using Maven as the build tool, you can copy mindspore-lite-java.jar to the lib directory in the root directory and add the dependency of the JAR package to pom.xml.

<dependencies>
    <dependency>
        <groupId>com.mindspore.lite</groupId>
        <artifactId>mindspore-lite-java</artifactId>
        <version>1.0</version>
        <scope>system</scope>
        <systemPath>${project.basedir}/lib/mindspore-lite-java.jar</systemPath>
    </dependency>
</dependencies>

Android Projects Referencing the AAR Library

When Gradle is used as the build tool, move the mindspore-lite-{version}.aar file to the libs directory of the target module, add the local libs directory to the repositories section of the module's build.gradle, and then add the AAR dependency to dependencies as follows:

Note that mindspore-lite-{version} is the AAR file name. Replace {version} with the corresponding version information.

repositories {
    flatDir {
        dirs 'libs'
    }
}

dependencies {
    implementation fileTree(dir: "libs", include: ['*.aar'])
}

Loading a Model

Before performing model inference, MindSpore Lite needs to load the .ms model converted by the model conversion tool from the file system and parse the model.

The following sample code reads the model file from a specified file path.

// Load the .ms model into a memory-mapped buffer.
MappedByteBuffer byteBuffer = null;
try {
    FileChannel fc = new RandomAccessFile(fileName, "r").getChannel();
    byteBuffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size()).load();
    fc.close();
} catch (IOException e) {
    e.printStackTrace();
}

Creating a Configuration Context

Create the configuration context MSContext to save the basic configuration parameters required by the session to guide graph build and execution. Configure the number of threads, the thread affinity, and whether to enable heterogeneous parallel inference via the init interface. MindSpore Lite has a built-in thread pool shared by processes. During inference, threadNum specifies the maximum number of threads in the thread pool. The default value is 2.

MindSpore Lite supports heterogeneous inference. The preferred backend for inference is specified by the deviceType of addDeviceInfo. Currently, CPU, GPU, and NPU are supported. During graph build, operator selection and scheduling are performed based on the preferred backend. If the backend supports float16, the float16 operator can be used preferentially by setting isEnableFloat16 to true. For an NPU backend, you can also set the NPU frequency value. The default frequency value is 3; it can be set to 1 (low power consumption), 2 (balanced), 3 (high performance), or 4 (extreme performance).

Configuring the CPU Backend

If the backend for inference is the CPU, you need to call addDeviceInfo after MSContext is initialized. In addition, the CPU supports setting the core binding mode and whether to preferentially use the float16 operator.

The following sample code from MainActivity.java demonstrates how to create a CPU backend, set the CPU core binding mode to large-core priority, and enable float16 inference:

MSContext context = new MSContext();
context.init(2, CpuBindMode.HIGHER_CPU);
context.addDeviceInfo(DeviceType.DT_CPU, true);

Float16 takes effect only on CPUs with the ARM v8.2 architecture. Devices without this support and x86 platforms automatically fall back to float32.
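Because of this restriction, it can be useful to gate the isEnableFloat16 flag on the reported CPU architecture. The sketch below is only a heuristic and is not part of the MindSpore Lite API; Float16Support and mayUseFloat16 are hypothetical names, and a precise ARM v8.2 check would require CPU feature detection beyond the architecture string:

```java
// Hypothetical helper (not part of the MindSpore Lite API): decide whether it
// is worth requesting float16 inference, based on the JVM's reported CPU
// architecture. Only 64-bit ARM devices can have ARMv8.2 float16 support;
// x86 platforms always fall back to float32 anyway.
public final class Float16Support {
    private Float16Support() {}

    public static boolean mayUseFloat16(String osArch) {
        if (osArch == null) {
            return false;
        }
        String arch = osArch.toLowerCase();
        return arch.contains("aarch64") || arch.contains("arm64");
    }

    public static void main(String[] args) {
        // The result can be passed as the isEnableFloat16 argument of addDeviceInfo.
        System.out.println(mayUseFloat16(System.getProperty("os.arch")));
    }
}
```

With such a helper, the CPU device could be configured as `context.addDeviceInfo(DeviceType.DT_CPU, Float16Support.mayUseFloat16(System.getProperty("os.arch")));`.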

Configuring the GPU Backend

If the inference backend is heterogeneous inference based on the CPU and GPU, you need to add GPUDeviceInfo and CPUDeviceInfo, in that order, by calling addDeviceInfo. After this configuration, GPU inference is used preferentially. In addition, if isEnableFloat16 is set to true, both the GPU and CPU preferentially use the float16 operator.

The following sample code demonstrates how to create the CPU and GPU heterogeneous inference backend and how to enable float16 inference for the GPU.

MSContext context = new MSContext();
context.init(2, CpuBindMode.MID_CPU);
context.addDeviceInfo(DeviceType.DT_GPU, true);
context.addDeviceInfo(DeviceType.DT_CPU, true);

Currently, GPU inference can run only on Android mobile devices, so it is available only through the AAR library.

Configuring the NPU Backend

If the inference backend is heterogeneous inference based on the CPU and NPU, you need to add KirinNPUDeviceInfo and CPUDeviceInfo, in that order, by calling addDeviceInfo. After this configuration, NPU inference is used preferentially. In addition, if isEnableFloat16 is set to true, both the NPU and CPU preferentially use the float16 operator.

The following sample code demonstrates how to create the CPU and NPU heterogeneous inference backend and how to enable float16 inference for the NPU. The KirinNPUDeviceInfo frequency can be set via NPUFrequency.

MSContext context = new MSContext();
context.init(2, CpuBindMode.MID_CPU);
context.addDeviceInfo(DeviceType.DT_NPU, true, 3);
context.addDeviceInfo(DeviceType.DT_CPU, true);

Loading and Compiling a Model

When MindSpore Lite is used for inference, Model is the main entry point: model loading, model compilation, and model execution are all performed through it. Using the MSContext created in the previous step, call the composite build interface of Model to load and compile the model.

The following sample code demonstrates how to load and compile a model.

Model model = new Model();
boolean ret = model.build(filePath, ModelType.MT_MINDIR, context);

Inputting Data

MindSpore Lite Java APIs provide the getInputByTensorName and getInputs methods to obtain the input tensors. Both the byte[] and ByteBuffer data types are supported. You can set the data of an input tensor by calling setData.

  1. Use the getInputByTensorName method to obtain a model input tensor based on the tensor name. The following sample code from MainActivity.java demonstrates how to call getInputByTensorName to obtain the input tensor and fill in data.

    MSTensor inputTensor = model.getInputByTensorName("2031_2030_1_construct_wrapper:x");
    // Set Input Data.
    inputTensor.setData(inputData);
    
  2. Use the getInputs method to directly obtain the list of all model input tensors. The following sample code from MainActivity.java demonstrates how to call getInputs to obtain the input tensors and fill in the data.

    List<MSTensor> inputs = model.getInputs();
    MSTensor inputTensor = inputs.get(0);
    // Set Input Data.
    inputTensor.setData(inputData);
    
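setData accepts raw bytes, so preprocessed float values (for example, normalized image pixels) must first be packed into a byte[]. The helper below is a sketch, not part of the MindSpore Lite API; InputPacker is a hypothetical name, and little-endian byte order is an assumption matching the native order of common Android and x86 targets:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: pack float values into a byte[] suitable for
// MSTensor.setData. Little-endian byte order is assumed here.
public final class InputPacker {
    private InputPacker() {}

    public static byte[] toByteArray(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        // 3 floats -> 12 bytes, ready for inputTensor.setData(...).
        byte[] data = toByteArray(new float[]{0.0f, 0.5f, 1.0f});
        System.out.println(data.length); // prints 12
    }
}
```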

Executing Inference

After MindSpore Lite builds a model, it can call the predict function of Model to perform model inference.

The following sample code demonstrates how to call predict to perform inference.

// Run graph to infer results.
boolean ret = model.predict();

Obtaining the Output

After performing inference, MindSpore Lite reads the inference result from the output tensors. MindSpore Lite provides three methods to obtain the output MSTensor of a model, and supports the getByteData, getFloatData, getIntData, and getLongData methods to obtain the output data.

  1. Use the getOutputs method to directly obtain the list of all model output MSTensor. The following sample code from MainActivity.java demonstrates how to call getOutputs to obtain the output tensor.

    List<MSTensor> outTensors = model.getOutputs();
    
  2. Use the getOutputsByNodeName method to obtain the list of output MSTensor connected to a model output node, based on the node name. The following sample code from MainActivity.java demonstrates how to call getOutputsByNodeName to obtain the output tensors.

    List<MSTensor> outTensors = model.getOutputsByNodeName("Default/head-MobileNetV2Head/Softmax-op204");
    // Apply infer results.
    ...
    
  3. Use the getOutputByTensorName method to obtain the model output MSTensor based on the name of the model output tensor. The following sample code from MainActivity.java demonstrates how to call getOutputByTensorName to obtain the output tensor.

    MSTensor outTensor = model.getOutputByTensorName("Default/head-MobileNetV2Head/Softmax-op204");
    // Apply infer results.
    ...
    
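For a classification model such as the MobileNetV2 head above, the scores returned by getFloatData are typically reduced to the index of the best class. A minimal sketch (OutputPostprocess and argmax are hypothetical names, not part of the MindSpore Lite API):

```java
// Hypothetical post-processing helper: find the index of the highest score in
// the float[] returned by MSTensor.getFloatData().
public final class OutputPostprocess {
    private OutputPostprocess() {}

    public static int argmax(float[] scores) {
        int best = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // e.g. scores obtained from outTensor.getFloatData()
        System.out.println(argmax(new float[]{0.1f, 0.7f, 0.2f})); // prints 1
    }
}
```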

Releasing the Memory

If the MindSpore Lite inference framework is not required, you need to release the created Model. The following sample code from MainActivity.java demonstrates how to release the memory before the program ends.

model.free();

Advanced Usage

Resizing the Input Dimension

When using MindSpore Lite for inference, if you need to resize the input shape, you can call the resize API of Model to reset the shape of the input tensor after building a model.

Some networks do not support variable dimensions. As a result, an error message is displayed and the model exits unexpectedly. For example, the model contains the MatMul operator, one input tensor of the MatMul operator is the weight, and the other input tensor is the input. If a variable dimension API is called, the input tensor does not match the shape of the weight tensor. As a result, the inference fails.

The following sample code from MainActivity.java demonstrates how to perform resize on the input tensor of MindSpore Lite:

List<MSTensor> inputs = model.getInputs();
int[][] dims = {{1, 300, 300, 3}};
boolean ret = model.resize(inputs, dims);

Viewing Logs

If an exception occurs during inference, you can view logs to locate the fault. For the Android platform, use the Logcat command line to view the MindSpore Lite inference log information and use MS_LITE to filter the log information.

logcat -s "MS_LITE"

Obtaining the Version Number

MindSpore Lite provides the version method in the com.mindspore.lite.config.Version class. You can call this method to obtain the version number of MindSpore Lite.

The following sample code from MainActivity.java demonstrates how to obtain the version number of MindSpore Lite:

import com.mindspore.lite.config.Version;
String version = Version.version();