mindspore_lite.ModelParallelRunner

class mindspore_lite.ModelParallelRunner

The ModelParallelRunner class defines a MindSpore Lite runner that supports model parallelism. Unlike Model, which does not support parallelism, ModelParallelRunner does. A runner contains multiple workers, which are the units that actually perform parallel inference. The primary use case is serving: multiple clients send inference tasks to the server, the server performs the inference in parallel to shorten the inference time, and then returns the results to the clients.
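The worker-pool behavior described above can be sketched in plain Python (a conceptual sketch only; `ToyParallelRunner`, its task queue, and `infer_fn` are hypothetical stand-ins, not MindSpore Lite APIs):

```python
from queue import Queue
from threading import Thread

# Conceptual sketch (not MindSpore Lite code): a runner that owns a pool of
# worker threads pulling inference tasks from a shared queue, mirroring how a
# ModelParallelRunner distributes client requests across its workers.
# `infer_fn` is a hypothetical stand-in for one worker's model inference.
class ToyParallelRunner:
    def __init__(self, workers_num, infer_fn):
        self._tasks = Queue()
        self._infer_fn = infer_fn
        for _ in range(workers_num):
            Thread(target=self._work_loop, daemon=True).start()

    def _work_loop(self):
        # Each worker repeatedly takes one task and runs inference on it.
        while True:
            data, results = self._tasks.get()
            results.append(self._infer_fn(data))
            self._tasks.task_done()

    def predict(self, data, results):
        # Enqueue one task; a free worker appends its result to `results`.
        self._tasks.put((data, results))

    def wait(self):
        # Block until every queued task has been processed.
        self._tasks.join()
```

For example, `ToyParallelRunner(3, fn)` serves tasks from any number of client threads with 3 workers, just as a server-side runner with `workers_num=3` would.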

Note

Call the init method first for initialization, and then call the other methods.

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> import mindspore_lite as mslite
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> print(model_parallel_runner)
model_path: .
get_inputs()

Obtains all input Tensors of the model.

Returns

list[Tensor], the input Tensor list of the model.

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.append_device_info(mslite.CPUDeviceInfo())
>>> runner_config = mslite.RunnerConfig(context=context, workers_num=4)
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.init(model_path="mobilenetv2.ms", runner_config=runner_config)
>>> inputs = model_parallel_runner.get_inputs()
get_outputs()

Obtains all output Tensors of the model.

Returns

list[Tensor], the output Tensor list of the model.

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.append_device_info(mslite.CPUDeviceInfo())
>>> runner_config = mslite.RunnerConfig(context=context, workers_num=4)
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.init(model_path="mobilenetv2.ms", runner_config=runner_config)
>>> outputs = model_parallel_runner.get_outputs()
init(model_path, runner_config=None)

Build a model parallel runner from a model path so that it can run on a device.

Parameters
  • model_path (str) – Define the model path.

  • runner_config (RunnerConfig, optional) – Define the config used to transfer context and options during model pool init. Default: None.

Raises
  • TypeError – model_path is not a str.

  • RuntimeError – init model parallel runner failed.

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.append_device_info(mslite.CPUDeviceInfo())
>>> runner_config = mslite.RunnerConfig(context=context, workers_num=4)
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.init(model_path="mobilenetv2.ms", runner_config=runner_config)
>>> print(model_parallel_runner)
model_path: mobilenetv2.ms.
predict(inputs, outputs)

Run inference with the ModelParallelRunner.

Parameters
  • inputs (list[Tensor]) – A list that includes all input Tensors in order.

  • outputs (list[Tensor]) – The model outputs are filled in the container in sequence.

Raises
  • TypeError – inputs is not a list.

  • TypeError – inputs is a list, but the elements are not Tensor.

  • TypeError – outputs is not a list.

  • TypeError – outputs is a list, but the elements are not Tensor.

  • RuntimeError – predict model failed.
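The type checks implied by the Raises list above can be sketched as follows (a hypothetical illustration; the real checks live inside MindSpore Lite's implementation, and `tensor_type` stands in for mindspore_lite.Tensor):

```python
# Hypothetical sketch of the argument validation implied by the Raises list;
# `tensor_type` stands in for mindspore_lite.Tensor.
def check_predict_args(inputs, outputs, tensor_type):
    if not isinstance(inputs, list):
        raise TypeError("inputs must be a list of Tensor.")
    if not all(isinstance(t, tensor_type) for t in inputs):
        raise TypeError("inputs elements must be Tensor.")
    if not isinstance(outputs, list):
        raise TypeError("outputs must be a list of Tensor.")
    if not all(isinstance(t, tensor_type) for t in outputs):
        raise TypeError("outputs elements must be Tensor.")
```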

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> import time
>>> from threading import Thread
>>> import numpy as np
>>> import mindspore_lite as mslite
>>>
>>> # The number of threads used by one worker.
>>> # WORKERS_NUM * THREAD_NUM should not exceed the number of cores of the machine.
>>> THREAD_NUM = 1
>>> # The number of workers in one `ModelParallelRunner` on the server for parallel inference.
>>> # To compare the time difference between parallel and serial inference,
>>> # set WORKERS_NUM = 1 for serial inference.
>>> WORKERS_NUM = 3
>>> # Simulate 5 clients, and each client sends 2 inference tasks to the server at the same time.
>>> PARALLEL_NUM = 5
>>> TASK_NUM = 2
>>>
>>>
>>> def parallel_runner_predict(parallel_runner, parallel_id):
...     # One Runner with 3 workers, set model input, execute inference and get output.
...     task_index = 0
...     while True:
...         if task_index == TASK_NUM:
...             break
...         task_index += 1
...         # Set model input
...         inputs = parallel_runner.get_inputs()
...         in_data = np.fromfile("./model/input.bin", dtype=np.float32)
...         inputs[0].set_data_from_numpy(in_data)
...         once_start_time = time.time()
...         # Execute inference
...         outputs = []
...         parallel_runner.predict(inputs, outputs)
...         once_end_time = time.time()
...         print("parallel id: ", parallel_id, " | task index: ", task_index, " | run once time: ",
...               once_end_time - once_start_time, " s")
...         # Get output
...         for output in outputs:
...             tensor_name = output.get_tensor_name().rstrip()
...             data_size = output.get_data_size()
...             element_num = output.get_element_num()
...             print("tensor name is:%s tensor size is:%s tensor elements num is:%s" % (tensor_name,
...                                                                                      data_size,
...                                                                                      element_num))
...
...             data = output.get_data_to_numpy()
...             data = data.flatten()
...             print("output data is:", end=" ")
...             for j in range(5):
...                 print(data[j], end=" ")
...             print("")
...
>>> # Init RunnerConfig and context, and add CPU device info
>>> cpu_device_info = mslite.CPUDeviceInfo(enable_fp16=False)
>>> context = mslite.Context(thread_num=THREAD_NUM, inter_op_parallel_num=THREAD_NUM)
>>> context.append_device_info(cpu_device_info)
>>> parallel_runner_config = mslite.RunnerConfig(context=context, workers_num=WORKERS_NUM)
>>> # Build ModelParallelRunner from file
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.init(model_path="./model/mobilenetv2.ms", runner_config=parallel_runner_config)
>>> # The server creates 5 threads to store the inference tasks of 5 clients.
>>> threads = []
>>> total_start_time = time.time()
>>> for i in range(PARALLEL_NUM):
...     threads.append(Thread(target=parallel_runner_predict, args=(model_parallel_runner, i,)))
...
>>> # Start threads to perform parallel inference.
>>> for th in threads:
...     th.start()
...
>>> for th in threads:
...     th.join()
...
>>> total_end_time = time.time()
>>> print("total run time: ", total_end_time - total_start_time, " s")
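Using the numbers from this example, a back-of-the-envelope estimate of the expected speedup (assuming every task takes the same time and perfect scheduling, which real workloads only approximate):

```python
import math

# 5 clients x 2 tasks = 10 tasks in total; with 3 workers, the 10 tasks run
# in ceil(10 / 3) = 4 rounds instead of 10, roughly a 2.5x wall-time cut.
PARALLEL_NUM, TASK_NUM, WORKERS_NUM = 5, 2, 3
total_tasks = PARALLEL_NUM * TASK_NUM
serial_rounds = total_tasks                              # WORKERS_NUM = 1
parallel_rounds = math.ceil(total_tasks / WORKERS_NUM)   # WORKERS_NUM = 3
print(serial_rounds, parallel_rounds)  # 10 4
```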