mindspore_lite.ModelParallelRunner

class mindspore_lite.ModelParallelRunner

The ModelParallelRunner class defines a MindSpore Lite Runner that supports model parallelism. Unlike Model, which does not support parallelism, ModelParallelRunner runs inference in parallel. A Runner contains multiple workers, which are the units that actually perform parallel inference. The primary use case is a server receiving inference tasks from multiple clients: the server runs the tasks in parallel, shortening the inference time, and then returns the results to the clients.

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> import mindspore_lite as mslite
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> print(model_parallel_runner)
model_path: .
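
The typical end-to-end workflow chains the methods documented below: build the runner from a model file, fill its input Tensors, and call predict. A minimal sketch, assuming mobilenetv2.mindir and input.bin exist and that input.bin matches the model's first input (both file names are taken from the examples below):

>>> import numpy as np
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.target = ["cpu"]
>>> context.parallel.workers_num = 4
>>> runner = mslite.ModelParallelRunner()
>>> runner.build_from_file(model_path="mobilenetv2.mindir", context=context)
>>> # Fill the first input Tensor and run inference.
>>> inputs = runner.get_inputs()
>>> inputs[0].set_data_from_numpy(np.fromfile("input.bin", dtype=np.float32))
>>> outputs = runner.predict(inputs)
>>> print(outputs[0].get_data_to_numpy().flatten()[:5])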
build_from_file(model_path, context=None)

Build a model parallel runner from a model path so that it can run on a device.

Parameters
  • model_path (str) – Define the model path.

  • context (Context, optional) – Define the context used to pass options during model building. Default: None. None means a Context with the cpu target and the default parallel attributes.

Raises

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.target = ["cpu"]
>>> context.parallel.workers_num = 4
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.build_from_file(model_path="mobilenetv2.mindir", context=context)
>>> print(model_parallel_runner)
model_path: mobilenetv2.mindir.
get_inputs()

Obtains all input Tensors of the model.

Returns

list[Tensor], the input Tensor list of the model.

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.target = ["cpu"]
>>> context.parallel.workers_num = 4
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.build_from_file(model_path="mobilenetv2.mindir", context=context)
>>> inputs = model_parallel_runner.get_inputs()
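
The returned input Tensors are typically filled with data via set_data_from_numpy before calling predict. A short continuation of the example above, assuming input.bin contains float32 data that matches the model's first input:

>>> import numpy as np
>>> in_data = np.fromfile("input.bin", dtype=np.float32)
>>> inputs[0].set_data_from_numpy(in_data)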
predict(inputs, outputs=None)

Run inference with the ModelParallelRunner.

Parameters
  • inputs (list[Tensor]) – A list that includes all input Tensors in order.

  • outputs (list[Tensor], optional) – A list that includes all output Tensors in order; these Tensors hold the output data buffers. Default: None.

Returns

list[Tensor], the model outputs, filled into the output container in sequence.

Raises

Examples

>>> # Use case: serving inference.
>>> # Precondition 1: Download the MindSpore Lite serving package, or build it with
>>> #                 export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the wheel package of MindSpore Lite built in precondition 1.
>>> # The result can be found in the runtime_parallel_python tutorial.
>>> import time
>>> from threading import Thread
>>> import numpy as np
>>> import mindspore_lite as mslite
>>> # The number of threads of one worker.
>>> # WORKERS_NUM * THREAD_NUM should not exceed the number of cores of the machine.
>>> THREAD_NUM = 1
>>> # The number of workers in one `ModelParallelRunner` on the server during parallel inference.
>>> # If you want to compare the time difference between parallel inference and serial inference,
>>> # you can set WORKERS_NUM = 1 for serial inference.
>>> WORKERS_NUM = 3
>>> # Simulate 5 clients, and each client sends 2 inference tasks to the server at the same time.
>>> PARALLEL_NUM = 5
>>> TASK_NUM = 2
>>>
>>>
>>> def parallel_runner_predict(parallel_runner, parallel_id):
...     # One Runner with 3 workers, set model input, execute inference and get output.
...     task_index = 0
...     while True:
...         if task_index == TASK_NUM:
...             break
...         task_index += 1
...         # Set model input
...         inputs = parallel_runner.get_inputs()
...         in_data = np.fromfile("input.bin", dtype=np.float32)
...         inputs[0].set_data_from_numpy(in_data)
...         once_start_time = time.time()
...         # Execute inference
...         outputs = parallel_runner.predict(inputs)
...         once_end_time = time.time()
...         print("parallel id: ", parallel_id, " | task index: ", task_index, " | run once time: ",
...               once_end_time - once_start_time, " s")
...         # Get output
...         for output in outputs:
...             tensor_name = output.name.rstrip()
...             data_size = output.data_size
...             element_num = output.element_num
...             print("tensor name is:%s tensor size is:%s tensor elements num is:%s" % (tensor_name,
...                                                                                      data_size,
...                                                                                      element_num))
...
...             data = output.get_data_to_numpy()
...             data = data.flatten()
...             print("output data is:", end=" ")
...             for j in range(5):
...                 print(data[j], end=" ")
...             print("")
...
>>> # Init the context, and add CPU device info.
>>> context = mslite.Context()
>>> context.target = ["cpu"]
>>> context.cpu.enable_fp16 = False
>>> context.cpu.thread_num = THREAD_NUM
>>> context.cpu.inter_op_parallel_num = THREAD_NUM
>>> context.parallel.workers_num = WORKERS_NUM
>>> # Build ModelParallelRunner from file
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.build_from_file(model_path="mobilenetv2.mindir", context=context)
>>> # The server creates 5 threads to handle the inference tasks of the 5 clients.
>>> threads = []
>>> total_start_time = time.time()
>>> for i in range(PARALLEL_NUM):
...     threads.append(Thread(target=parallel_runner_predict, args=(model_parallel_runner, i,)))
...
>>> # Start threads to perform parallel inference.
>>> for th in threads:
...     th.start()
...
>>> for th in threads:
...     th.join()
...
>>> total_end_time = time.time()
>>> print("total run time: ", total_end_time - total_start_time, " s")