mindspore_lite.ModelParallelRunner
- class mindspore_lite.ModelParallelRunner
The ModelParallelRunner class defines a MindSpore Lite Runner that supports model parallelism. Unlike Model, which does not support parallelism, ModelParallelRunner does. A Runner contains multiple workers, which are the units that actually perform parallel inference. The primary use case is a server that receives inference tasks from multiple clients: the server runs the tasks in parallel, which shortens the inference time, and then returns the inference results to the clients.
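The relationship between a Runner and its workers can be pictured with an ordinary Python thread pool. The sketch below is only an analogy and does not use MindSpore Lite: `fake_infer` is a hypothetical stand-in for a single inference call, and `ThreadPoolExecutor` plays the role of a Runner with a fixed number of workers that serve tasks arriving from several clients.

```python
import time
from concurrent.futures import ThreadPoolExecutor

WORKERS_NUM = 3   # analogous to RunnerConfig(workers_num=...)
CLIENT_TASKS = 6  # inference requests arriving from clients


def fake_infer(task_id):
    """Hypothetical stand-in for one inference call; not a MindSpore Lite API."""
    time.sleep(0.05)  # pretend the model needs 50 ms per request
    return task_id * 2


def serve(tasks, workers_num):
    """Run all tasks on a fixed pool of workers; results come back in task order."""
    with ThreadPoolExecutor(max_workers=workers_num) as pool:
        return list(pool.map(fake_infer, tasks))


results = serve(range(CLIENT_TASKS), WORKERS_NUM)
print(results)
```

With `workers_num=1` the same tasks run one after another, which is the serial baseline that the parallel configuration is meant to improve on.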
Note
First use the init method for initialization, and then call other methods.
Examples
>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the MindSpore Lite wheel package built in precondition 1.
>>> import mindspore_lite as mslite
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> print(model_parallel_runner)
model_path: .
- get_inputs()
Obtains all input Tensors of the model.
- Returns
list[Tensor], the input Tensor list of the model.
Examples
>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the MindSpore Lite wheel package built in precondition 1.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.append_device_info(mslite.CPUDeviceInfo())
>>> runner_config = mslite.RunnerConfig(context=context, workers_num=4)
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.init(model_path="mobilenetv2.ms", runner_config=runner_config)
>>> inputs = model_parallel_runner.get_inputs()
- get_outputs()
Obtains all output Tensors of the model.
- Returns
list[Tensor], the output Tensor list of the model.
Examples
>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the MindSpore Lite wheel package built in precondition 1.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.append_device_info(mslite.CPUDeviceInfo())
>>> runner_config = mslite.RunnerConfig(context=context, workers_num=4)
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.init(model_path="mobilenetv2.ms", runner_config=runner_config)
>>> outputs = model_parallel_runner.get_outputs()
- init(model_path, runner_config=None)
Builds a model parallel runner from the model path so that it can run on a device.
- Parameters
model_path (str) – Path to the model file.
runner_config (RunnerConfig, optional) – Configuration that passes the context and options used when the model pool is initialized. Default: None.
- Raises
TypeError – model_path is not a str.
TypeError – runner_config is neither a RunnerConfig nor None.
RuntimeError – model_path does not exist.
RuntimeError – ModelParallelRunner’s init failed.
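The first three error conditions listed above can be checked before a model is loaded, which makes it possible to fail fast with a specific message. The following is a minimal sketch under stated assumptions: `check_init_args` is a hypothetical helper, not part of MindSpore Lite, and `config_type` is a parameter (for example `mslite.RunnerConfig`) so that the sketch runs without the library installed.

```python
import os


def check_init_args(model_path, runner_config=None, config_type=None):
    """Hypothetical pre-flight check mirroring the documented init errors.

    config_type is the class that runner_config must be an instance of.
    When it is None, the runner_config type check is skipped.
    """
    if not isinstance(model_path, str):
        raise TypeError(f"model_path must be str, got {type(model_path).__name__}")
    if (runner_config is not None and config_type is not None
            and not isinstance(runner_config, config_type)):
        raise TypeError("runner_config must be a RunnerConfig or None")
    if not os.path.exists(model_path):
        raise RuntimeError(f"model_path does not exist: {model_path}")
```

A caller would run `check_init_args(path, runner_config, mslite.RunnerConfig)` before `init`. Failures inside `init` itself are still reported by the library as RuntimeError.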
Examples
>>> # Use case: serving inference.
>>> # Precondition 1: Build the MindSpore Lite serving package with export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the MindSpore Lite wheel package built in precondition 1.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.append_device_info(mslite.CPUDeviceInfo())
>>> runner_config = mslite.RunnerConfig(context=context, workers_num=4)
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.init(model_path="mobilenetv2.ms", runner_config=runner_config)
>>> print(model_parallel_runner)
model_path: mobilenetv2.ms.
- predict(inputs, outputs)
Runs inference with the ModelParallelRunner.
- Parameters
inputs (list[Tensor]) – A list that contains all input Tensors in order.
outputs (list[Tensor]) – A list that the model outputs are filled into in sequence.
- Raises
TypeError – inputs is not a list.
TypeError – inputs is a list, but the elements are not Tensor.
TypeError – outputs is not a list.
TypeError – outputs is a list, but the elements are not Tensor.
RuntimeError – predict model failed.
Examples
>>> # Use case: serving inference.
>>> # Precondition 1: Download the MindSpore Lite serving package, or build it with
>>> # export MSLITE_ENABLE_SERVER_INFERENCE=on.
>>> # Precondition 2: Install the MindSpore Lite wheel package from precondition 1.
>>> import time
>>> from threading import Thread
>>> import numpy as np
>>> import mindspore_lite as mslite
>>>
>>> # The number of threads of one worker.
>>> # WORKERS_NUM * THREAD_NUM should not exceed the number of cores of the machine.
>>> THREAD_NUM = 1
>>> # In parallel inference, the number of workers in one `ModelParallelRunner` on the server.
>>> # To compare the time difference between parallel inference and serial inference,
>>> # set WORKERS_NUM = 1 for serial inference.
>>> WORKERS_NUM = 3
>>> # Simulate 5 clients, each sending 2 inference tasks to the server at the same time.
>>> PARALLEL_NUM = 5
>>> TASK_NUM = 2
>>>
>>>
>>> def parallel_runner_predict(parallel_runner, parallel_id):
...     # One Runner with 3 workers: set model input, execute inference and get output.
...     task_index = 0
...     while True:
...         if task_index == TASK_NUM:
...             break
...         task_index += 1
...         # Set model input
...         inputs = parallel_runner.get_inputs()
...         in_data = np.fromfile("./model/input.bin", dtype=np.float32)
...         inputs[0].set_data_from_numpy(in_data)
...         once_start_time = time.time()
...         # Execute inference
...         outputs = []
...         parallel_runner.predict(inputs, outputs)
...         once_end_time = time.time()
...         print("parallel id: ", parallel_id, " | task index: ", task_index, " | run once time: ",
...               once_end_time - once_start_time, " s")
...         # Get output
...         for output in outputs:
...             tensor_name = output.get_tensor_name().rstrip()
...             data_size = output.get_data_size()
...             element_num = output.get_element_num()
...             print("tensor name is:%s tensor size is:%s tensor elements num is:%s" % (tensor_name,
...                                                                                      data_size,
...                                                                                      element_num))
...             data = output.get_data_to_numpy()
...             data = data.flatten()
...             print("output data is:", end=" ")
...             for j in range(5):
...                 print(data[j], end=" ")
...             print("")
...
>>> # Init RunnerConfig and context, and add CPU device info
>>> cpu_device_info = mslite.CPUDeviceInfo(enable_fp16=False)
>>> context = mslite.Context(thread_num=THREAD_NUM, inter_op_parallel_num=THREAD_NUM)
>>> context.append_device_info(cpu_device_info)
>>> parallel_runner_config = mslite.RunnerConfig(context=context, workers_num=WORKERS_NUM)
>>> # Build ModelParallelRunner from file
>>> model_parallel_runner = mslite.ModelParallelRunner()
>>> model_parallel_runner.init(model_path="./model/mobilenetv2.ms", runner_config=parallel_runner_config)
>>> # The server creates 5 threads to store the inference tasks of 5 clients.
>>> threads = []
>>> total_start_time = time.time()
>>> for i in range(PARALLEL_NUM):
...     threads.append(Thread(target=parallel_runner_predict, args=(model_parallel_runner, i,)))
...
>>> # Start threads to perform parallel inference.
>>> for th in threads:
...     th.start()
...
>>> for th in threads:
...     th.join()
...
>>> total_end_time = time.time()
>>> print("total run time: ", total_end_time - total_start_time, " s")
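The example's comments state that WORKERS_NUM * THREAD_NUM should not exceed the number of cores of the machine. One way to derive a worker count that respects this rule is sketched below. `max_workers` is a hypothetical helper, not a MindSpore Lite function, and it assumes that `os.cpu_count()` is an acceptable measure of the available cores.

```python
import os


def max_workers(thread_num, cores=None):
    """Largest workers_num such that workers_num * thread_num <= cores.

    Hypothetical helper, not part of MindSpore Lite. Always returns at least 1,
    so a machine with fewer cores than thread_num still gets one worker.
    """
    if thread_num < 1:
        raise ValueError("thread_num must be >= 1")
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores // thread_num)


print(max_workers(thread_num=1, cores=8))  # 8
print(max_workers(thread_num=2, cores=8))  # 4
print(max_workers(thread_num=4, cores=2))  # 1
```

The result could then be passed as `workers_num` when constructing the RunnerConfig, in place of a hard-coded constant.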