mindspore_serving.server

MindSpore Serving is a lightweight and high-performance service module that helps MindSpore developers efficiently deploy online inference services in the production environment.

This module provides the MindSpore Serving server API, which can be used to start servables and the gRPC and RESTful servers. A servable corresponds to the service provided by a model. The client sends inference tasks and receives inference results through the gRPC and RESTful servers.

class mindspore_serving.server.SSLConfig(certificate, private_key, custom_ca=None, verify_client=False)[source]

The server’s ssl_config encapsulates necessary parameters for SSL-enabled connections.

Parameters

certificate (str) – File holding the PEM-encoded certificate chain as a byte string to use or None if no certificate chain should be used.
private_key (str) – File holding the PEM-encoded private key as a byte string, or None if no private key should be used.
custom_ca (str, optional) – File holding the PEM-encoded root certificates as a byte string. When verify_client is True, custom_ca must be provided. When verify_client is False, this parameter will be ignored. Default: None.
verify_client (bool, optional) – If verify_client is true, use mutual authentication. If false, use one-way authentication. Default: False.

Raises

RuntimeError – The type or value of the parameters is invalid.
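
The rule that ties custom_ca to verify_client can be checked before the configuration object is built. The following is a minimal sketch and not part of MindSpore Serving: build_ssl_kwargs is a hypothetical helper, and it only mirrors the parameter rules stated above.

```python
from typing import Dict, Optional, Union


def build_ssl_kwargs(certificate: str,
                     private_key: str,
                     custom_ca: Optional[str] = None,
                     verify_client: bool = False) -> Dict[str, Union[str, bool, None]]:
    """Hypothetical helper: apply the documented custom_ca rule and return
    keyword arguments shaped like the SSLConfig signature."""
    if verify_client and custom_ca is None:
        # Mutual authentication needs root certificates to verify the client.
        raise ValueError("custom_ca must be provided when verify_client is True")
    if not verify_client:
        # custom_ca is documented as ignored in one-way mode; drop it here
        # so that the resulting configuration makes this explicit.
        custom_ca = None
    return {
        "certificate": certificate,
        "private_key": private_key,
        "custom_ca": custom_ca,
        "verify_client": verify_client,
    }
```

The returned dictionary could then be passed as server.SSLConfig(**kwargs), and the resulting object handed to start_grpc_server or start_restful_server through the ssl_config parameter.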

class mindspore_serving.server.ServableStartConfig(servable_directory, servable_name, device_ids, version_number=0, device_type=None, dec_key=None, dec_mode='AES-GCM')[source]

Servable startup configuration.

For more detail, please refer to MindSpore-based Inference Service Deployment and Servable Provided Through Model Configuration.

Parameters

servable_directory (str) – The directory where the servable is located. It is expected to contain a directory named servable_name.
servable_name (str) – The servable name.
device_ids (Union[int, list[int], tuple[int]]) – The device list the model loads into and runs in.
version_number (int, optional) – Servable version number to be loaded. The version number should be a positive integer, starting from 1, and 0 means to load the latest version. Default: 0.
device_type (str, optional) –
The device type. Currently supports “Ascend”, “GPU” and None. Default: None.
- “Ascend”: the platform is expected to be Ascend 910, Ascend 310, etc.
- “GPU”: the platform is expected to be an Nvidia GPU.
- None: the platform is determined by the MindSpore environment.
dec_key (bytes, optional) – Byte type key used for decryption. The valid length is 16, 24, or 32. Default: None.
dec_mode (str, optional) – Specifies the decryption mode, which takes effect only when dec_key is set. Options: ‘AES-GCM’ or ‘AES-CBC’. Default: ‘AES-GCM’.

Raises

RuntimeError – The type or value of the parameters is invalid.
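
The constraints on dec_key and dec_mode can be verified before a configuration is created. This is a minimal sketch: check_decryption_params and the two constants are hypothetical names introduced for illustration, and the checks only restate the values listed above.

```python
import secrets
from typing import Optional

VALID_KEY_LENGTHS = (16, 24, 32)          # bytes, as listed for dec_key
VALID_DEC_MODES = ("AES-GCM", "AES-CBC")  # as listed for dec_mode


def check_decryption_params(dec_key: Optional[bytes],
                            dec_mode: str = "AES-GCM") -> bool:
    """Hypothetical pre-check. Returns True when decryption is requested."""
    if dec_key is None:
        # dec_mode only takes effect when dec_key is set.
        return False
    if not isinstance(dec_key, bytes):
        raise TypeError("dec_key must be of type bytes")
    if len(dec_key) not in VALID_KEY_LENGTHS:
        raise ValueError(f"dec_key length must be one of {VALID_KEY_LENGTHS}")
    if dec_mode not in VALID_DEC_MODES:
        raise ValueError(f"dec_mode must be one of {VALID_DEC_MODES}")
    return True


# A random 32-byte value, used here only to show a key of valid length.
# A real key has to be the one the model was encrypted with.
example_key = secrets.token_bytes(32)
```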

mindspore_serving.server.start_grpc_server(address, max_msg_mb_size=100, ssl_config=None)[source]

Start gRPC server for the communication between serving client and server.

Parameters

address (str) –
gRPC server address. The address can be {ip}:{port} or unix:{unix_domain_file_path}.
- {ip}:{port} - Internet domain socket address.
- unix:{unix_domain_file_path} - Unix domain socket address, which is used to communicate with multiple processes on the same machine. {unix_domain_file_path} can be relative or absolute file path, but the directory where the file is located must already exist.
max_msg_mb_size (int, optional) – The maximum acceptable gRPC message size in megabytes (MB), value range [1, 512]. Default: 100.
ssl_config (mindspore_serving.server.SSLConfig, optional) – The server’s ssl_config. If None, SSL is disabled. Default: None.

Raises

RuntimeError – Failed to start the gRPC server: parameter verification failed, the gRPC address is wrong, or the port is already in use.

Examples

>>> from mindspore_serving import server
>>>
>>> server.start_grpc_server("0.0.0.0:5500")
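
The two accepted address forms and the message size range can be checked up front. The sketch below is an illustration only: classify_grpc_address and check_max_msg_mb_size are hypothetical helpers, and the port range 1 to 65535 is an assumption that the parameter description does not state.

```python
import os


def classify_grpc_address(address: str) -> str:
    """Hypothetical helper: return "unix" or "inet" for a well-formed address."""
    if address.startswith("unix:"):
        path = address[len("unix:"):]
        if not path:
            raise ValueError("missing Unix domain socket file path")
        # The file path may be relative or absolute, but the directory
        # that holds the socket file must already exist.
        parent = os.path.dirname(os.path.abspath(path))
        if not os.path.isdir(parent):
            raise ValueError(f"directory does not exist: {parent}")
        return "unix"
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdecimal() or not 1 <= int(port) <= 65535:
        raise ValueError(f"expected {{ip}}:{{port}}, got {address!r}")
    return "inet"


def check_max_msg_mb_size(max_msg_mb_size: int) -> int:
    """Return the size unchanged if it lies in the documented range [1, 512]."""
    if not 1 <= max_msg_mb_size <= 512:
        raise ValueError("max_msg_mb_size must be in the range [1, 512]")
    return max_msg_mb_size
```

For instance, classify_grpc_address("unix:serving.sock") succeeds when the current working directory exists, because the relative path is resolved against it.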

mindspore_serving.server.start_restful_server(address, max_msg_mb_size=100, ssl_config=None)[source]

Start RESTful server for the communication between serving client and server.

Parameters

address (str) – RESTful server address, which should be an Internet domain socket address.
max_msg_mb_size (int, optional) – The maximum acceptable RESTful message size in megabytes (MB), value range [1, 512]. Default: 100.
ssl_config (mindspore_serving.server.SSLConfig, optional) – The server’s ssl_config. If None, SSL is disabled. Default: None.

Raises

RuntimeError – Failed to start the RESTful server: parameter verification failed, the RESTful address is wrong, or the port is already in use.

Examples

>>> from mindspore_serving import server
>>>
>>> server.start_restful_server("0.0.0.0:5900")

mindspore_serving.server.start_servables(servable_configs)[source]

Start up servables.

It can be used to start multiple different servables. One servable can be deployed on multiple chips, and each chip runs a servable copy.

On the Ascend 910 hardware platform, each copy of each servable owns one chip. Different servables or different versions of the same servable need to be deployed on different chips. On the Ascend 310 and GPU hardware platforms, one chip can be shared by multiple servables, and different servables or different versions of the same servable can be deployed on the same chip to realize chip reuse.

Parameters: servable_configs (Union[ServableStartConfig, list[ServableStartConfig], tuple[ServableStartConfig]]) – The startup configs of one or more servables.
Raises: RuntimeError – Failed to start one or more servables. For the log of each servable, please refer to the serving_logs subdirectory.

Examples

>>> import os
>>> from mindspore_serving import server
>>>
>>> servable_dir = os.path.abspath(".")
>>> resnet_config = server.ServableStartConfig(servable_dir, "resnet", device_ids=(0,1))
>>> add_config = server.ServableStartConfig(servable_dir, "add", device_ids=(2,3))
>>> server.start_servables(servable_configs=(resnet_config, add_config))  # press Ctrl+C to stop
>>> server.start_grpc_server("0.0.0.0:5500")
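
Since servables on Ascend 910 cannot share a chip, a deployment plan can be checked for overlapping device IDs before any configuration is created. This is a minimal sketch that works on plain (servable_name, device_ids) pairs; find_shared_chips and normalize_device_ids are hypothetical helpers, not MindSpore Serving functions.

```python
from typing import Dict, Iterable, List, Sequence, Tuple, Union

DeviceIds = Union[int, Sequence[int]]


def normalize_device_ids(device_ids: DeviceIds) -> Tuple[int, ...]:
    """Accept a single int or a list/tuple of ints, as device_ids does."""
    if isinstance(device_ids, int):
        return (device_ids,)
    return tuple(device_ids)


def find_shared_chips(plans: Iterable[Tuple[str, DeviceIds]]) -> Dict[int, List[str]]:
    """Hypothetical pre-check: map each chip claimed by more than one plan
    entry to the names of the servables that claim it."""
    owners: Dict[int, List[str]] = {}
    for servable_name, device_ids in plans:
        for device_id in normalize_device_ids(device_ids):
            owners.setdefault(device_id, []).append(servable_name)
    return {chip: names for chip, names in owners.items() if len(names) > 1}
```

An empty result means that no chip is shared, which is what the Ascend 910 rule requires. On Ascend 310 and GPU a non-empty result is acceptable, because chips can be reused there.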

mindspore_serving.server.stop()[source]

Stop the running serving server.

Examples

>>> from mindspore_serving import server
>>>
>>> server.start_grpc_server("0.0.0.0:5500")
>>> server.start_restful_server("0.0.0.0:1500")
>>> ...
>>> server.stop()

mindspore_serving.server.register

Servable register interface, used in the servable_config.py of a servable. For how to configure the servable_config.py file, please refer to Servable Provided Through Model Configuration.

class mindspore_serving.server.register.AclOptions(**kwargs)[source]

Helper class to set acl options.

Parameters

insert_op_cfg_path (str, optional) – Path of aipp config file.
input_format (str, optional) – Manually specify the model input format, the value can be “ND”, “NCHW”, “NHWC”, “CHWN”, “NC1HWC0”, or “NHWC1C0”.
input_shape (str, optional) – Manually specify the model input shape, such as “input_op_name1: n1,c2,h3,w4;input_op_name2: n4,c3,h2,w1”.
output_type (str, optional) – Manually specify the model output type, the value can be “FP16”, “UINT8” or “FP32”. Default: “FP32”.
precision_mode (str, optional) – Model precision mode, the value can be “force_fp16”, “allow_fp32_to_fp16”, “must_keep_origin_dtype” or “allow_mix_precision”. Default: “force_fp16”.
op_select_impl_mode (str, optional) – The operator selection mode, the value can be “high_performance” or “high_precision”. Default: “high_performance”.

Raises

RuntimeError – Acl option is invalid, or value is not str.

Examples

>>> from mindspore_serving.server import register
>>> options = register.AclOptions(op_select_impl_mode="high_precision", precision_mode="allow_fp32_to_fp16")
>>> register.declare_servable(servable_file="deeptext.mindir", model_format="MindIR", options=options)
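
The allowed values listed in the parameter descriptions can be collected in one table and checked before the options object is created. The sketch below is illustrative: check_acl_kwargs and ACL_OPTION_VALUES are hypothetical names, and exact, case-sensitive matching is an assumption.

```python
from typing import Dict, Optional, Tuple

# Allowed values copied from the parameter list; None marks a free-form string.
ACL_OPTION_VALUES: Dict[str, Optional[Tuple[str, ...]]] = {
    "insert_op_cfg_path": None,
    "input_format": ("ND", "NCHW", "NHWC", "CHWN", "NC1HWC0", "NHWC1C0"),
    "input_shape": None,
    "output_type": ("FP16", "UINT8", "FP32"),
    "precision_mode": ("force_fp16", "allow_fp32_to_fp16",
                       "must_keep_origin_dtype", "allow_mix_precision"),
    "op_select_impl_mode": ("high_performance", "high_precision"),
}


def check_acl_kwargs(**kwargs: str) -> Dict[str, str]:
    """Hypothetical pre-check: reject unknown options, non-str values and
    values outside the documented sets, then return the kwargs unchanged."""
    for name, value in kwargs.items():
        if name not in ACL_OPTION_VALUES:
            raise ValueError(f"unknown ACL option: {name}")
        if not isinstance(value, str):
            raise TypeError(f"value of {name} must be str")
        allowed = ACL_OPTION_VALUES[name]
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}")
    return kwargs
```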

class mindspore_serving.server.register.GpuOptions(**kwargs)[source]

Helper class to set gpu options.

Parameters: precision_mode (str, optional) – Inference operator selection. The value can be “origin” or “fp16”. Default: “origin”.
Raises: RuntimeError – Gpu option is invalid, or value is not str.

Examples

>>> from mindspore_serving.server import register
>>> options = register.GpuOptions(precision_mode="origin")
>>> register.declare_servable(servable_file="deeptext.mindir", model_format="MindIR", options=options)

class mindspore_serving.server.register.PipelineServable(servable_name, method, version_number=0)[source]

Create Pipeline Servable for Servable calls.

Warning

This is a beta interface and may be changed in the future.

Parameters

servable_name (str) – The name of the servable.
method (str) – The name of the method supplied by the servable.
version_number (int, optional) – The version number of the servable. Default: 0.

Raises

RuntimeError – The type or value of the parameters is invalid, or other errors happened.

Examples

>>> from mindspore_serving.server import distributed
>>> from mindspore_serving.server import register
>>>
>>> distributed.declare_servable(rank_size=8, stage_size=1, with_batch_dim=False)
>>> @register.register_method(output_names=["y"])
... def fun(x):
...     y = register.call_servable(x)
...     return y
>>> servable = register.PipelineServable(servable_name="service", method="fun")
>>> @register.register_pipeline(output_names=["y"])
... def predict(x):
...     y = servable.run(x)
...     return y

run(*args)[source]

Call the servable from within a registered pipeline function.

Parameters: args (numpy.ndarray) – One or more input numpy arrays.
Returns: numpy.ndarray, A numpy array object is returned if there is only one output; otherwise, a numpy array tuple is returned.
Raises: RuntimeError – The type or value of the parameters is invalid, or other errors happened.

mindspore_serving.server.register.call_postprocess(postprocess_fun, *args)[source]

For method registration, define the postprocessing function and its parameters.

Note

The length of ‘args’ should be equal to the number of inputs of postprocess_fun.

Parameters

postprocess_fun (function) – Python function for postprocess.
args – Postprocess inputs. The length of ‘args’ should be equal to the number of input parameters of the implemented Python function.

Raises

RuntimeError – The type or value of the parameters is invalid, or another error occurred.

mindspore_serving.server.register.call_postprocess_pipeline(postprocess_fun, *args)[source]

For method registration, define the postprocessing pipeline function and its parameters.

A single request can include multiple instances, so multiple queued requests also contain multiple instances. If you need to process multiple instances with multithreading or another parallel processing capability in preprocess or postprocess, such as using the MindData concurrency ability to process multiple input images in preprocess, MindSpore Serving provides ‘call_preprocess_pipeline’ and ‘call_postprocess_pipeline’ to register such preprocessing and postprocessing. For more detail, please refer to Resnet50 model configuration example.

Parameters

postprocess_fun (function) – Python pipeline function for postprocess.
args – Postprocess inputs. The length of ‘args’ should be equal to the number of input parameters of the implemented Python function.

Raises

RuntimeError – The type or value of the parameters is invalid, or another error occurred.
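
A pipeline-style postprocess function receives all queued instances at once and produces one result per instance. The sketch below follows the generator shape used by the call_preprocess_pipeline example in this document (an iterable of per-instance input tuples, one yield per instance) and processes the instances with a thread pool. The names top1_label and postprocess_top1_pipeline are hypothetical, plain Python lists stand in for numpy arrays, and yielding a bare value for a single output is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def top1_label(scores):
    """Return the index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def postprocess_top1_pipeline(instances):
    """Hypothetical pipeline postprocess: each instance is a tuple of
    postprocess inputs, here a single sequence of class scores."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order, so the results line up
        # with the instances they were computed from.
        for label in pool.map(lambda instance: top1_label(instance[0]), instances):
            yield label
```

Under these assumptions, such a function would be registered inside a method with register.call_postprocess_pipeline(postprocess_top1_pipeline, scores).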

mindspore_serving.server.register.call_preprocess(preprocess_fun, *args)[source]

For method registration, define the preprocessing function and its parameters.

Note

The length of ‘args’ should be equal to the number of inputs of preprocess_fun.

Parameters

preprocess_fun (function) – Python function for preprocess.
args – Preprocess inputs. The length of ‘args’ should be equal to the number of input parameters of the implemented Python function.

Raises

RuntimeError – The type or value of the parameters is invalid, or another error occurred.

Examples

>>> from mindspore_serving.server import register
>>> import numpy as np
>>> def add_trans_datatype(x1, x2):
...     return x1.astype(np.float32), x2.astype(np.float32)
>>>
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_cast method in add
... def add_cast(x1, x2):
...     x1, x2 = register.call_preprocess(add_trans_datatype, x1, x2)  # cast input to float32
...     y = register.call_servable(x1, x2)
...     return y

mindspore_serving.server.register.call_preprocess_pipeline(preprocess_fun, *args)[source]

For method registration, define the preprocessing pipeline function and its parameters.

A single request can include multiple instances, so multiple queued requests also contain multiple instances. If you need to process multiple instances with multithreading or another parallel processing capability in preprocess or postprocess, such as using the MindData concurrency ability to process multiple input images in preprocess, MindSpore Serving provides ‘call_preprocess_pipeline’ and ‘call_postprocess_pipeline’ to register such preprocessing and postprocessing. For more detail, please refer to Resnet50 model configuration example.

Parameters

preprocess_fun (function) – Python pipeline function for preprocess.
args – Preprocess inputs. The length of ‘args’ should be equal to the number of input parameters of the implemented Python function.

Raises

RuntimeError – The type or value of the parameters is invalid, or another error occurred.

Examples

>>> from mindspore_serving.server import register
>>> import numpy as np
>>> def add_trans_datatype(instances):
...     for instance in instances:
...         x1 = instance[0]
...         x2 = instance[1]
...         yield x1.astype(np.float32), x2.astype(np.float32)
>>>
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_cast method in add
... def add_cast(x1, x2):
...     x1, x2 = register.call_preprocess_pipeline(add_trans_datatype, x1, x2)  # cast input to float32
...     y = register.call_servable(x1, x2)
...     return y

mindspore_serving.server.register.call_servable(*args, subgraph=0)[source]

For method registration, define the input data of model inference.

Note

The length of ‘args’ should be equal to the number of inputs of the model.

Parameters

args – Model’s inputs. The length of ‘args’ should be equal to the number of inputs of the model.
subgraph (int, optional) – The index of the subgraph in the model, starting from 0. Default: 0.

Raises

RuntimeError – The type or value of the parameters is invalid, or another error occurred.

Examples

>>> from mindspore_serving.server import register
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_common method in add
... def add_common(x1, x2):
...     y = register.call_servable(x1, x2)
...     return y

mindspore_serving.server.register.declare_servable(servable_file, model_format, with_batch_dim=True, options=None, without_batch_dim_inputs=None)[source]

Declare the servable info.

Parameters

servable_file (Union[str, list[str]]) – Model file name or names.
model_format (str) – Model format, “OM” or “MindIR”, case ignored.
with_batch_dim (bool, optional) – Whether the first shape dimension of the model’s inputs and outputs is the batch dimension. Default: True.
options (Union[AclOptions, GpuOptions], optional) – Options of the model, supports AclOptions or GpuOptions. Default: None.
without_batch_dim_inputs (Union[int, tuple[int], list[int]], optional) – Indices of the inputs that have no batch dimension when with_batch_dim is True. Default: None.

Raises

RuntimeError – The type or value of the parameters is invalid.

mindspore_serving.server.register.register_method(output_names)[source]

Register method for servable.

Define the data flow of preprocess, model inference and postprocess in the method. Preprocess and postprocess are optional.

Note

The method definition is only used to define data flow of preprocess, model inference and postprocess, and cannot contain branch structures such as if, for, and while.

Parameters: output_names (Union[str, tuple[str], list[str]]) – The output names of the method. The input names are the argument names of the registered function.
Raises: RuntimeError – The type or value of the parameters is invalid, or another error occurred.

Examples

>>> from mindspore_serving.server import register
>>> import numpy as np
>>> def add_trans_datatype(x1, x2):
...     return x1.astype(np.float32), x2.astype(np.float32)
>>>
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_cast method in add
... def add_cast(x1, x2):
...     x1, x2 = register.call_preprocess(add_trans_datatype, x1, x2)  # cast input to float32
...     y = register.call_servable(x1, x2)
...     return y

mindspore_serving.server.register.register_pipeline(output_names)[source]

Register method for Pipeline Servable.

Define the data flow of Pipeline Servable Method. Pipeline servable is optional.

Warning

This is a beta interface and may be changed in the future.

Parameters: output_names (Union[str, tuple[str], list[str]]) – The output names of the pipeline. The input names are the argument names of the registered function.
Raises: RuntimeError – The type or value of the parameters is invalid, or another error occurred.

Examples

>>> from mindspore_serving.server import distributed
>>> from mindspore_serving.server import register
>>> from mindspore_serving.server.register import PipelineServable
>>>
>>> distributed.declare_servable(rank_size=8, stage_size=1, with_batch_dim=False)
>>> @register.register_method(output_names=["y"])
... def fun(x):
...     y = register.call_servable(x)
...     return y
>>> servable = PipelineServable(servable_name="service", method="fun")
>>> @register.register_pipeline(output_names=["y"])
... def predict(x):
...     y = servable.run(x)
...     return y

mindspore_serving.server.distributed

The interface to start up the serving server with a distributed servable. For how to configure and start up a distributed model, please refer to MindSpore Serving-based Distributed Inference Service Deployment.

mindspore_serving.server.distributed.declare_servable(rank_size, stage_size, with_batch_dim=True, without_batch_dim_inputs=None)[source]

Declare the distributed servable in servable_config.py.

Parameters

rank_size (int) – The rank size of the distributed model.
stage_size (int) – The stage size of the distributed model.
with_batch_dim (bool, optional) – Whether the first shape dimension of the model’s inputs and outputs is the batch dimension. Default: True.
without_batch_dim_inputs (Union[int, tuple[int], list[int]], optional) – Indices of the inputs that have no batch dimension when with_batch_dim is True. Default: None.

Raises

RuntimeError – The type or value of the parameters is invalid.

Examples

>>> from mindspore_serving.server import distributed
>>> distributed.declare_servable(rank_size=8, stage_size=1)

mindspore_serving.server.distributed.start_servable(servable_directory, servable_name, rank_table_json_file, version_number=1, distributed_address='0.0.0.0:6200', wait_agents_time_in_seconds=0)[source]

Start up the servable named ‘servable_name’ defined in ‘servable_directory’.

Parameters

servable_directory (str) – The directory where the servable is located. It is expected to contain a directory named servable_name. For more detail: How to config Servable.
servable_name (str) – The servable name.
version_number (int, optional) – Servable version number to be loaded. The version number should be a positive integer, starting from 1, and 0 means to load the latest version. Default: 1.
rank_table_json_file (str) – The rank table JSON file name.
distributed_address (str, optional) – The distributed worker address that the worker agents connect to. Default: “0.0.0.0:6200”.
wait_agents_time_in_seconds (int, optional) – The maximum time in seconds that the worker waits for all agents to be ready. 0 means unlimited time. Default: 0.

Raises

RuntimeError – Failed to start the distributed servable.

Examples

>>> import os
>>> from mindspore_serving.server import distributed
>>>
>>> servable_dir = os.path.abspath(".")
>>> distributed.start_servable(servable_dir, "matmul", rank_table_json_file="hccl_8p.json",
...                            distributed_address="127.0.0.1:6200")
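
The meaning of wait_agents_time_in_seconds, where 0 stands for waiting without a limit, can be pictured as a polling loop with an optional deadline. This is only an illustration of the documented semantics and not the way MindSpore Serving implements the wait. The name wait_until_ready, the readiness callback and the polling interval are all hypothetical.

```python
import time
from typing import Callable


def wait_until_ready(all_ready: Callable[[], bool],
                     wait_seconds: int = 0,
                     poll_interval: float = 0.01,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll all_ready() until it returns True or the time limit passes.

    wait_seconds == 0 means no deadline, mirroring the documented
    meaning of wait_agents_time_in_seconds.
    """
    deadline = None if wait_seconds == 0 else clock() + wait_seconds
    while not all_ready():
        if deadline is not None and clock() >= deadline:
            return False  # timed out before every agent was ready
        sleep(poll_interval)
    return True
```

The clock and sleep parameters exist so that the loop can be exercised without real delays, for example by passing a counter-based clock.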

mindspore_serving.server.distributed.startup_agents(distributed_address, model_files, group_config_files=None, agent_start_port=7000, agent_ip=None, rank_start=None, dec_key=None, dec_mode='AES-GCM')[source]

Start up all needed agents on the current machine.

Parameters

distributed_address (str) – The distributed worker address that the agents connect to.
model_files (Union[list[str], tuple[str]]) – All model files needed on the current machine, given as absolute paths or paths relative to this startup Python script.
group_config_files (Union[list[str], tuple[str]], optional) – All group config files needed on the current machine, given as absolute paths or paths relative to this startup Python script. None means there are no configuration files. Default: None.
agent_start_port (int, optional) – The starting port of the agents that connect to the worker. Default: 7000.
agent_ip (str, optional) – The local agent IP. If it is None, the agent IP will be obtained from the rank table file. Parameters agent_ip and rank_start must either both have values or both be None. Default: None.
rank_start (int, optional) – The starting rank ID of this machine. If it is None, the rank ID will be obtained from the rank table file. Parameters agent_ip and rank_start must either both have values or both be None. Default: None.
dec_key (bytes, optional) – Byte type key used for decryption. The valid length is 16, 24, or 32. Default: None.
dec_mode (str, optional) – Specifies the decryption mode, which takes effect only when dec_key is set. Options: ‘AES-GCM’ or ‘AES-CBC’. Default: ‘AES-GCM’.

Raises

RuntimeError – Failed to start agents.

Examples

>>> import os
>>> from mindspore_serving.server import distributed
>>> model_files = []
>>> for i in range(8):
...     model_files.append(f"models/device{i}/matmul.mindir")
>>> distributed.startup_agents(distributed_address="127.0.0.1:6200", model_files=model_files)
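
The arguments for startup_agents can be prepared and checked with a few lines of plain Python. The sketch below is an assumption-laden illustration: build_model_files and check_agent_location are hypothetical helpers, and the path pattern simply reuses the layout from the example above.

```python
from typing import List, Optional


def build_model_files(device_count: int,
                      pattern: str = "models/device{rank}/matmul.mindir") -> List[str]:
    """Build one model file path per device from a path pattern."""
    return [pattern.format(rank=rank) for rank in range(device_count)]


def check_agent_location(agent_ip: Optional[str],
                         rank_start: Optional[int]) -> bool:
    """Hypothetical pre-check for the pairing rule of agent_ip and rank_start.

    Returns True when both are given explicitly, and False when both are
    None, in which case the values come from the rank table file.
    """
    if (agent_ip is None) != (rank_start is None):
        raise ValueError("agent_ip and rank_start must both be set or both be None")
    return agent_ip is not None
```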