Using Delegate to Support Third-Party AI Framework Integration (On-Device)
Overview
The Delegate interface of MindSpore Lite enables third-party AI frameworks (for example, NPU or TensorRT) to be quickly integrated into the Lite inference flow. The third-party framework may be implemented by the user or be an open-source framework in the industry; it generally supports online graph construction, that is, it can assemble multiple operators into a sub-graph and dispatch that sub-graph to the device for execution. If you want to schedule the inference flow of another framework through MindSpore Lite, refer to this document.
Delegate Usage
Integrating a third-party AI framework for inference through Delegate mainly involves the following steps:
Add a custom Delegate class: inherit from the Delegate class to implement a custom Delegate.
Implement the initialization interface: the Init interface checks whether the running device supports the Delegate framework and initializes Delegate resources.
Implement the graph construction interface: the Build interface implements operator support checking, sub-graph construction, and online graph building.
Implement the sub-graph kernel: inherit from Kernel to implement the Delegate sub-graph kernel.
Adding a Custom Delegate Class
A custom Delegate must inherit from the Delegate class. Configuration related to the hardware device scheduled by the third-party framework, such as the NPU frequency or the number of CPU threads, can be initialized in the constructor.
class XXXDelegate : public Delegate {
 public:
  XXXDelegate() = default;
  ~XXXDelegate() = default;
  Status Init() override;
  Status Build(DelegateModel *model) override;
};
Implementing the Initialization Interface
The Init interface is called in the Build flow of Model. The exact call site is the LiteSession::Init function inside the MindSpore Lite code.
Status XXXDelegate::Init() {
  // 1. Check whether the inference device matches the delegate framework.
  // 2. Initialize delegate related resources.
  return RET_OK;
}
Implementing the Graph Construction Interface
The graph construction interface Build(DelegateModel *model) takes an instance of DelegateModel as its parameter. In DelegateModel, std::vector<kernel::Kernel *> *kernels_ is the topologically sorted operator list for which MindSpore Lite built-in operator registration has already been completed, and const std::map<kernel::Kernel *, const schema::Primitive *> primitives_ stores the attribute value schema::Primitive of each operator, which is used to parse the original attribute information of each operator.
Build is called from the Build interface of Model. The exact call site is the Scheduler::Schedule function inside the MindSpore Lite code; at that point, built-in operator selection has already been completed and the operators are stored in the kernel list of DelegateModel. Build needs to implement the following functions:
Traverse the kernel list, call GetPrimitive to obtain the attribute value of each operator, parse the attribute value, and determine whether the Delegate framework supports the operator.
For a contiguous segment of supported operators, build a Delegate sub-graph and call Replace to replace that segment of operators with the sub-graph kernel.
Status XXXDelegate::Build(DelegateModel *model) {
KernelIter from = model->BeginKernelIterator(); // Record the start operator position supported by the Delegate
KernelIter end = model->BeginKernelIterator(); // Record the end operator position supported by the Delegate
for (KernelIter iter = model->BeginKernelIterator(); iter != model->EndKernelIterator(); iter++) {
kernel::Kernel *kernel = *iter;
if (IsSupport(kernel, model->GetPrimitive(kernel))) { // Check whether the Delegate framework supports the kernel according to the primitive
end = iter;
} else { // The current kernel is not supported, and the sub-graph is truncated
if (from != end) {
auto xxx_graph_kernel = CreateXXXGraph(from, end, model); // Create a Delegate sub-graph Kernel
iter = model->Replace(from, end + 1, xxx_graph_kernel); // Replace the supported kernels list with a Delegate sub-graph Kernel
}
from = iter + 1;
end = iter + 1;
}
}
return RET_OK;
}
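Note that the loop above only cuts a sub-graph when it meets an unsupported kernel, so a supported segment that reaches the end of the kernel list is never replaced. One way to cover that case is to repeat the replacement once after the loop, just before the final return RET_OK; (a sketch that reuses the same helpers as the loop body):
  // Handle a supported segment that extends to the end of the kernel list.
  if (from != end) {
    auto xxx_graph_kernel = CreateXXXGraph(from, end, model);
    if (xxx_graph_kernel != nullptr) {
      model->Replace(from, end + 1, xxx_graph_kernel);
    }
  }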
Implementing the Sub-Graph Kernel
The CreateXXXGraph interface above must return a Delegate sub-graph kernel. Sample code is shown below:
kernel::Kernel *XXXDelegate::CreateXXXGraph(KernelIter from, KernelIter end, DelegateModel *model) {
auto in_tensors = GraphInTensors(...); // Find the input tensors of the Delegate sub-graph
auto out_tensors = GraphOutTensors(...); // Find the output tensors of the Delegate sub-graph
auto graph_kernel = new (std::nothrow) XXXGraph(in_tensors, out_tensors);
if (graph_kernel == nullptr) {
MS_LOG(ERROR) << "New XXX Graph failed.";
return nullptr;
}
// Build graph online, load model, etc.
return graph_kernel;
}
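The GraphInTensors and GraphOutTensors helpers above are left to the Delegate implementer; this document does not prescribe them. As a rough sketch of one possible implementation (the helper name and the inputs()/outputs() accessors on kernel::Kernel are illustrative assumptions, not part of the documented API): a tensor is a sub-graph input if a kernel in [from, end] consumes it but no kernel in that range produces it, and it is a sub-graph output if it is produced inside the range and consumed outside of it (or is a model output).
// Hypothetical helper for illustration only; not part of the MindSpore Lite API.
std::vector<tensor::MSTensor *> GraphInTensorsSketch(KernelIter from, KernelIter end) {
  std::set<tensor::MSTensor *> produced;
  for (KernelIter iter = from; iter != end + 1; iter++) {
    for (auto *out : (*iter)->outputs()) {   // outputs() accessor is assumed
      produced.insert(out);
    }
  }
  std::vector<tensor::MSTensor *> in_tensors;
  std::set<tensor::MSTensor *> seen;
  for (KernelIter iter = from; iter != end + 1; iter++) {
    for (auto *in : (*iter)->inputs()) {     // inputs() accessor is assumed
      // Tensors produced inside the range are internal edges; constant weight tensors are
      // usually packed by the sub-graph kernel itself rather than treated as graph inputs.
      if (produced.find(in) == produced.end() && seen.insert(in).second) {
        in_tensors.push_back(in);
      }
    }
  }
  return in_tensors;  // GraphOutTensors is symmetric: produced inside the range, consumed outside.
}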
The Delegate sub-graph XXXGraph must be defined as a class that inherits from Kernel, as shown in the code below. Two points need attention for this sub-graph:
Find the correct in_tensors and out_tensors from the original kernel list (one possible way to derive them is sketched after the CreateXXXGraph example above), so that at Execute time the correct input tensors and input data can be found and the output data can be written back to the correct addresses.
Override the corresponding Prepare, ReSize, and Execute interfaces. Prepare is called in the Build phase of Model, Execute is called in the Predict phase of Model, and ReSize is called in the Resize phase of Model.
class XXXGraph : public kernel::Kernel {
public:
XXXGraph(const std::vector<tensor::MSTensor *> &inputs, const std::vector<tensor::MSTensor *> &outputs)
: kernel::Kernel(inputs, outputs, nullptr, nullptr) {}
~XXXGraph() override;
  int Prepare() override {
    // Generally, the model will be built only once, so Prepare is also called once.
    // Do work that does not depend on input data, such as packing the constant weight tensors.
    return lite::RET_OK;
  }
  int Execute() override {
    // Obtain input data from in_tensors.
    // Execute the inference process.
    // Write the result back to out_tensors.
    return lite::RET_OK;
  }
  int ReSize() override {
    // Support dynamic shapes: the input shape may change between executions.
    return lite::RET_OK;
  }
};
Scheduling by the Lite Framework
For the Lite framework to schedule a user-defined Delegate, set the custom Delegate pointer through SetDelegate when creating the Context, as shown in the sample code below, and then pass the Context to the Lite framework through Build. If the Delegate in the Context is a null pointer, the inference flow falls back to the built-in inference of the Lite framework.
auto context = std::make_shared<mindspore::Context>();
if (context == nullptr) {
MS_LOG(ERROR) << "New context failed";
return RET_ERROR;
}
auto delegate = std::make_shared<XXXDelegate>();
if (delegate == nullptr) {
MS_LOG(ERROR) << "New XXX delegate failed";
return RET_ERROR;
}
context->SetDelegate(delegate);
auto model = new (std::nothrow) mindspore::Model();
if (model == nullptr) {
  std::cerr << "New Model failed." << std::endl;
  return RET_ERROR;
}
// Assuming that we have read a ms file and stored in the address pointed by model_buf
auto build_ret = model->Build(model_buf, size, mindspore::kMindIR, context);
delete[](model_buf);
if (build_ret != mindspore::kSuccess) {
  std::cerr << "Build model failed." << std::endl;
  delete model;
  return RET_ERROR;
}
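After Build succeeds, inference is triggered through the usual Model::Predict call; if the Context carries a Delegate, the replaced sub-graph kernels are executed by the Delegate, otherwise the built-in kernels run. A minimal sketch (assuming the input data has already been filled into the input tensors):
auto inputs = model->GetInputs();
// ... fill the data of each input tensor here ...
std::vector<mindspore::MSTensor> outputs;
auto predict_ret = model->Predict(inputs, &outputs);
if (predict_ret != mindspore::kSuccess) {
  std::cerr << "Predict failed." << std::endl;
  return RET_ERROR;
}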
NPUDelegate Example
Currently, MindSpore Lite integrates the NPU backend through the NPUDelegate interface. This tutorial briefly describes NPUDelegate so that users can quickly understand how the Delegate-related APIs are used.
Adding the NPUDelegate Class
class NPUDelegate : public Delegate {
public:
explicit NPUDelegate(lite::NpuDeviceInfo device_info) : Delegate() { frequency_ = device_info.frequency_; }
~NPUDelegate() override;
Status Init() override;
Status Build(DelegateModel *model) override;
protected:
  // Analyze a kernel and its attributes.
  // If the NPU supports the kernel, return an NPUOp that holds the operator attributes and its
  // connection relationship with other kernels. Otherwise, return a null pointer.
  NPUOp *GetOP(kernel::Kernel *kernel, const schema::Primitive *primitive);
  // Construct an NPU sub-graph kernel from a contiguous sequence of NPUOps.
  kernel::Kernel *CreateNPUGraph(const std::vector<NPUOp *> &ops);
NPUManager *npu_manager_ = nullptr;
NPUPassManager *pass_manager_ = nullptr;
std::map<schema::PrimitiveType, NPUGetOp> op_func_lists_;
int frequency_ = 0; // NPU frequency
};
Implementing the Init Interface
The Init interface performs the NPU-related resource allocation.
Status NPUDelegate::Init() {
npu_manager_ = new (std::nothrow) NPUManager(); // NPU manager of model buffer and client.
if (npu_manager_ == nullptr) {
MS_LOG(ERROR) << "New npu manager failed.";
return RET_ERROR;
}
if (!npu_manager_->IsSupportNPU()) { // Check whether the current device supports NPU.
MS_LOG(DEBUG) << "Checking npu is unsupported.";
return RET_NOT_SUPPORT;
}
pass_manager_ = new (std::nothrow) NPUPassManager(); // The default format of MindSpore Lite is NHWC, and the default format of NPU is NCHW. The NPUPassManager is used to pack data between the sub-graphs.
if (pass_manager_ == nullptr) {
MS_LOG(ERROR) << "New npu pass manager failed.";
return RET_ERROR;
}
// Initialize op_func lists. Get the correspondence between kernel type and GetOP function.
op_func_lists_.clear();
return RET_OK;
}
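The op_func_lists_ mapping associates each supported schema::PrimitiveType with a function that converts a kernel into the corresponding NPUOp, and GetOP can then look a kernel's type up in this map. A sketch of what the initialization might look like (the GetNPUOp<> helper and the concrete op classes are illustrative, not the exact names in the MindSpore Lite source):
  op_func_lists_[schema::PrimitiveType_Conv2DFusion] = GetNPUOp<ConvolutionNPUOp>;
  op_func_lists_[schema::PrimitiveType_Activation] = GetNPUOp<ActivationNPUOp>;
  // ... one entry for every operator type the NPU delegate supports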
Implementing the Build Interface
The Build interface parses the DelegateModel instance and mainly implements operator support checking, sub-graph construction, and online graph building. The following sample code shows the implementation of the NPUDelegate Build interface.
Status NPUDelegate::Build(DelegateModel *model) {
KernelIter from, end; // Record the start and end positions of kernel supported by the NPU sub-graph.
std::vector<NPUOp *> npu_ops; // Save all NPUOp used to construct an NPU sub-graph.
int graph_index = 0;
for (KernelIter iter = model->BeginKernelIterator(); iter != model->EndKernelIterator(); iter++) {
kernel::Kernel *kernel = *iter;
auto npu_op = GetOP(kernel, model->GetPrimitive(kernel)); // Obtain an NPUOp according to the kernel and the primitive. Each NPUOp contains information such as input tensors, output tensors and operator attribute.
if (npu_op != nullptr) { // NPU supports the current kernel.
if (npu_ops.size() == 0) {
from = iter;
}
npu_ops.push_back(npu_op);
end = iter;
} else { // NPU does not support the current kernel.
if (npu_ops.size() > 0) {
auto npu_graph_kernel = CreateNPUGraph(npu_ops); // Create a NPU sub-graph kernel.
if (npu_graph_kernel == nullptr) {
MS_LOG(ERROR) << "Create NPU Graph failed.";
return RET_ERROR;
}
npu_graph_kernel->set_name("NpuGraph" + std::to_string(graph_index++));
iter = model->Replace(from, end + 1, npu_graph_kernel); // Replace the supported kernel list with a NPU sub-graph kernel.
npu_ops.clear();
}
}
}
auto ret = npu_manager_->LoadOMModel(); // Build model online. Load NPU model.
if (ret != RET_OK) {
MS_LOG(ERROR) << "NPU client load model failed.";
return RET_ERROR;
}
return RET_OK;
}
Implementing the Graph Construction Code
The following sample code shows the CreateNPUGraph interface of NPUDelegate, which is used to generate an NPU sub-graph.
kernel::Kernel *NPUDelegate::CreateNPUGraph(const std::vector<NPUOp *> &ops) {
auto in_tensors = GraphInTensors(ops);
auto out_tensors = GraphOutTensors(ops);
auto graph_kernel = new (std::nothrow) NPUGraph(ops, npu_manager_, in_tensors, out_tensors);
if (graph_kernel == nullptr) {
MS_LOG(DEBUG) << "New NPU Graph failed.";
return nullptr;
}
  auto ret = graph_kernel->Init();
  if (ret != RET_OK) {
    MS_LOG(DEBUG) << "NPU Graph Init failed.";
    delete graph_kernel;
    return nullptr;
  }
return graph_kernel;
}
Implementing NPUGraph
NPUGraph inherits from Kernel and needs to override the Prepare, Execute, and ReSize interfaces.
class NPUGraph : public kernel::Kernel {
public:
NPUGraph(std::vector<NPUOp *> npu_ops, NPUManager *npu_manager, const std::vector<tensor::MSTensor *> &inputs,
const std::vector<tensor::MSTensor *> &outputs)
: kernel::Kernel(inputs, outputs, nullptr, nullptr), npu_ops_(std::move(npu_ops)), npu_manager_(npu_manager) {}
~NPUGraph() override;
int Prepare() override;
int Execute() override;
int ReSize() override { // NPU does not support dynamic shapes.
MS_LOG(ERROR) << "NPU does not support the resize function temporarily.";
return lite::RET_ERROR;
}
protected:
std::vector<NPUOp *> npu_ops_{};
NPUManager *npu_manager_ = nullptr;
NPUExecutor *executor_ = nullptr; // NPU inference executor.
};
The NPUGraph::Prepare interface mainly implements the following:
int NPUGraph::Prepare() {
  // Find the mapping relationship between hiai::AiTensor defined by the NPU and MSTensor defined by MindSpore Lite.
  return RET_OK;
}
The NPUGraph::Execute interface mainly implements the following:
int NPUGraph::Execute() {
  // 1. Process input: copy input data from MSTensor to hiai::AiTensor.
  // 2. Perform inference.
  executor_->Execute();
  // 3. Process output: copy output data from hiai::AiTensor to MSTensor.
  return RET_OK;
}
The NPU is a third-party AI framework integrated by the MindSpore Lite developers themselves, so its usage differs slightly from a user-defined Delegate: it can be enabled either by setting the Delegate on the Context through SetDelegate, or by adding a KirinNPUDeviceInfo describing the NPU device to the Context's MutableDeviceInfo.
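For example, a sketch of the second approach (devices added earlier in the list take priority, so the KirinNPUDeviceInfo is added before the CPU fallback; SetFrequency is optional):
auto context = std::make_shared<mindspore::Context>();
auto &device_list = context->MutableDeviceInfo();
auto npu_device_info = std::make_shared<mindspore::KirinNPUDeviceInfo>();
npu_device_info->SetFrequency(3);  // NPU working frequency, e.g. 3 for high frequency
device_list.push_back(npu_device_info);
// Keep the CPU as a fallback for operators that the NPU backend does not support.
auto cpu_device_info = std::make_shared<mindspore::CPUDeviceInfo>();
device_list.push_back(cpu_device_info);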