Run Inference or Training on MCUs or Small Systems


Overview

This tutorial introduces an ultra-lightweight AI deployment solution for IoT edge devices.

Compared with mobile devices, IoT devices typically use MicroController Units (MCUs): not only is the on-device ROM extremely limited, but memory and compute resources are also very constrained. AI applications on IoT devices therefore place strict limits on the runtime memory and power consumption of model inference. Targeting MCU hardware backends, MindSpore Lite provides an ultra-lightweight Micro AI deployment solution: in an offline phase, the model is compiled directly into lightweight code, so online model parsing and graph compilation are no longer needed. The generated Micro code is easy to read, uses little runtime memory, and has a small code footprint. With the MindSpore Lite converter tool converter_lite, users can easily generate inference or training code deployable on x86/ARM64/ARM32/Cortex-M platforms.

Deploying a model for inference or training with Micro typically involves four steps: model code generation, obtaining the Micro library, code integration, and compilation and deployment.

Model Inference Code Generation

Overview

With the MindSpore Lite converter tool converter_lite, inference code can be generated for an input model by setting the Micro options in the converter's parameter configuration file. This chapter covers only the code-generation features of the converter; for basic usage of the converter, see Inference Model Conversion.

Environment Preparation

The following preparation is required, using the converter on Linux as an example.

  1. System environment required to run the converter

    This example uses a Linux system environment; Ubuntu 18.04.02 LTS is recommended.

  2. Obtain the converter

    The converter can be obtained in two ways:

    • Download a Release from the MindSpore website

      Download the release package whose operating system is Linux-x86_64 and hardware platform is CPU.

    • Build from source

  3. Extract the downloaded package

    tar -zxf mindspore-lite-${version}-linux-x64.tar.gz
    

    ${version} is the version number of the release package.

  4. Add the dynamic libraries required by the converter at runtime to the LD_LIBRARY_PATH environment variable

    export LD_LIBRARY_PATH=${PACKAGE_ROOT_PATH}/tools/converter/lib:${LD_LIBRARY_PATH}
    

    ${PACKAGE_ROOT_PATH} is the path of the extracted folder.

Generating Inference Code for a Single Model

  1. Enter the conversion directory

    cd ${PACKAGE_ROOT_PATH}/tools/converter/converter
    
  2. Set the Micro options

    Create a micro.cfg file in the current directory with the following content:

    [micro_param]
    # enable code-generation for MCU HW
    enable_micro=true

    # specify HW target, support x86, Cortex-M, ARM32, ARM64 only.
    target=x86

    # enable parallel inference or not.
    support_parallel=false
    

    In the configuration file, [micro_param] on the first line indicates that the parameters that follow belong to the Micro option group micro_param. These parameters control code generation; their meanings are listed in Table 1 below. In this example, we generate single-model inference code for a Linux system on the x86_64 architecture, so we set target=x86 to declare that the generated inference code targets x86_64 Linux.

  3. Prepare the model for which inference code is to be generated

    Click here to download the MNIST handwritten digit recognition model used in this example. After downloading, extract the package to obtain mnist.tflite, a trained MNIST classification model in TFLITE format. Copy mnist.tflite to the current converter directory.

  4. Run converter_lite to generate the code

    ./converter_lite --fmk=TFLITE --modelFile=mnist.tflite --outputFile=mnist --configFile=micro.cfg
    

    On success, the result is displayed as:

    CONVERT RESULT SUCCESS:0
    

    For the parameters of the converter_lite tool, see the converter parameter description.

    After the converter completes successfully, the generated code is saved under the outputFile path specified by the user; in this example, it is the mnist folder in the current conversion directory, with the following contents:

    mnist                          # root directory of the generated code, as specified
    ├── benchmark                  # benchmark routine that integrates and calls the model inference code
    │   ├── benchmark.c
    │   ├── calib_output.c
    │   ├── calib_output.h
    │   ├── load_input.c
    │   └── load_input.h
    ├── CMakeLists.txt             # cmake project file for the benchmark routine
    └── src                        # model inference code directory
        ├── model0                 # directory of model-specific files
        │   ├── model0.c
        │   ├── net0.bin           # model weights in binary form
        │   ├── net0.c
        │   ├── net0.h
        │   ├── weight0.c
        │   └── weight0.h
        ├── CMakeLists.txt
        ├── allocator.c
        ├── allocator.h
        ├── net.cmake
        ├── model.c
        ├── model.h
        ├── context.c
        ├── context.h
        ├── tensor.c
        └── tensor.h
    

    The src directory in the generated code contains the model inference code; benchmark is merely a routine that integrates and calls the code in src. For more details on integration, see the Code Integration and Compilation & Deployment section.

Table 1: micro_param parameter definitions

Parameter        | Required              | Description                                                                              | Range                       | Default
-----------------|-----------------------|------------------------------------------------------------------------------------------|-----------------------------|--------
enable_micro     |                       | If true, code is generated for the model; otherwise a .ms model file is produced         | true, false                 | false
target           |                       | Platform targeted by the generated code                                                  | x86, Cortex-M, ARM32, ARM64 | x86
support_parallel |                       | Whether to generate multi-threaded inference code; may be true only on x86, ARM32, ARM64 | true, false                 | false
save_path        | No (multi-model only) | Path of the generated multi-model code                                                   |                             |
project_name     | No (multi-model only) | Project name of the generated multi-model code                                           |                             |

Generating Inference Code for Multiple Models

  1. Enter the conversion directory

    cd ${PACKAGE_ROOT_PATH}/tools/converter/converter
    
  2. Set the Micro options

    Create a micro.cfg file in the current directory with the following content:

    [micro_param]
    # enable code-generation for MCU HW
    enable_micro=true

    # specify HW target, support x86, Cortex-M, ARM32, ARM64 only.
    target=x86

    # enable parallel inference or not.
    support_parallel=false

    # save generated code path.
    save_path=workpath/

    # set project name.
    project_name=mnist

    [model_param]
    # input model type.
    fmk=TFLITE

    # path of input model file.
    modelFile=mnist.tflite

    [model_param]
    # input model type.
    fmk=TFLITE

    # path of input model file.
    modelFile=mnist.tflite
    

    In the configuration file, [micro_param] indicates that the parameters that follow belong to the Micro option group micro_param; these control code generation, and their meanings are listed in Table 1. Each [model_param] section indicates that the parameters that follow belong to the Model option group model_param for one model; these control the conversion of the individual models, and the accepted parameters include the required parameters supported by converter_lite. In this example, we generate multi-model inference code for a Linux system on the x86_64 architecture, so we set target=x86 to declare that the generated inference code targets x86_64 Linux.

  3. Prepare the models for which inference code is to be generated

    Click here to download the MNIST handwritten digit recognition model used in this example. After downloading, extract the package to obtain mnist.tflite, a trained MNIST classification model in TFLITE format. Copy mnist.tflite to the current converter directory.

  4. Run converter_lite to generate the code; only the config file needs to be specified

    ./converter_lite --configFile=micro.cfg
    

    On success, the result is displayed as:

    CONVERT RESULT SUCCESS:0
    

    For the parameters of the converter_lite tool, see the converter parameter description.

    After the converter completes successfully, the generated code is saved under the save_path plus project_name path specified by the user; in this example, the mnist folder under the workpath/ directory, with the following contents:

    mnist                          # root (project) directory of the generated code, as specified
    ├── benchmark                  # benchmark routine that integrates and calls the model inference code
    │   ├── benchmark.c
    │   ├── calib_output.c
    │   ├── calib_output.h
    │   ├── load_input.c
    │   └── load_input.h
    ├── CMakeLists.txt             # cmake project file for the benchmark routine
    ├── include
    │   └── model_handle.h         # external model interface header
    └── src                        # model inference code directory
        ├── model0                 # files for the first model
        │   ├── model0.c
        │   ├── net0.bin           # model weights in binary form
        │   ├── net0.c
        │   ├── net0.h
        │   ├── weight0.c
        │   └── weight0.h
        ├── model1                 # files for the second model
        │   ├── model1.c
        │   ├── net1.bin
        │   ├── net1.c
        │   ├── net1.h
        │   ├── weight1.c
        │   └── weight1.h
        ├── CMakeLists.txt
        ├── allocator.c
        ├── allocator.h
        ├── net.cmake
        ├── model.c
        ├── model.h
        ├── context.c
        ├── context.h
        ├── tensor.c
        └── tensor.h
    

    The src directory in the generated code contains the model inference code; benchmark is merely a routine that integrates and calls the code in src. In the multi-model scenario, users should adapt the benchmark to their own needs. For more details on integration, see the Code Integration and Compilation & Deployment section.

Configuring the Model Input Shape (Optional)

When generating code, configuring the model input shape to the shape used during actual inference generally reduces the chance of errors during deployment. When the model contains a Shape operator, or the original model's input shape is not fixed, the model's input shape must be configured so that the related shape optimizations and code generation can proceed. The --inputShape= option of the converter configures the input shape for the generated code; for the parameter details, see the converter usage instructions.
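
For example, for the MNIST model used above, a fixed input shape could be pinned at conversion time as sketched below; the tensor name "input" and the 1,28,28,1 shape are assumptions about the model, not values taken from this tutorial:

```shell
./converter_lite --fmk=TFLITE --modelFile=mnist.tflite --outputFile=mnist \
    --configFile=micro.cfg --inputShape=input:1,28,28,1
```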

Generating Multi-Threaded Parallel Inference Code (Optional)

Typical Linux-x86/Android environments have multi-core CPUs; enabling Micro multi-threaded inference exploits the device's capabilities and speeds up model inference.

Configuration File

Setting support_parallel to true in the configuration file generates code that supports multi-threaded inference; see Table 1 for the meaning of each configuration option. An example configuration file for x86 multi-threaded code generation follows:

[micro_param]
# enable code-generation for MCU HW
enable_micro=true

# specify HW target, support x86, Cortex-M, ARM32, ARM64 only.
target=x86

# enable parallel inference or not.
support_parallel=true

Related Call Interfaces

By integrating the code and calling the interfaces below, users can configure multi-threaded inference for the model; for the interface parameters, see the API documentation.

Table 2: multi-thread configuration APIs

Function                            | Prototype
------------------------------------|-------------------------------------------------------------------------
Set the number of inference threads | void MSContextSetThreadNum(MSContextHandle context, int32_t thread_num)
Set the thread affinity mode        | void MSContextSetThreadAffinityMode(MSContextHandle context, int mode)
Get the number of inference threads | int32_t MSContextGetThreadNum(const MSContextHandle context)
Get the thread affinity mode        | int MSContextGetThreadAffinityMode(const MSContextHandle context)

Integration Notes

After generating multi-threaded code, users must link the pthread standard library and the libwrapper.a static library from the Micro library; see the CMakeLists.txt file in the generated code for details.

Limitations

This feature is currently enabled only when target is set to x86/ARM32/ARM64, and at most 4 inference threads can be configured.
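
The interfaces in Table 2 are applied to the context before the model is built. A minimal sketch, assuming the c_api headers shipped under runtime/include in the release package:

```c
#include "c_api/context_c.h"

// Sketch: build a context that requests 4-thread inference (the maximum
// Micro supports) before it is passed to MSModelBuild.
MSContextHandle create_parallel_context(void) {
  MSContextHandle context = MSContextCreate();
  MSContextSetThreadNum(context, 4);
  MSContextSetThreadAffinityMode(context, 1);  // affinity mode values are described in the API docs
  return context;
}
```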

Generating Int8 Quantized Inference Code (Optional)

In MCU scenarios such as Cortex-M, the limited memory and compute of the device usually call for Int8 quantized operators in deployed inference, to reduce runtime memory and speed up computation.

If you already have a fully Int8-quantized model, you can try generating Int8 quantized inference code directly, following the section on running converter_lite to generate inference code, without reading this chapter. More commonly, a user only has a trained Float32 model; in that case, Int8 quantized inference code is generated with the help of the converter's post-training quantization feature, as described below.

Configuration File

Int8 quantized inference code generation is enabled by adding quantization parameters to the configuration file. For the quantization parameters (common quantization parameters common_quant_param and full quantization parameters full_quant_param), see the converter's post-training quantization documentation.

An example configuration file for Int8 quantized inference code generation on the Cortex-M platform follows:

[micro_param]
# enable code-generation for MCU HW
enable_micro=true

# specify HW target, support x86,Cortex-M, ARM32, ARM64 only.
target=Cortex-M

# code generation for Inference or Train
codegen_mode=Inference

# enable parallel inference or not
support_parallel=false

[common_quant_param]
# Supports WEIGHT_QUANT or FULL_QUANT
quant_type=FULL_QUANT

# Weight quantization support the number of bits [0,16], Set to 0 is mixed bit quantization, otherwise it is fixed bit quantization
# Full quantization support the number of bits [1,8]
bit_num=8

[data_preprocess_param]
calibrate_path=inputs:/home/input_dir
calibrate_size=100
input_type=BIN

[full_quant_param]
activation_quant_method=MAX_MIN
bias_correction=true
target_device=DSP

Limitations

  • Currently, only fully quantized inference code generation is supported.

  • The target_device in the full quantization parameters full_quant_param usually needs to be set to DSP, so that more operators can be post-quantized.

  • Micro currently supports 34 Int8 quantized operators. If a quantized operator is unsupported during code generation, it can be skipped with the skip_quant_node option of the common quantization parameters common_quant_param; skipped operator nodes still run in Float32.

Model Training Code Generation

Overview

With the MindSpore Lite converter tool converter_lite, training code can be generated for an input model by setting the Micro options in the converter's parameter configuration file. This chapter covers only the code-generation features of the converter; for basic usage of the converter, see Training Model Conversion.

Environment Preparation

For environment preparation, see the corresponding section above; it is not repeated here.

Running converter_lite to Generate Training Code

  1. Enter the conversion directory

    cd ${PACKAGE_ROOT_PATH}/tools/converter/converter
    
  2. Set the Micro options

    Create a micro.cfg file in the current directory with the following content:

    [micro_param]
    # enable code-generation for MCU HW
    enable_micro=true

    # specify HW target, support x86, Cortex-M, ARM32, ARM64 only.
    target=x86

    # code generation for Inference or Train. Cortex-M is unsupported when codegen_mode is Train.
    codegen_mode=Train
    
  3. Run converter_lite to generate the code

    ./converter_lite --fmk=MINDIR --trainModel=True --modelFile=my_model.mindir --outputFile=my_model --configFile=micro.cfg
    

    On success, the result is displayed as:

    CONVERT RESULT SUCCESS:0
    

    After the converter completes successfully, the generated code is saved under the outputFile path specified by the user; in this example, it is the my_model folder in the current conversion directory, with the following contents:

    my_model                       # root directory of the generated code, as specified
    ├── benchmark                  # benchmark routine that integrates and calls the model training code
    │   ├── benchmark.c
    │   ├── calib_output.c
    │   ├── calib_output.h
    │   ├── load_input.c
    │   └── load_input.h
    ├── CMakeLists.txt             # cmake project file for the benchmark routine
    └── src                        # model training code directory
        ├── CMakeLists.txt
        ├── net.bin                # model weights in binary form
        ├── net.c
        ├── net.cmake
        ├── net.h
        ├── model.c
        ├── context.c
        ├── context.h
        ├── tensor.c
        ├── tensor.h
        ├── weight.c
        └── weight.h
    

    For the APIs involved in the training execution flow, see the training interface introduction.
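
The training-specific interfaces (listed later in Table 4) are combined with the common inference APIs during integration. A minimal hedged sketch of one training step; the model is assumed to have been created and built beforehand, the input tensors filled with a batch of training data, and the export path is a placeholder:

```c
#include "c_api/model_c.h"

// Sketch: run one training step and export the updated weights.
void train_one_step(MSModelHandle model) {
  MSModelSetTrainMode(model, true);       // switch the model into training mode
  MSModelRunStep(model, NULL, NULL);      // one training step, no before/after callbacks
  MSModelExportWeight(model, "net.bin");  // placeholder export path
}
```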

Obtaining the Micro Library

After generating the model inference code, and before integrating it, users need to obtain the Micro library that the generated inference code depends on.

Inference code for a given platform depends on the Micro library for that platform. When generating code, specify the platform through the Micro option target, and obtain the Micro library built for it. Release versions for each platform can be downloaded from the MindSpore website.

In the Model Inference Code Generation section, we obtained model inference code for x86_64 Linux; the Micro library that this code depends on is included in the release package used by the converter. Within the release package, the libraries and headers that the inference code depends on are:

mindspore-lite-{version}-linux-x64
├── runtime
│   └── include
│       └── c_api            # C API headers for MindSpore Lite integration
└── tools
    └── codegen              # include and lib dependencies of the generated source code
        ├── include          # inference framework headers
        │   ├── nnacl        # nnacl operator headers
        │   └── wrapper      # wrapper operator headers
        ├── lib
        │   ├── libwrapper.a # static library of some operators required by MindSpore Lite codegen-generated code
        │   └── libnnacl.a   # static library of nnacl operators required by MindSpore Lite codegen-generated code
        └── third_party
            ├── include
            │   └── CMSIS    # ARM CMSIS NN operator headers
            └── lib
                └── libcmsis_nn.a # ARM CMSIS NN operator static library
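
When integrating by hand rather than reusing the generated CMakeLists.txt, the include directories and static libraries above are wired up roughly as follows; MICRO_PKG and my_app are placeholders, and linking pthread is only needed when multi-threaded code was generated:

```cmake
# Placeholder path to the extracted release package shown above
set(MICRO_PKG /path/to/mindspore-lite-linux-x64)

target_include_directories(my_app PRIVATE
  ${MICRO_PKG}/runtime
  ${MICRO_PKG}/runtime/include
  ${MICRO_PKG}/tools/codegen/include)

target_link_libraries(my_app
  ${MICRO_PKG}/tools/codegen/lib/libwrapper.a
  ${MICRO_PKG}/tools/codegen/lib/libnnacl.a
  pthread)
```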

Code Integration and Compilation & Deployment

The benchmark directory in the generated code contains examples of calling the inference code interfaces; users can refer to the benchmark routine to integrate the src inference code into their own applications.

Call Interfaces of the Inference Code

The general call interfaces of the inference code are listed below; for a detailed description of the interfaces, see the API documentation.

Table 3: common inference APIs

Function                                                       | Prototype
---------------------------------------------------------------|----------
Create a Model                                                 | MSModelHandle MSModelCreate()
Destroy a Model                                                | void MSModelDestroy(MSModelHandle *model)
Calculate the workspace size needed at runtime (Cortex-M only) | size_t MSModelCalcWorkspaceSize(MSModelHandle model)
Set the runtime workspace (Cortex-M only)                      | void MSModelSetWorkspace(MSModelHandle model, void *workspace, size_t workspace_size)
Build the Model                                                | MSStatus MSModelBuild(MSModelHandle model, const void *model_data, size_t data_size, MSModelType model_type, const MSContextHandle model_context)
Run inference on the Model                                     | MSStatus MSModelPredict(MSModelHandle model, const MSTensorHandleArray inputs, MSTensorHandleArray *outputs, const MSKernelCallBackC before, const MSKernelCallBackC after)
Get all input Tensors                                          | MSTensorHandleArray MSModelGetInputs(const MSModelHandle model)
Get all output Tensors                                         | MSTensorHandleArray MSModelGetOutputs(const MSModelHandle model)
Get an input Tensor by name                                    | MSTensorHandle MSModelGetInputByTensorName(const MSModelHandle model, const char *tensor_name)
Get an output Tensor by name                                   | MSTensorHandle MSModelGetOutputByTensorName(const MSModelHandle model, const char *tensor_name)
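
Putting the interfaces in Table 3 together, a typical single-model call sequence looks roughly like the sketch below. The header paths follow the c_api directory of the release package; how the model buffer is obtained (here passed in by the caller) is an assumption, not the exact benchmark implementation:

```c
#include "c_api/context_c.h"
#include "c_api/model_c.h"
#include "c_api/tensor_c.h"

// Sketch of the usual create -> build -> fill inputs -> predict -> destroy flow.
int run_inference(const void *model_data, size_t data_size) {
  MSContextHandle context = MSContextCreate();
  MSModelHandle model = MSModelCreate();
  if (MSModelBuild(model, model_data, data_size, kMSModelTypeMindIR, context) != kMSStatusSuccess) {
    return -1;
  }
  MSTensorHandleArray inputs = MSModelGetInputs(model);
  for (size_t i = 0; i < inputs.handle_num; ++i) {
    void *data = MSTensorGetMutableData(inputs.handle_list[i]);
    (void)data;  // fill with MSTensorGetDataSize(inputs.handle_list[i]) bytes of input here
  }
  MSTensorHandleArray outputs = MSModelGetOutputs(model);
  MSStatus ret = MSModelPredict(model, inputs, &outputs, NULL, NULL);
  MSModelDestroy(&model);
  return ret == kMSStatusSuccess ? 0 : -1;
}
```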

Call Interfaces of the Training Code

The general call interfaces of the training code are listed below.

Table 4: common training APIs (only training-specific interfaces are listed here)

Function                     | Prototype
-----------------------------|----------
Run a single training step   | MSStatus MSModelRunStep(MSModelHandle model, const MSKernelCallBackC before, const MSKernelCallBackC after)
Set the Model execution mode | MSStatus MSModelSetTrainMode(MSModelHandle model, bool train)
Export the Model weights     | MSStatus MSModelExportWeight(MSModelHandle model, const char *export_path)

Integration Differences Between Platforms

Code integration and compilation/deployment differ from platform to platform.

Multi-Model Inference Integration

Multi-model integration is similar to the single-model case, with one difference: in the single-model scenario, the user creates the model through the MSModelCreate interface, whereas in the multi-model scenario MSModelHandle handles are provided for the user. By operating on the MSModelHandle handles of the different models and calling the common single-model inference APIs, the user integrates the different models. The MSModelHandle handles can be found in the model_handle.h file in the multi-model file directory.

Performing Inference on an MCU

Overview

This tutorial uses the deployment of the MNIST model on an STM32F767 chip as an example to demonstrate how to deploy an inference model on a Cortex-M MCU, in the following main steps:

  • Generate model inference code for the Cortex-M architecture with the converter_lite tool

  • Download the Micro library for that Cortex-M architecture

  • Integrate the obtained inference code with the Micro library, then compile, deploy, and verify

    On Windows, we demonstrate how to integrate the inference code with IAR; on Linux, we demonstrate code integration through MakeFile cross compilation.

Generating MCU Inference Code

To generate inference code for an MCU, follow the Model Inference Code Generation section, changing target=x86 in the Micro options to target=Cortex-M. After generation succeeds, the folder contents are as follows:

mnist                          # root directory of the generated code, as specified
├── benchmark                  # benchmark routine that integrates and calls the model inference code
│   ├── benchmark.c
│   ├── calib_output.c
│   ├── calib_output.h
│   ├── data.c
│   ├── data.h
│   ├── load_input.c
│   └── load_input.h
├── build.sh                   # one-click build script
├── CMakeLists.txt             # cmake project file for the benchmark routine
├── cortex-m7.toolchain.cmake  # cross-compilation cmake file for cortex-m7
└── src                        # model inference code directory
    ├── CMakeLists.txt
    ├── context.c
    ├── context.h
    ├── model.c
    ├── net.c
    ├── net.cmake
    ├── net.h
    ├── tensor.c
    ├── tensor.h
    ├── weight.c
    └── weight.h

Downloading the Cortex-M Micro Library

The STM32F767 chip uses the Cortex-M7 architecture; the Micro library for this architecture can be obtained in two ways:

  • Download a Release from the MindSpore website

    Download the release package whose operating system is None and hardware platform is Cortex-M7.

  • Build from source

    Users can build the Cortex-M7 release package with the command MSLITE_MICRO_PLATFORM=cortex-m7 bash build.sh -I x86_64.

For other Cortex-M platforms for which no release package is available for download yet, users can follow the build-from-source approach, modify the MindSpore source code, and build the release package manually.
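
On Cortex-M targets, the generated code additionally expects the caller to hand in working memory through the workspace interfaces from Table 3 before the model is built. A hedged sketch; the static buffer size is a placeholder, and the actually required size is reported by the generated code itself:

```c
#include "c_api/model_c.h"

// Placeholder workspace capacity; size it from MSModelCalcWorkspaceSize in practice.
static unsigned char g_workspace[64 * 1024];

int provide_workspace(MSModelHandle model) {
  size_t needed = MSModelCalcWorkspaceSize(model);
  if (needed > sizeof(g_workspace)) {
    return -1;  // buffer too small for this model
  }
  MSModelSetWorkspace(model, g_workspace, sizeof(g_workspace));
  return 0;
}
```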

Code Integration, Compilation and Deployment on Windows: Integrated Development with IAR

This example uses IAR for code integration and flashing, demonstrating how to integrate the generated inference code on Windows. The main steps are:

  • Download the required software and prepare the integration environment

  • Generate the required MCU startup code and demo project with STM32CubeMX

  • Integrate the model inference code and the Micro library in IAR

  • Compile, then run under simulation

Environment Preparation

  • STM32CubeMX for Windows >= 6.0.1

    • STM32CubeMX is a graphical configuration tool for STM32 chips provided by STMicroelectronics, used to generate startup code and projects for STM chips.

  • IAR EWARM >= 9.1

    • IAR EWARM is an integrated development environment for ARM microprocessors developed by IAR Systems.

Obtaining the MCU Startup Code and Project

If you already have your own MCU project, skip this section. This chapter uses the startup project for the STM32F767 chip as an example to demonstrate how to generate an STM32 MCU project with STM32CubeMX.

  • Start STM32CubeMX and choose New Project from the File menu to create a project.

  • In the MCU/MPU Selector window, search for and select STM32F767IGT6, then click Start Project to create a project for this chip.

  • In the Project Manager view, configure the project name and the path of the generated project, and choose EWARM in the Toolchain / IDE option to generate an IAR project.

  • Click GENERATE CODE at the top to generate the code.

  • On a PC with IAR installed, double-click Project.eww in the EWARM directory of the generated project to open the IAR project.

Integrating the Model Inference Code and the Micro Library

  • Copy the generated inference code into the project, extract the package obtained in the Downloading the Cortex-M Micro Library section, and place it inside the generated inference code directory, as shown below:

    test_stm32f767                                   # MCU project directory
    ├── Core
    │   ├── Inc
    │   └── Src
    │       ├── main.c
    │       └── ...
    ├── Drivers
    ├── EWARM                                        # IAR project file directory
    └── mnist                                        # root directory of the generated code
        ├── benchmark                                # benchmark routine that integrates and calls the model inference code
        │   ├── benchmark.c
        │   ├── data.c
        │   ├── data.h
        │   └── ...
        │── mindspore-lite-1.8.0-none-cortex-m7      # downloaded Cortex-M7 `Micro` library
        ├── src                                      # model inference code directory
        └── ...
    
  • Import source files into the IAR project

    Open the IAR project; in the Workspace view, right-click the project and choose Add -> Add Group to add an mnist group, then right-click that group and repeat the operation to create src and benchmark groups. In each group, choose Add -> Add Files and import the source files from the src and benchmark directories of the mnist folder into the corresponding group.

  • Add the dependent header paths and static libraries

    In the Workspace view, right-click the project and choose Options to open the project options window. Select the C/C++ Compiler option on the left; in the sub-window on the right, select the Preprocessor tab and add the header paths that the inference code depends on to the list. The header paths added in this example are:

    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/runtime
    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/runtime/include
    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/tools/codegen/include
    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/tools/codegen/third_party/include/CMSIS/Core
    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/tools/codegen/third_party/include/CMSIS/DSP
    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/tools/codegen/third_party/include/CMSIS/NN
    $PROJ_DIR$/../mnist
    

    In the project options window, select the Linker option on the left; in the sub-window on the right, select the Library tab and add the operator static libraries that the inference code depends on to the list. The static libraries added in this example are:

    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/tools/codegen/lib/libwrapper.a
    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/tools/codegen/lib/libnnacl.a
    $PROJ_DIR$/../mnist/mindspore-lite-1.8.0-none-cortex-m7/tools/codegen/third_party/lib/libcmsis_nn.a  
    
  • Modify main.c to call the benchmark function

    Add the header include at the top of main.c and call the benchmark function from benchmark.c inside the main function. The programs in the benchmark folder are sample code that calls the generated inference code in src and compares the outputs; users are free to modify them.

    #include "benchmark/benchmark.h"
    ...
    int main(void)
    {
      ...
      if (benchmark() == 0) {
          printf("\nrun success.\n");
      } else {
          printf("\nrun failed.\n");
      }
      ...
    }
    
  • Modify mnist/benchmark/data.c to store the reference input and output data in the program for comparison

    Inside the benchmark routine, the model's input data is set, and the inference result is compared against the configured expected result to obtain an error offset. In this example, the model's input data is set by modifying the calib_input0_data array in data.c, and the expected result is configured by modifying calib_output0_data.

    float calib_input0_data[NET_INPUT0_SIZE] = {0.54881352186203,0.7151893377304077,0.6027633547782898,0.5448831915855408,0.42365479469299316,0.6458941102027893,0.4375872015953064,0.891772985458374,0.9636627435684204,0.3834415078163147,0.7917250394821167,0.5288949012756348,0.5680445432662964,0.9255966544151306,0.07103605568408966,0.08712930232286453,0.020218396559357643,0.832619845867157,0.7781567573547363,0.8700121641159058,0.978618323802948,0.7991585731506348,0.4614793658256531,0.7805292010307312,0.11827442795038223,0.6399210095405579,0.14335328340530396,0.9446688890457153,0.5218483209609985,0.4146619439125061,0.26455560326576233,0.7742336988449097,0.4561503231525421,0.568433940410614,0.018789799883961678,0.6176354885101318,0.6120957136154175,0.6169340014457703,0.9437480568885803,0.681820273399353,0.35950788855552673,0.43703195452690125,0.6976311802864075,0.0602254718542099,0.6667667031288147,0.670637845993042,0.21038256585597992,0.12892629206180573,0.31542834639549255,0.36371076107025146,0.5701967477798462,0.4386015236377716,0.9883738160133362,0.10204481333494186,0.20887675881385803,0.16130951046943665,0.6531082987785339,0.25329160690307617,0.4663107693195343,0.24442559480667114,0.15896958112716675,0.11037514358758926,0.6563295722007751,0.13818295300006866,0.1965823620557785,0.3687251806259155,0.8209932446479797,0.09710127860307693,0.8379449248313904,0.0960984081029892,0.9764594435691833,0.4686512053012848,0.9767611026763916,0.6048455238342285,0.7392635941505432,0.03918779268860817,0.28280696272850037,0.12019655853509903,0.296140193939209,0.11872772127389908,0.3179831802845001,0.414262980222702,0.06414749473333359,0.6924721002578735,0.5666014552116394,0.26538950204849243,0.5232480764389038,0.09394051134586334,0.5759465098381042,0.9292961955070496,0.3185689449310303,0.6674103736877441,0.13179786503314972,0.7163271903991699,0.28940609097480774,0.18319135904312134,0.5865129232406616,0.02010754682123661,0.8289400339126587,0.004695476032793522,0.6778165102005005,0.270
0079679489136,0.7351940274238586,0.9621885418891907,0.2487531453371048,0.5761573314666748,0.5920419096946716,0.5722519159317017,0.22308163344860077,0.9527490139007568,0.4471253752708435,0.8464086651802063,0.6994792819023132,0.2974369525909424,0.8137978315353394,0.396505743265152,0.8811032176017761,0.5812729001045227,0.8817353844642639,0.6925315856933594,0.7252542972564697,0.5013243556022644,0.9560836553573608,0.6439902186393738,0.4238550364971161,0.6063932180404663,0.019193198531866074,0.30157482624053955,0.6601735353469849,0.2900775969028473,0.6180154085159302,0.42876869440078735,0.1354740709066391,0.29828232526779175,0.5699648857116699,0.5908727645874023,0.5743252635002136,0.6532008051872253,0.6521032452583313,0.43141844868659973,0.8965466022491455,0.36756187677383423,0.4358649253845215,0.8919233679771423,0.806194007396698,0.7038885951042175,0.10022688657045364,0.9194825887680054,0.7142413258552551,0.9988470077514648,0.14944830536842346,0.8681260347366333,0.16249293088912964,0.6155595779418945,0.1238199844956398,0.8480082154273987,0.8073189854621887,0.5691007375717163,0.40718328952789307,0.06916699558496475,0.6974287629127502,0.45354267954826355,0.7220556139945984,0.8663823008537292,0.9755215048789978,0.855803370475769,0.011714084073901176,0.359978049993515,0.729990541934967,0.17162968218326569,0.5210366249084473,0.054337989538908005,0.19999653100967407,0.01852179504930973,0.793697714805603,0.2239246815443039,0.3453516662120819,0.9280812740325928,0.704414427280426,0.031838931143283844,0.1646941602230072,0.6214783787727356,0.5772286057472229,0.23789282143115997,0.9342139959335327,0.6139659285545349,0.5356327891349792,0.5899099707603455,0.7301220297813416,0.31194499135017395,0.39822107553482056,0.20984375476837158,0.18619300425052643,0.9443724155426025,0.739550769329071,0.49045881628990173,0.22741462290287018,0.2543564736843109,0.058029159903526306,0.43441662192344666,0.3117958903312683,0.6963434815406799,0.37775182723999023,0.1796036809682846,0.024678727611899376,0
.06724963337182999,0.6793927550315857,0.4536968469619751,0.5365791916847229,0.8966712951660156,0.990338921546936,0.21689698100090027,0.6630781888961792,0.2633223831653595,0.02065099962055683,0.7583786249160767,0.32001715898513794,0.38346388936042786,0.5883170962333679,0.8310484290122986,0.6289818286895752,0.872650682926178,0.27354204654693604,0.7980468273162842,0.18563593924045563,0.9527916312217712,0.6874882578849792,0.21550767123699188,0.9473705887794495,0.7308558225631714,0.2539416551589966,0.21331197023391724,0.518200695514679,0.02566271834075451,0.20747007429599762,0.4246854782104492,0.3741699755191803,0.46357542276382446,0.27762871980667114,0.5867843627929688,0.8638556003570557,0.11753185838460922,0.517379105091095,0.13206811249256134,0.7168596982955933,0.39605969190597534,0.5654212832450867,0.1832798421382904,0.14484776556491852,0.4880562722682953,0.35561272501945496,0.9404319524765015,0.7653252482414246,0.748663604259491,0.9037197232246399,0.08342243731021881,0.5521924495697021,0.5844760537147522,0.961936354637146,0.29214751720428467,0.24082878232002258,0.10029394179582596,0.016429629176855087,0.9295293092727661,0.669916570186615,0.7851529121398926,0.28173011541366577,0.5864101648330688,0.06395526975393295,0.48562759160995483,0.9774951338768005,0.8765052556991577,0.3381589651107788,0.961570143699646,0.23170162737369537,0.9493188261985779,0.9413776993751526,0.799202561378479,0.6304479241371155,0.8742879629135132,0.2930202782154083,0.8489435315132141,0.6178767085075378,0.013236857950687408,0.34723350405693054,0.14814086258411407,0.9818294048309326,0.4783703088760376,0.49739137291908264,0.6394725441932678,0.36858460307121277,0.13690027594566345,0.8221177458763123,0.1898479163646698,0.5113189816474915,0.2243170291185379,0.09784448146820068,0.8621914982795715,0.9729194641113281,0.9608346819877625,0.9065554738044739,0.774047315120697,0.3331451416015625,0.08110138773918152,0.40724116563796997,0.2322341352701187,0.13248763978481293,0.053427182137966156,0.72559434175
49133,0.011427458375692368,0.7705807685852051,0.14694663882255554,0.07952208071947098,0.08960303664207458,0.6720477938652039,0.24536721408367157,0.4205394685268402,0.557368814945221,0.8605511784553528,0.7270442843437195,0.2703278958797455,0.131482794880867,0.05537432059645653,0.3015986382961273,0.2621181607246399,0.45614057779312134,0.6832813620567322,0.6956254243850708,0.28351885080337524,0.3799269497394562,0.18115095794200897,0.7885454893112183,0.05684807524085045,0.6969972252845764,0.7786954045295715,0.7774075865745544,0.25942257046699524,0.3738131523132324,0.5875996351242065,0.27282190322875977,0.3708527982234955,0.19705428183078766,0.4598558843135834,0.044612299650907516,0.7997958660125732,0.07695644348859787,0.5188351273536682,0.3068101108074188,0.5775429606437683,0.9594333171844482,0.6455702185630798,0.03536243736743927,0.4304024279117584,0.5100168585777283,0.5361775159835815,0.6813924908638,0.2775960862636566,0.12886056303977966,0.3926756680011749,0.9564056992530823,0.1871308982372284,0.9039839506149292,0.5438059568405151,0.4569114148616791,0.8820413947105408,0.45860394835472107,0.7241676449775696,0.3990253210067749,0.9040443897247314,0.6900250315666199,0.6996220350265503,0.32772040367126465,0.7567786574363708,0.6360610723495483,0.2400202751159668,0.16053882241249084,0.796391487121582,0.9591665863990784,0.4581388235092163,0.5909841656684875,0.8577226400375366,0.45722344517707825,0.9518744945526123,0.5757511854171753,0.8207671046257019,0.9088436961174011,0.8155238032341003,0.15941447019577026,0.6288984417915344,0.39843425154685974,0.06271295249462128,0.4240322411060333,0.25868406891822815,0.849038302898407,0.03330462798476219,0.9589827060699463,0.35536885261535645,0.3567068874835968,0.01632850244641304,0.18523232638835907,0.40125951170921326,0.9292914271354675,0.0996149331331253,0.9453015327453613,0.869488537311554,0.4541623890399933,0.326700896024704,0.23274412751197815,0.6144647002220154,0.03307459130883217,0.015606064349412918,0.428795725107193,0.068074077
36778259,0.2519409954547882,0.2211609184741974,0.253191202878952,0.13105523586273193,0.012036222964525223,0.11548429727554321,0.6184802651405334,0.9742562174797058,0.9903450012207031,0.40905410051345825,0.1629544198513031,0.6387617588043213,0.4903053343296051,0.9894098043441772,0.06530420482158661,0.7832344174385071,0.28839850425720215,0.24141861498355865,0.6625045537948608,0.24606318771839142,0.6658591032028198,0.5173085331916809,0.4240889847278595,0.5546877980232239,0.2870515286922455,0.7065746784210205,0.414856880903244,0.3605455458164215,0.8286569118499756,0.9249669313430786,0.04600730910897255,0.2326269894838333,0.34851935505867004,0.8149664998054504,0.9854914546012878,0.9689717292785645,0.904948353767395,0.2965562641620636,0.9920112490653992,0.24942004680633545,0.10590615123510361,0.9509525895118713,0.2334202527999878,0.6897682547569275,0.05835635960102081,0.7307090759277344,0.8817201852798462,0.27243688702583313,0.3790569007396698,0.3742961883544922,0.7487882375717163,0.2378072440624237,0.17185309529304504,0.4492916464805603,0.30446839332580566,0.8391891121864319,0.23774182796478271,0.5023894309997559,0.9425836205482483,0.6339976787567139,0.8672894239425659,0.940209686756134,0.7507648468017578,0.6995750665664673,0.9679655432701111,0.9944007992744446,0.4518216848373413,0.07086978107690811,0.29279401898384094,0.15235470235347748,0.41748636960983276,0.13128933310508728,0.6041178107261658,0.38280805945396423,0.8953858613967896,0.96779465675354,0.5468848943710327,0.2748235762119293,0.5922304391860962,0.8967611789703369,0.40673333406448364,0.5520782470703125,0.2716527581214905,0.4554441571235657,0.4017135500907898,0.24841345846652985,0.5058664083480835,0.31038081645965576,0.37303486466407776,0.5249704718589783,0.7505950331687927,0.3335074782371521,0.9241587519645691,0.8623185753822327,0.048690296709537506,0.2536425292491913,0.4461355209350586,0.10462789237499237,0.34847599267959595,0.7400975227355957,0.6805144548416138,0.6223844289779663,0.7105283737182617,0.204923
68936538696,0.3416981101036072,0.676242470741272,0.879234790802002,0.5436780452728271,0.2826996445655823,0.030235258862376213,0.7103368043899536,0.007884103804826736,0.37267908453941345,0.5305371880531311,0.922111451625824,0.08949454873800278,0.40594232082366943,0.024313200265169144,0.3426109850406647,0.6222310662269592,0.2790679335594177,0.2097499519586563,0.11570323258638382,0.5771402716636658,0.6952700018882751,0.6719571352005005,0.9488610029220581,0.002703213831409812,0.6471966505050659,0.60039222240448,0.5887396335601807,0.9627703428268433,0.016871673986315727,0.6964824199676514,0.8136786222457886,0.5098071694374084,0.33396488428115845,0.7908401489257812,0.09724292904138565,0.44203564524650574,0.5199523568153381,0.6939564347267151,0.09088572859764099,0.2277594953775406,0.4103015661239624,0.6232946515083313,0.8869608044624329,0.618826150894165,0.13346147537231445,0.9805801510810852,0.8717857599258423,0.5027207732200623,0.9223479628562927,0.5413808226585388,0.9233060479164124,0.8298973441123962,0.968286395072937,0.919782817363739,0.03603381663560867,0.1747720092535019,0.3891346752643585,0.9521427154541016,0.300028920173645,0.16046763956546783,0.8863046765327454,0.4463944137096405,0.9078755974769592,0.16023047268390656,0.6611174941062927,0.4402637481689453,0.07648676633834839,0.6964631676673889,0.2473987489938736,0.03961552307009697,0.05994429811835289,0.06107853725552559,0.9077329635620117,0.7398838996887207,0.8980623483657837,0.6725823283195496,0.5289399027824402,0.30444636940956116,0.997962236404419,0.36218905448913574,0.47064894437789917,0.37824517488479614,0.979526937007904,0.1746583878993988,0.32798799872398376,0.6803486943244934,0.06320761889219284,0.60724937915802,0.47764649987220764,0.2839999794960022,0.2384132742881775,0.5145127177238464,0.36792758107185364,0.4565199017524719,0.3374773859977722,0.9704936742782593,0.13343943655490875,0.09680395573377609,0.3433917164802551,0.5910269021987915,0.6591764688491821,0.3972567617893219,0.9992780089378357,0.351893
00775527954,0.7214066386222839,0.6375827193260193,0.8130538463592529,0.9762256741523743,0.8897936344146729,0.7645619511604309,0.6982485055923462,0.335498183965683,0.14768557250499725,0.06263600289821625,0.2419017106294632,0.432281494140625,0.521996259689331,0.7730835676193237,0.9587409496307373,0.1173204779624939,0.10700414329767227,0.5896947383880615,0.7453980445861816,0.848150372505188,0.9358320832252502,0.9834262132644653,0.39980170130729675,0.3803351819515228,0.14780867099761963,0.6849344372749329,0.6567619442939758,0.8620625734329224,0.09725799411535263,0.49777689576148987,0.5810819268226624,0.2415570467710495,0.16902540624141693,0.8595808148384094,0.05853492394089699,0.47062090039253235,0.11583399772644043,0.45705875754356384,0.9799623489379883,0.4237063527107239,0.857124924659729,0.11731556057929993,0.2712520658969879,0.40379273891448975,0.39981213212013245,0.6713835000991821,0.3447181284427643,0.713766872882843,0.6391869187355042,0.399161159992218,0.43176013231277466,0.614527702331543,0.0700421929359436,0.8224067091941833,0.65342116355896,0.7263424396514893,0.5369229912757874,0.11047711223363876,0.4050356149673462,0.40537357330322266,0.3210429847240448,0.029950324445962906,0.73725426197052,0.10978446155786514,0.6063081622123718,0.7032175064086914,0.6347863078117371,0.95914226770401,0.10329815745353699,0.8671671748161316,0.02919023483991623,0.534916877746582,0.4042436182498932,0.5241838693618774,0.36509987711906433,0.19056691229343414,0.01912289671599865,0.5181497931480408,0.8427768349647522,0.3732159435749054,0.2228638231754303,0.080532006919384,0.0853109210729599,0.22139644622802734,0.10001406073570251,0.26503971219062805,0.06614946573972702,0.06560486555099487,0.8562761545181274,0.1621202677488327,0.5596824288368225,0.7734555602073669,0.4564095735549927,0.15336887538433075,0.19959613680839539,0.43298420310020447,0.52823406457901,0.3494403064250946,0.7814795970916748,0.7510216236114502,0.9272118210792542,0.028952548280358315,0.8956912755966187,0.39256879687
309265,0.8783724904060364,0.690784752368927,0.987348735332489,0.7592824697494507,0.3645446300506592,0.5010631680488586,0.37638914585113525,0.364911824464798,0.2609044909477234,0.49597030878067017,0.6817399263381958,0.27734026312828064,0.5243797898292542,0.117380291223526,0.1598452925682068,0.04680635407567024,0.9707314372062683,0.0038603513967245817,0.17857997119426727,0.6128667593002319,0.08136960119009018,0.8818964958190918,0.7196201682090759,0.9663899540901184,0.5076355338096619,0.3004036843776703,0.549500584602356,0.9308187365531921,0.5207614302635193,0.2672070264816284,0.8773987889289856,0.3719187378883362,0.0013833499979227781,0.2476850152015686,0.31823351979255676,0.8587774634361267,0.4585031569004059,0.4445872902870178,0.33610227704048157,0.880678117275238,0.9450267553329468,0.9918903112411499,0.3767412602901459,0.9661474227905273,0.7918795943260193,0.675689160823822,0.24488948285579681,0.21645726263523102,0.1660478264093399,0.9227566123008728,0.2940766513347626,0.4530942440032959,0.49395784735679626,0.7781715989112854,0.8442349433898926,0.1390727013349533,0.4269043505191803,0.842854917049408,0.8180332779884338};
    float calib_output0_data[NET_OUTPUT0_SIZE] = {3.5647096e-05,6.824297e-08,0.009327697,3.2340475e-05,1.1117579e-05,1.5117058e-06,4.6314454e-07,5.161628e-11,0.9905911,3.8835238e-10};
    

编译并仿真运行

在本例中,我们采用软件仿真的方式查看并分析推理结果。在Workspace界面右键点击该项目,选择Options,打开项目选项窗口;在窗口左侧选择Debugger选项,在右侧的Setup子窗口中将Driver选择为Simulator,使能软件仿真。

关闭项目选项窗口,点击菜单栏Project -> Download and Debug,进行项目编译并仿真。通过在benchmark调用处加入断点,可以观察到仿真运行的推理结果,以及benchmark()函数的返回值。

在Linux上的代码集成及编译部署:通过Makefile进行代码集成

本章以在Linux平台上通过Makefile对生成的模型代码进行集成开发为例,演示在Linux上进行MCU推理代码集成开发的一般步骤,主要分为以下几步:

  • 下载所需要的相关软件,准备好交叉编译及烧录环境

  • 通过STM32CubeMX软件生成所需要的MCU启动代码及演示工程

  • 修改Makefile,集成模型推理代码及Micro库

  • 编译工程及烧录

  • 读取板端运行结果并验证

本例构建完成的完整demo代码,可点击此处下载

环境准备

  • CMake >= 3.18.3

  • GNU Arm Embedded Toolchain >= 10-2020-q4-major-x86_64-linux

    • 该工具为适用Cortex-M的Linux下交叉编译工具。

    • 下载x86_64-linux版本的gcc-arm-none-eabi,解压缩后,将目录下的bin路径加入到PATH环境变量中:export PATH=gcc-arm-none-eabi路径/bin:$PATH

  • STM32CubeMX-Lin >= 6.5.0

    • STM32CubeMX是意法半导体提供的STM32芯片图形化配置工具,用于生成STM32芯片的启动代码及工程。

  • STM32CubePrg-Lin >= 6.5.0

    • 该工具是意法半导体提供的烧录工具,可用于程序烧录和数据读取。

获取MCU启动代码及工程

如果用户已经有自己的MCU工程,请忽略该章节。本章以生成STM32F767芯片的启动工程为例,演示如何通过STM32CubeMX生成STM32芯片的MCU工程。

  • 启动STM32CubeMX,在File选项中选择New Project来新建工程。

  • 在MCU/MPU Selector窗口中,搜索并选择STM32F767IGT6,点击Start Project创建该芯片的工程。

  • 在Project Manager界面,配置工程名及生成的工程路径,并在Toolchain / IDE选项中选择Makefile,以指定生成Makefile工程。

  • 点击上方的GENERATE CODE按钮,生成代码。

  • 在生成的工程目录下执行make,测试代码是否成功编译。

集成模型推理代码

  • 将生成的推理代码拷贝到MCU工程内,并将在Micro库获取章节下载得到的Cortex-M架构Micro库压缩包解压后放到生成的推理代码目录内,目录结构如下所示:

    stm32f767                                       # MCU工程目录
    ├── Core
    │   ├── Inc
    │   └── Src
    │       ├── main.c
    │       └── ...
    ├── Drivers
    ├── mnist                                        # 生成代码根目录
    │   ├── benchmark                                # 对模型推理代码进行集成调用的benchmark例程
    │   │   ├── benchmark.c
    │   │   ├── data.c
    │   │   ├── data.h
    │   │   └── ...
    │   ├── mindspore-lite-1.8.0-none-cortex-m7      # 下载的Cortex-M7架构Micro库
    │   ├── src                                      # 模型推理代码目录
    │   └── ...
    ├── Makefile
    ├── startup_stm32f767xx.s
    └── STM32F767IGTx_FLASH.ld
    
  • 修改Makefile,将模型推理代码及依赖库加入工程

    本例中要加入工程的源代码,包括src目录下的模型推理代码和benchmark目录下的模型推理调用示例代码。修改Makefile中的C_SOURCES变量定义,加入源文件路径:

    C_SOURCES =  \
    mnist/src/context.c \
    mnist/src/model.c \
    mnist/src/net.c \
    mnist/src/tensor.c \
    mnist/src/weight.c \
    mnist/benchmark/benchmark.c \
    mnist/benchmark/calib_output.c \
    mnist/benchmark/load_input.c \
    mnist/benchmark/data.c \
    ...
    

    加入依赖的头文件路径:修改Makefile中的C_INCLUDES变量定义,加入以下路径:

    LITE_PACK = mindspore-lite-1.8.0-none-cortex-m7
    
    C_INCLUDES =  \
    -Imnist/$(LITE_PACK)/runtime \
    -Imnist/$(LITE_PACK)/runtime/include \
    -Imnist/$(LITE_PACK)/tools/codegen/include \
    -Imnist/$(LITE_PACK)/tools/codegen/third_party/include/CMSIS/Core \
    -Imnist/$(LITE_PACK)/tools/codegen/third_party/include/CMSIS/DSP \
    -Imnist/$(LITE_PACK)/tools/codegen/third_party/include/CMSIS/NN \
    -Imnist \
    ...
    

    加入依赖的算子库(-lnnacl -lwrapper -lcmsis_nn),声明算子库文件所在路径,增加链接时编译选项(-specs=nosys.specs)。 本例中修改后的相关变量定义如下:

    LIBS = -lc -lm -lnosys -lnnacl -lwrapper -lcmsis_nn
    LIBDIR = -Lmnist/$(LITE_PACK)/tools/codegen/lib -Lmnist/$(LITE_PACK)/tools/codegen/third_party/lib
    LDFLAGS = $(MCU) -specs=nosys.specs -specs=nano.specs -T$(LDSCRIPT) $(LIBDIR) $(LIBS) -Wl,-Map=$(BUILD_DIR)/$(TARGET).map,--cref -Wl,--gc-sections
    
  • 修改main.c文件,调用benchmark函数

    在main函数中调用benchmark.c中的benchmark函数。benchmark文件夹内的程序是对生成的src内推理代码进行调用并比较输出的示范样例,用户可以自由修改。本例中,我们直接调用benchmark函数,并根据返回结果为run_dnn_flag变量赋值。

    run_dnn_flag = '0';
    if (benchmark() == 0) {
        printf("\nrun success.\n");
        run_dnn_flag = '1';
    } else {
        printf("\nrun failed.\n");
        run_dnn_flag = '2';
    }
    

    main.c开头增加头文件引用和run_dnn_flag变量的定义。

    #include "benchmark/benchmark.h"
    
    char run_dnn_flag __attribute__((section(".myram")));  // 推理结果标志
    

    在本例中,为方便直接使用烧录器对推理结果进行读取,把变量定义在了自定义的section段(myram)中,用户可以使用下面方式设置自定义的section段,或者忽略该声明,采用串口或其他交互方式得到推理结果。

    自定义section段的设置方法如下: 修改STM32F767IGTx_FLASH.ld中的MEMORY段,增加一个自定义内存段MYRAM(在本例中,将RAM内存起始地址加4,以腾出内存给MYRAM);接着在SECTIONS段内增加自定义的myram段声明。

    MEMORY
    {
    MYRAM (xrw)     : ORIGIN = 0x20000000, LENGTH = 1
    RAM (xrw)      : ORIGIN = 0x20000004, LENGTH = 524284
    ...
    }
    ...
    SECTIONS
    {
      ...
      .myram (NOLOAD):
      {
        . = ALIGN(4);
        _smyram = .;        /* create a global symbol at data start */
        *(.myram)          /* .myram sections */
        *(.myram*)         /* .myram* sections */
    
        . = ALIGN(4);
        _emyram = .;        /* define a global symbol at data end */
      } >MYRAM AT> FLASH
    }
    
  • 修改mnist/benchmark/data.c文件,将标杆输入输出数据存放在程序内以进行对比

    在benchmark例程内,会设置模型的输入数据,并将推理结果和设定的期望结果进行对比,得到误差偏移值。在本例中,通过修改data.c中的calib_input0_data数组设置模型的输入数据,通过修改calib_output0_data数组设定期望结果。

    float calib_input0_data[NET_INPUT0_SIZE] = {0.54881352186203,0.7151893377304077,0.6027633547782898,0.5448831915855408,0.42365479469299316,0.6458941102027893,0.4375872015953064,0.891772985458374,0.9636627435684204,0.3834415078163147,0.7917250394821167,0.5288949012756348,0.5680445432662964,0.9255966544151306,0.07103605568408966,0.08712930232286453,0.020218396559357643,0.832619845867157,0.7781567573547363,0.8700121641159058,0.978618323802948,0.7991585731506348,0.4614793658256531,0.7805292010307312,0.11827442795038223,0.6399210095405579,0.14335328340530396,0.9446688890457153,0.5218483209609985,0.4146619439125061,0.26455560326576233,0.7742336988449097,0.4561503231525421,0.568433940410614,0.018789799883961678,0.6176354885101318,0.6120957136154175,0.6169340014457703,0.9437480568885803,0.681820273399353,0.35950788855552673,0.43703195452690125,0.6976311802864075,0.0602254718542099,0.6667667031288147,0.670637845993042,0.21038256585597992,0.12892629206180573,0.31542834639549255,0.36371076107025146,0.5701967477798462,0.4386015236377716,0.9883738160133362,0.10204481333494186,0.20887675881385803,0.16130951046943665,0.6531082987785339,0.25329160690307617,0.4663107693195343,0.24442559480667114,0.15896958112716675,0.11037514358758926,0.6563295722007751,0.13818295300006866,0.1965823620557785,0.3687251806259155,0.8209932446479797,0.09710127860307693,0.8379449248313904,0.0960984081029892,0.9764594435691833,0.4686512053012848,0.9767611026763916,0.6048455238342285,0.7392635941505432,0.03918779268860817,0.28280696272850037,0.12019655853509903,0.296140193939209,0.11872772127389908,0.3179831802845001,0.414262980222702,0.06414749473333359,0.6924721002578735,0.5666014552116394,0.26538950204849243,0.5232480764389038,0.09394051134586334,0.5759465098381042,0.9292961955070496,0.3185689449310303,0.6674103736877441,0.13179786503314972,0.7163271903991699,0.28940609097480774,0.18319135904312134,0.5865129232406616,0.02010754682123661,0.8289400339126587,0.004695476032793522,0.6778165102005005,0.270
0079679489136,0.7351940274238586,0.9621885418891907,0.2487531453371048,0.5761573314666748,0.5920419096946716,0.5722519159317017,0.22308163344860077,0.9527490139007568,0.4471253752708435,0.8464086651802063,0.6994792819023132,0.2974369525909424,0.8137978315353394,0.396505743265152,0.8811032176017761,0.5812729001045227,0.8817353844642639,0.6925315856933594,0.7252542972564697,0.5013243556022644,0.9560836553573608,0.6439902186393738,0.4238550364971161,0.6063932180404663,0.019193198531866074,0.30157482624053955,0.6601735353469849,0.2900775969028473,0.6180154085159302,0.42876869440078735,0.1354740709066391,0.29828232526779175,0.5699648857116699,0.5908727645874023,0.5743252635002136,0.6532008051872253,0.6521032452583313,0.43141844868659973,0.8965466022491455,0.36756187677383423,0.4358649253845215,0.8919233679771423,0.806194007396698,0.7038885951042175,0.10022688657045364,0.9194825887680054,0.7142413258552551,0.9988470077514648,0.14944830536842346,0.8681260347366333,0.16249293088912964,0.6155595779418945,0.1238199844956398,0.8480082154273987,0.8073189854621887,0.5691007375717163,0.40718328952789307,0.06916699558496475,0.6974287629127502,0.45354267954826355,0.7220556139945984,0.8663823008537292,0.9755215048789978,0.855803370475769,0.011714084073901176,0.359978049993515,0.729990541934967,0.17162968218326569,0.5210366249084473,0.054337989538908005,0.19999653100967407,0.01852179504930973,0.793697714805603,0.2239246815443039,0.3453516662120819,0.9280812740325928,0.704414427280426,0.031838931143283844,0.1646941602230072,0.6214783787727356,0.5772286057472229,0.23789282143115997,0.9342139959335327,0.6139659285545349,0.5356327891349792,0.5899099707603455,0.7301220297813416,0.31194499135017395,0.39822107553482056,0.20984375476837158,0.18619300425052643,0.9443724155426025,0.739550769329071,0.49045881628990173,0.22741462290287018,0.2543564736843109,0.058029159903526306,0.43441662192344666,0.3117958903312683,0.6963434815406799,0.37775182723999023,0.1796036809682846,0.024678727611899376,0
.06724963337182999,0.6793927550315857,0.4536968469619751,0.5365791916847229,0.8966712951660156,0.990338921546936,0.21689698100090027,0.6630781888961792,0.2633223831653595,0.02065099962055683,0.7583786249160767,0.32001715898513794,0.38346388936042786,0.5883170962333679,0.8310484290122986,0.6289818286895752,0.872650682926178,0.27354204654693604,0.7980468273162842,0.18563593924045563,0.9527916312217712,0.6874882578849792,0.21550767123699188,0.9473705887794495,0.7308558225631714,0.2539416551589966,0.21331197023391724,0.518200695514679,0.02566271834075451,0.20747007429599762,0.4246854782104492,0.3741699755191803,0.46357542276382446,0.27762871980667114,0.5867843627929688,0.8638556003570557,0.11753185838460922,0.517379105091095,0.13206811249256134,0.7168596982955933,0.39605969190597534,0.5654212832450867,0.1832798421382904,0.14484776556491852,0.4880562722682953,0.35561272501945496,0.9404319524765015,0.7653252482414246,0.748663604259491,0.9037197232246399,0.08342243731021881,0.5521924495697021,0.5844760537147522,0.961936354637146,0.29214751720428467,0.24082878232002258,0.10029394179582596,0.016429629176855087,0.9295293092727661,0.669916570186615,0.7851529121398926,0.28173011541366577,0.5864101648330688,0.06395526975393295,0.48562759160995483,0.9774951338768005,0.8765052556991577,0.3381589651107788,0.961570143699646,0.23170162737369537,0.9493188261985779,0.9413776993751526,0.799202561378479,0.6304479241371155,0.8742879629135132,0.2930202782154083,0.8489435315132141,0.6178767085075378,0.013236857950687408,0.34723350405693054,0.14814086258411407,0.9818294048309326,0.4783703088760376,0.49739137291908264,0.6394725441932678,0.36858460307121277,0.13690027594566345,0.8221177458763123,0.1898479163646698,0.5113189816474915,0.2243170291185379,0.09784448146820068,0.8621914982795715,0.9729194641113281,0.9608346819877625,0.9065554738044739,0.774047315120697,0.3331451416015625,0.08110138773918152,0.40724116563796997,0.2322341352701187,0.13248763978481293,0.053427182137966156,0.72559434175
49133,0.011427458375692368,0.7705807685852051,0.14694663882255554,0.07952208071947098,0.08960303664207458,0.6720477938652039,0.24536721408367157,0.4205394685268402,0.557368814945221,0.8605511784553528,0.7270442843437195,0.2703278958797455,0.131482794880867,0.05537432059645653,0.3015986382961273,0.2621181607246399,0.45614057779312134,0.6832813620567322,0.6956254243850708,0.28351885080337524,0.3799269497394562,0.18115095794200897,0.7885454893112183,0.05684807524085045,0.6969972252845764,0.7786954045295715,0.7774075865745544,0.25942257046699524,0.3738131523132324,0.5875996351242065,0.27282190322875977,0.3708527982234955,0.19705428183078766,0.4598558843135834,0.044612299650907516,0.7997958660125732,0.07695644348859787,0.5188351273536682,0.3068101108074188,0.5775429606437683,0.9594333171844482,0.6455702185630798,0.03536243736743927,0.4304024279117584,0.5100168585777283,0.5361775159835815,0.6813924908638,0.2775960862636566,0.12886056303977966,0.3926756680011749,0.9564056992530823,0.1871308982372284,0.9039839506149292,0.5438059568405151,0.4569114148616791,0.8820413947105408,0.45860394835472107,0.7241676449775696,0.3990253210067749,0.9040443897247314,0.6900250315666199,0.6996220350265503,0.32772040367126465,0.7567786574363708,0.6360610723495483,0.2400202751159668,0.16053882241249084,0.796391487121582,0.9591665863990784,0.4581388235092163,0.5909841656684875,0.8577226400375366,0.45722344517707825,0.9518744945526123,0.5757511854171753,0.8207671046257019,0.9088436961174011,0.8155238032341003,0.15941447019577026,0.6288984417915344,0.39843425154685974,0.06271295249462128,0.4240322411060333,0.25868406891822815,0.849038302898407,0.03330462798476219,0.9589827060699463,0.35536885261535645,0.3567068874835968,0.01632850244641304,0.18523232638835907,0.40125951170921326,0.9292914271354675,0.0996149331331253,0.9453015327453613,0.869488537311554,0.4541623890399933,0.326700896024704,0.23274412751197815,0.6144647002220154,0.03307459130883217,0.015606064349412918,0.428795725107193,0.068074077
36778259,0.2519409954547882,0.2211609184741974,0.253191202878952,0.13105523586273193,0.012036222964525223,0.11548429727554321,0.6184802651405334,0.9742562174797058,0.9903450012207031,0.40905410051345825,0.1629544198513031,0.6387617588043213,0.4903053343296051,0.9894098043441772,0.06530420482158661,0.7832344174385071,0.28839850425720215,0.24141861498355865,0.6625045537948608,0.24606318771839142,0.6658591032028198,0.5173085331916809,0.4240889847278595,0.5546877980232239,0.2870515286922455,0.7065746784210205,0.414856880903244,0.3605455458164215,0.8286569118499756,0.9249669313430786,0.04600730910897255,0.2326269894838333,0.34851935505867004,0.8149664998054504,0.9854914546012878,0.9689717292785645,0.904948353767395,0.2965562641620636,0.9920112490653992,0.24942004680633545,0.10590615123510361,0.9509525895118713,0.2334202527999878,0.6897682547569275,0.05835635960102081,0.7307090759277344,0.8817201852798462,0.27243688702583313,0.3790569007396698,0.3742961883544922,0.7487882375717163,0.2378072440624237,0.17185309529304504,0.4492916464805603,0.30446839332580566,0.8391891121864319,0.23774182796478271,0.5023894309997559,0.9425836205482483,0.6339976787567139,0.8672894239425659,0.940209686756134,0.7507648468017578,0.6995750665664673,0.9679655432701111,0.9944007992744446,0.4518216848373413,0.07086978107690811,0.29279401898384094,0.15235470235347748,0.41748636960983276,0.13128933310508728,0.6041178107261658,0.38280805945396423,0.8953858613967896,0.96779465675354,0.5468848943710327,0.2748235762119293,0.5922304391860962,0.8967611789703369,0.40673333406448364,0.5520782470703125,0.2716527581214905,0.4554441571235657,0.4017135500907898,0.24841345846652985,0.5058664083480835,0.31038081645965576,0.37303486466407776,0.5249704718589783,0.7505950331687927,0.3335074782371521,0.9241587519645691,0.8623185753822327,0.048690296709537506,0.2536425292491913,0.4461355209350586,0.10462789237499237,0.34847599267959595,0.7400975227355957,0.6805144548416138,0.6223844289779663,0.7105283737182617,0.204923
68936538696,0.3416981101036072,0.676242470741272,0.879234790802002,0.5436780452728271,0.2826996445655823,0.030235258862376213,0.7103368043899536,0.007884103804826736,0.37267908453941345,0.5305371880531311,0.922111451625824,0.08949454873800278,0.40594232082366943,0.024313200265169144,0.3426109850406647,0.6222310662269592,0.2790679335594177,0.2097499519586563,0.11570323258638382,0.5771402716636658,0.6952700018882751,0.6719571352005005,0.9488610029220581,0.002703213831409812,0.6471966505050659,0.60039222240448,0.5887396335601807,0.9627703428268433,0.016871673986315727,0.6964824199676514,0.8136786222457886,0.5098071694374084,0.33396488428115845,0.7908401489257812,0.09724292904138565,0.44203564524650574,0.5199523568153381,0.6939564347267151,0.09088572859764099,0.2277594953775406,0.4103015661239624,0.6232946515083313,0.8869608044624329,0.618826150894165,0.13346147537231445,0.9805801510810852,0.8717857599258423,0.5027207732200623,0.9223479628562927,0.5413808226585388,0.9233060479164124,0.8298973441123962,0.968286395072937,0.919782817363739,0.03603381663560867,0.1747720092535019,0.3891346752643585,0.9521427154541016,0.300028920173645,0.16046763956546783,0.8863046765327454,0.4463944137096405,0.9078755974769592,0.16023047268390656,0.6611174941062927,0.4402637481689453,0.07648676633834839,0.6964631676673889,0.2473987489938736,0.03961552307009697,0.05994429811835289,0.06107853725552559,0.9077329635620117,0.7398838996887207,0.8980623483657837,0.6725823283195496,0.5289399027824402,0.30444636940956116,0.997962236404419,0.36218905448913574,0.47064894437789917,0.37824517488479614,0.979526937007904,0.1746583878993988,0.32798799872398376,0.6803486943244934,0.06320761889219284,0.60724937915802,0.47764649987220764,0.2839999794960022,0.2384132742881775,0.5145127177238464,0.36792758107185364,0.4565199017524719,0.3374773859977722,0.9704936742782593,0.13343943655490875,0.09680395573377609,0.3433917164802551,0.5910269021987915,0.6591764688491821,0.3972567617893219,0.9992780089378357,0.351893
00775527954,0.7214066386222839,0.6375827193260193,0.8130538463592529,0.9762256741523743,0.8897936344146729,0.7645619511604309,0.6982485055923462,0.335498183965683,0.14768557250499725,0.06263600289821625,0.2419017106294632,0.432281494140625,0.521996259689331,0.7730835676193237,0.9587409496307373,0.1173204779624939,0.10700414329767227,0.5896947383880615,0.7453980445861816,0.848150372505188,0.9358320832252502,0.9834262132644653,0.39980170130729675,0.3803351819515228,0.14780867099761963,0.6849344372749329,0.6567619442939758,0.8620625734329224,0.09725799411535263,0.49777689576148987,0.5810819268226624,0.2415570467710495,0.16902540624141693,0.8595808148384094,0.05853492394089699,0.47062090039253235,0.11583399772644043,0.45705875754356384,0.9799623489379883,0.4237063527107239,0.857124924659729,0.11731556057929993,0.2712520658969879,0.40379273891448975,0.39981213212013245,0.6713835000991821,0.3447181284427643,0.713766872882843,0.6391869187355042,0.399161159992218,0.43176013231277466,0.614527702331543,0.0700421929359436,0.8224067091941833,0.65342116355896,0.7263424396514893,0.5369229912757874,0.11047711223363876,0.4050356149673462,0.40537357330322266,0.3210429847240448,0.029950324445962906,0.73725426197052,0.10978446155786514,0.6063081622123718,0.7032175064086914,0.6347863078117371,0.95914226770401,0.10329815745353699,0.8671671748161316,0.02919023483991623,0.534916877746582,0.4042436182498932,0.5241838693618774,0.36509987711906433,0.19056691229343414,0.01912289671599865,0.5181497931480408,0.8427768349647522,0.3732159435749054,0.2228638231754303,0.080532006919384,0.0853109210729599,0.22139644622802734,0.10001406073570251,0.26503971219062805,0.06614946573972702,0.06560486555099487,0.8562761545181274,0.1621202677488327,0.5596824288368225,0.7734555602073669,0.4564095735549927,0.15336887538433075,0.19959613680839539,0.43298420310020447,0.52823406457901,0.3494403064250946,0.7814795970916748,0.7510216236114502,0.9272118210792542,0.028952548280358315,0.8956912755966187,0.39256879687
309265,0.8783724904060364,0.690784752368927,0.987348735332489,0.7592824697494507,0.3645446300506592,0.5010631680488586,0.37638914585113525,0.364911824464798,0.2609044909477234,0.49597030878067017,0.6817399263381958,0.27734026312828064,0.5243797898292542,0.117380291223526,0.1598452925682068,0.04680635407567024,0.9707314372062683,0.0038603513967245817,0.17857997119426727,0.6128667593002319,0.08136960119009018,0.8818964958190918,0.7196201682090759,0.9663899540901184,0.5076355338096619,0.3004036843776703,0.549500584602356,0.9308187365531921,0.5207614302635193,0.2672070264816284,0.8773987889289856,0.3719187378883362,0.0013833499979227781,0.2476850152015686,0.31823351979255676,0.8587774634361267,0.4585031569004059,0.4445872902870178,0.33610227704048157,0.880678117275238,0.9450267553329468,0.9918903112411499,0.3767412602901459,0.9661474227905273,0.7918795943260193,0.675689160823822,0.24488948285579681,0.21645726263523102,0.1660478264093399,0.9227566123008728,0.2940766513347626,0.4530942440032959,0.49395784735679626,0.7781715989112854,0.8442349433898926,0.1390727013349533,0.4269043505191803,0.842854917049408,0.8180332779884338};
    float calib_output0_data[NET_OUTPUT0_SIZE] = {3.5647096e-05,6.824297e-08,0.009327697,3.2340475e-05,1.1117579e-05,1.5117058e-06,4.6314454e-07,5.161628e-11,0.9905911,3.8835238e-10};
    

编译工程及烧录

  • 编译

    在MCU工程目录下,执行make命令进行编译,编译成功后显示如下,test_stm767为本例的MCU工程名:

    arm-none-eabi-size build/test_stm767.elf
    text      data    bss    dec       hex      filename
    120316    3620    87885  211821    33b6d    build/test_stm767.elf
    arm-none-eabi-objcopy -O ihex build/test_stm767.elf build/test_stm767.hex
    arm-none-eabi-objcopy -O binary -S build/test_stm767.elf build/test_stm767.bin
    
  • 烧录运行

    我们可以通过STM32CubePrg烧录工具进行代码烧录并运行。在PC机上,通过STLink连接一块可烧录的开发板,然后在当前MCU工程目录下运行以下命令,执行烧录并运行程序:

    bash ${STM32CubePrg_PATH}/bin/STM32_Programmer.sh -c port=SWD -w build/test_stm767.bin 0x08000000 -s 0x08000000
    

    ${STM32CubePrg_PATH}为STM32CubePrg安装路径。关于命令中各参数的含义,请参考STM32CubePrg的使用手册。

推理结果验证

本例中,我们把benchmark运行结果标志保存在了起始地址为0x20000000且大小为1字节的内存段内,故可以直接通过烧录器获取该处地址的数据,以得到程序返回结果。 在PC机上,通过STLink连接一块已烧录好程序的开发板,通过执行以下命令读取内存数据:

bash ${STM32CubePrg_PATH}/bin/STM32_Programmer.sh -c port=SWD mode=HOTPLUG --upload 0x20000000 0x1 ret.bin

${STM32CubePrg_PATH}为STM32CubePrg安装路径。关于命令中各参数的含义,请参考STM32CubePrg的使用手册。

读取的数据保存在ret.bin文件内,运行cat ret.bin查看其内容。如果板端推理成功,ret.bin内保存的是字符1,显示如下:

1

在轻鸿蒙设备上执行推理

轻鸿蒙编译环境准备

用户可以通过OpenHarmony官网来学习轻鸿蒙环境下的编译及烧录。 本教程以Hi3516开发板为例,演示如何在轻鸿蒙环境上使用Micro部署推理模型。

编译模型

使用converter_lite转换工具编译mnist模型,生成对应轻鸿蒙平台的推理代码,命令如下:

./converter_lite --fmk=TFLITE --modelFile=mnist.tflite --outputFile=${SOURCE_CODE_DIR} --configFile=${CONFIG_FILE}

其中,config配置文件中设置target=ARM32。

编写构建脚本

轻鸿蒙应用程序开发请先参考运行Hello OHOS。将上一步生成的mnist目录拷贝到任意鸿蒙源码路径下,假设为applications/sample/,然后新建BUILD.gn文件:

<harmony-source-path>/applications/sample/mnist
├── benchmark
├── CMakeLists.txt
├── BUILD.gn
└── src  

下载适用于OpenHarmony的预编译推理runtime包,然后将其解压至任意鸿蒙源码路径下。编写BUILD.gn文件:

import("//build/lite/config/component/lite_component.gni")
import("//build/lite/ndk/ndk.gni")

lite_component("mnist_benchmark") {
    target_type = "executable"
    sources = [
        "benchmark/benchmark.cc",
        "benchmark/calib_output.cc",
        "benchmark/load_input.c",
        "src/net.c",
        "src/weight.c",
        "src/session.cc",
        "src/tensor.cc",
    ]
    features = []
    include_dirs = [
        "<YOUR MINDSPORE LITE RUNTIME PATH>/runtime",
        "<YOUR MINDSPORE LITE RUNTIME PATH>/tools/codegen/include",
        "//applications/sample/mnist/benchmark",
        "//applications/sample/mnist/src",
    ]
    ldflags = [
        "-fno-strict-aliasing",
        "-Wall",
        "-pedantic",
        "-std=gnu99",
    ]
    libs = [
        "<YOUR MINDSPORE LITE RUNTIME PATH>/runtime/lib/libmindspore-lite.a",
        "<YOUR MINDSPORE LITE RUNTIME PATH>/tools/codegen/lib/libwrapper.a",
    ]
    defines = [
        "NOT_USE_STL",
        "ENABLE_NEON",
        "ENABLE_ARM",
        "ENABLE_ARM32"
    ]
    cflags = [
        "-fno-strict-aliasing",
        "-Wall",
        "-pedantic",
        "-std=gnu99",
    ]
    cflags_cc = [
        "-fno-strict-aliasing",
        "-Wall",
        "-pedantic",
        "-std=c++17",
    ]
}

<YOUR MINDSPORE LITE RUNTIME PATH>是解压出来的推理runtime包路径,比如//applications/sample/mnist/mindspore-lite-1.3.0-ohos-aarch32。 修改文件build/lite/components/applications.json,添加组件mnist_benchmark的配置:

{
    "component": "mnist_benchmark",
    "description": "mnist benchmark sample.",
    "optional": "true",
    "dirs": [
    "applications/sample/mnist"
    ],
    "targets": [
    "//applications/sample/mnist:mnist_benchmark"
    ],
    "rom": "",
    "ram": "",
    "output": [],
    "adapted_kernel": [ "liteos_a" ],
    "features": [],
    "deps": {
    "components": [],
    "third_party": []
    }
},

修改文件vendor/hisilicon/hispark_taurus/config.json,新增mnist_benchmark组件的条目:

{ "component": "mnist_benchmark", "features":[] }

编译benchmark

cd <openharmony-source-path>
hb set(设置编译路径)
.(选择当前路径)
选择ipcamera_hispark_taurus@hisilicon并回车
hb build mnist_benchmark(执行编译)

生成结果文件out/hispark_taurus/ipcamera_hispark_taurus/bin/mnist_benchmark。

执行benchmark

将mnist_benchmark、权重文件(mnist/src/net.bin)以及输入文件解压后拷贝到开发板上,然后执行:

OHOS # ./mnist_benchmark mnist_input.bin net.bin 1
OHOS # =======run benchmark======
input 0: mnist_input.bin

loop count: 1
total time: 10.11800ms, per time: 10.11800ms

outputs:
name: int8toft32_Softmax-7_post0/output-0, DataType: 43, Elements: 10, Shape: [1 10 ], Data:
0.000000, 0.000000, 0.003906, 0.000000, 0.000000, 0.992188, 0.000000, 0.000000, 0.000000, 0.000000,
========run success=======

自定义算子

使用前请先参考自定义算子了解基本概念。Micro目前仅支持custom类型的自定义算子注册和实现,暂不支持内建算子(比如conv2d、fc等)的注册和自定义实现。下面以海思Hi3516D开发板为例,说明如何在Micro中使用自定义算子。

使用转换工具生成NNIE的自定义算子具体步骤请参考集成NNIE使用说明

模型生成代码的方式与非自定义算子模型保持一致:

./converter_lite --fmk=TFLITE --modelFile=mnist.tflite --outputFile=${SOURCE_CODE_DIR} --configFile=${CONFIG_FILE}

其中,config配置文件中设置target=ARM32。

用户实现自定义算子

上一步会在用户指定路径下生成源码目录,其中名为src/registered_kernel.h的头文件给出了custom算子的函数声明:

int CustomKernel(TensorC *inputs, int input_num, TensorC *outputs, int output_num, CustomParameter *param);
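下面给出该接口的一个最简实现示意(注意:其中TensorC与CustomParameter为简化的替代定义,字段名均为本文假设,实际定义以生成代码中的头文件为准;示例行为是把第一个输入原样拷贝到第一个输出):

```cpp
#include <cstring>

// 简化的替代定义,仅用于说明调用约定;实际结构体定义以生成代码的头文件为准。
struct TensorC {
  void *data_;       // 张量数据指针(假设字段)
  size_t data_size_; // 数据字节数(假设字段)
};
struct CustomParameter {
  char type_[32];    // 自定义算子类型名(假设字段)
};

// 示意实现:将第一个输入原样拷贝到第一个输出,返回0表示成功,-1表示参数非法。
int CustomKernel(TensorC *inputs, int input_num, TensorC *outputs, int output_num,
                 CustomParameter *param) {
  (void)param;  // 示例未使用算子参数
  if (input_num < 1 || output_num < 1 || inputs[0].data_ == nullptr ||
      outputs[0].data_ == nullptr) {
    return -1;
  }
  size_t copy_size = inputs[0].data_size_ < outputs[0].data_size_
                         ? inputs[0].data_size_
                         : outputs[0].data_size_;
  memcpy(outputs[0].data_, inputs[0].data_, copy_size);
  return 0;
}
```

实际部署时,该实现通常会转调用硬件加速库(如下文的libmicro_nnie.so)完成计算。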

用户需要提供该函数的实现,并将相关源码或者库集成到生成代码的cmake工程中。例如,我们提供了支持海思NNIE的custom kernel示例动态库libmicro_nnie.so,该文件包含在官网下载页“NNIE 推理runtime库、benchmark工具”组件中。用户需要修改生成代码的CMakeLists.txt,添加链接的库名称和路径。例如:

link_directories(<YOUR_PATH>/mindspore-lite-1.8.1-linux-aarch32/providers/Hi3516D)

link_directories(<HI3516D_SDK_PATH>)

target_link_libraries(benchmark net micro_nnie nnie mpi VoiceEngine upvqe dnvqe securec -lm -pthread)

在生成的benchmark/benchmark.c文件中,在main函数内benchmark调用的前后添加NNIE设备相关的初始化代码,最后进行源码编译:

mkdir build && cd build

cmake -DCMAKE_TOOLCHAIN_FILE=<MS_SRC_PATH>/mindspore/lite/cmake/himix200.toolchain.cmake -DPLATFORM_ARM32=ON -DPKG_PATH=<RUNTIME_PKG_PATH> ..

make

Micro推理与端侧训练结合

概述

除可部署到MCU外,Micro推理还是一种模型结构与权重分离的推理模式:训练一般只改变权重,而不改变模型结构。因此,在训练与推理配合的场景下,可以采用端侧训练+Micro推理的模式,以利用Micro推理运行内存小、功耗低的优势。具体过程包括以下几步:

  • 基于端侧训练导出推理模型

  • 通过converter_lite转换工具,生成与端侧训练相同架构下的模型推理代码

  • 下载与端侧训练相同架构对应的Micro库

  • 对得到的推理代码和Micro库进行集成,编译并部署

  • 基于端侧训练导出推理模型的权重,覆盖原有权重文件,进行验证

    接下来我们将详细介绍各个步骤及其注意事项。

训练导出推理模型

用户可以直接参考端侧训练一节。

生成推理代码

用户可以直接参考上述内容,但需要注意两点。第一,训练导出的模型是ms模型,因此在转换时需将fmk设置为MSLITE;第二,为了将训练与Micro推理结合,需要保证训练导出的权重和Micro导出的权重完全匹配,因此我们在Micro配置参数中新增了两个属性,以保证权重的一致性。

[micro_param]
# false indicates that only the required weights will be saved. Default is false.
# If collaborate with lite-train, the parameter must be true.
keep_original_weight=false

# the names of those weight-tensors whose shape is changeable, only embedding-table supports change now.
# the parameter is used to collaborate with lite-train. If set, `keep_original_weight` must be true.
changeable_weights_name=name0,name1

keep_original_weight是保证权重一致性的关键属性,与训练配合时,此属性必须设置为true。changeable_weights_name用于特殊场景,即某些权重的shape会发生变化(当前仅支持embedding表的条目数发生变化);一般情况下,用户无需设置该属性。

编译部署

用户可以直接参考上述内容。

训练导出推理模型的权重

MindSpore的Serialization类提供了ExportWeightsCollaborateWithMicro函数,其原型如下:

  static Status ExportWeightsCollaborateWithMicro(const Model &model, ModelType model_type,
                                                  const std::string &weight_file, bool is_inference = true,
                                                  bool enable_fp16 = false,
                                                  const std::vector<std::string> &changeable_weights_name = {});

其中,is_inference当前仅支持为true。