{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[![在线运行](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0/resource/_static/logo_modelarts.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svcjIuMC90dXRvcmlhbHMvemhfY24vYWR2YW5jZWQvbW9kdWxlcy9taW5kc3BvcmVfbGF5ZXIuaXB5bmI=&imageid=e225a9aa-230a-4ea5-a538-b5faed64a6a6) [![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0/resource/_static/logo_notebook.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r2.0/tutorials/zh_cn/advanced/modules/mindspore_layer.ipynb) [![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0/resource/_static/logo_download_code.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r2.0/tutorials/zh_cn/advanced/modules/mindspore_layer.py) [![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r2.0/tutorials/source_zh_cn/advanced/modules/layer.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Cell与参数\n", "\n", "Cell作为神经网络构造的基础单元,与神经网络层(Layer)的概念相对应,对Tensor计算操作的抽象封装,能够更准确清晰地对神经网络结构进行表示。除了基础的Tensor计算流程定义外,神经网络层还包含了参数管理、状态管理等功能。而参数(Parameter)是神经网络训练的核心,通常作为神经网络层的内部成员变量。本节我们将系统介绍参数、神经网络层以及其相关使用方法。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parameter\n", "\n", "参数(Parameter)是一类特殊的Tensor,是指在模型训练过程中可以对其值进行更新的变量。MindSpore提供`mindspore.Parameter`类进行Parameter的构造。为了对不同用途的Parameter进行区分,下面对两种不同类别的Parameter进行定义:\n", "\n", "- 可训练参数。在模型训练过程中根据反向传播算法求得梯度后进行更新的Tensor,此时需要将`required_grad`设置为`True`。\n", "- 不可训练参数。不参与反向传播,但需要更新值的Tensor(如BatchNorm中的`mean`和`var`变量),此时需要将`requires_grad`设置为`False`。\n", "\n", "> Parameter默认设置`required_grad=True`。\n", "\n", "下面我们构造一个简单的全连接层:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import mindspore\n", "from mindspore import nn\n", "from mindspore import ops\n", "from mindspore import Tensor, Parameter\n", "\n", "class Network(nn.Cell):\n", " def __init__(self):\n", " super().__init__()\n", " self.w = Parameter(Tensor(np.random.randn(5, 3), mindspore.float32), name='w') # weight\n", " self.b = Parameter(Tensor(np.random.randn(3,), mindspore.float32), name='b') # bias\n", "\n", " def construct(self, x):\n", " z = ops.matmul(x, self.w) + self.b\n", " return z\n", "\n", "net = Network()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在`Cell`的`__init__`方法中,我们定义了`w`和`b`两个Parameter,并配置`name`进行命名空间管理。在`construct`方法中使用`self.attr`直接调用参与Tensor运算。" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### 获取Parameter\n", "\n", "在使用Cell+Parameter构造神经网络层后,我们可以使用多种方法来获取Cell管理的Parameter。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 获取单个参数\n", "\n", "单独获取某个特定参数,直接调用Python类的成员变量即可。" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-12-29T03:42:22.729232Z", "start_time": "2021-12-29T03:42:22.723517Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-1.2192779 -0.36789745 0.0946381 ]\n" ] } ], "source": [ "print(net.b.asnumpy())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 获取可训练参数\n", "\n", "可使用`Cell.trainable_params`方法获取可训练参数,通常在配置优化器时需调用此接口。" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Parameter (name=w, shape=(5, 3), dtype=Float32, requires_grad=True), Parameter (name=b, shape=(3,), dtype=Float32, requires_grad=True)]\n" ] } ], "source": [ "print(net.trainable_params())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 获取所有参数\n", "\n", "使用`Cell.get_parameters()`方法可获取所有参数,此时会返回一个Python迭代器。" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "print(type(net.get_parameters()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "或者可以调用`Cell.parameters_and_names`返回参数名称及参数。" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "w:\n", "[[ 4.15680408e-02 -1.20311625e-01 5.02573885e-02]\n", " [ 1.22175144e-04 -1.34980649e-01 1.17642188e+00]\n", " [ 7.57667869e-02 -1.74758151e-01 -5.19092619e-01]\n", " [-1.67846107e+00 3.27240258e-01 -2.06452996e-01]\n", " [ 5.72323874e-02 -8.27963874e-02 5.94243526e-01]]\n", "b:\n", "[-1.2192779 -0.36789745 0.0946381 ]\n" ] } ], "source": [ "for name, param in net.parameters_and_names():\n", " print(f\"{name}:\\n{param.asnumpy()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 修改Parameter\n", "\n", "#### 直接修改参数值\n", "\n", "Parameter是一种特殊的Tensor,因此可以使用Tensor索引修改的方式对其值进行修改。" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. -0.36789745 0.0946381 ]\n" ] } ], "source": [ "net.b[0] = 1.\n", "print(net.b.asnumpy())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 覆盖修改参数值\n", "\n", "可调用`Parameter.set_data`方法,使用相同Shape的Tensor对Parameter进行覆盖。该方法常用于使用Initializer进行[Cell遍历初始化](https://www.mindspore.cn/tutorials/zh-CN/r2.0/advanced/modules/initializer.html#cell%E9%81%8D%E5%8E%86%E5%88%9D%E5%A7%8B%E5%8C%96)。" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3. 4. 5.]\n" ] } ], "source": [ "net.b.set_data(Tensor([3, 4, 5]))\n", "print(net.b.asnumpy())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 运行时修改参数值\n", "\n", "参数的主要作用为模型训练时对其值进行更新,在反向传播获得梯度后,或不可训练参数需要进行更新,都涉及到运行时参数修改。由于MindSpore的[计算图](https://www.mindspore.cn/tutorials/zh-CN/r2.0/advanced/compute_graph.html)编译设计,此时需要使用`mindspore.ops.assign`接口对参数进行赋值。该方法常用于[自定义优化器](https://www.mindspore.cn/tutorials/zh-CN/r2.0/advanced/modules/optimizer.html#%E8%87%AA%E5%AE%9A%E4%B9%89%E4%BC%98%E5%8C%96%E5%99%A8)场景。下面是一个简单的运行时修改参数值样例:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[7. 8. 9.]\n" ] } ], "source": [ "import mindspore as ms\n", "\n", "@ms.jit\n", "def modify_parameter():\n", " b_hat = ms.Tensor([7, 8, 9])\n", " ops.assign(net.b, b_hat)\n", " return True\n", "\n", "modify_parameter()\n", "print(net.b.asnumpy())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameter Tuple\n", "\n", "变量元组ParameterTuple,用于保存多个Parameter,继承于元组tuple,提供克隆功能。\n", "\n", "如下示例提供ParameterTuple创建方法:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True))\n", "(Parameter (name=params_copy.x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True))\n" ] } ], "source": [ "from mindspore.common.initializer import initializer\n", "from mindspore import ParameterTuple\n", "# 创建\n", "x = Parameter(default_input=ms.Tensor(np.arange(2 * 3).reshape((2, 3))), name=\"x\")\n", "y = Parameter(default_input=initializer('ones', [1, 2, 3], ms.float32), name='y')\n", "z = Parameter(default_input=2.0, name='z')\n", "params = ParameterTuple((x, y, z))\n", "\n", "# 从params克隆并修改名称为\"params_copy\"\n", "params_copy = params.clone(\"params_copy\")\n", "\n", "print(params)\n", "print(params_copy)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Cell训练状态转换\n", "\n", "神经网络中的部分Tensor操作在训练和推理时的表现并不相同,如`nn.Dropout`在训练时进行随机丢弃,但在推理时则不丢弃,`nn.BatchNorm`在训练时需要更新`mean`和`var`两个变量,在推理时则固定其值不变。因此我们可以通过`Cell.set_train`接口来设置神经网络的状态。\n", "\n", "`set_train(True)`时,神经网络状态为`train`, `set_train`接口默认值为`True`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "train\n" ] } ], "source": [ "net.set_train()\n", "print(net.phase)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`set_train(False)`时,神经网络状态为`predict`:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "predict\n" ] } ], "source": [ "net.set_train(False)\n", "print(net.phase)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 自定义神经网络层\n", "\n", "通常情况下,MindSpore提供的神经网络层接口和function函数接口能够满足模型构造需求,但由于AI领域不断推陈出新,因此有可能遇到新网络结构没有内置模块的情况。此时我们可以根据需要,通过MindSpore提供的function接口、Primitive算子自定义神经网络层,并可以使用`Cell.bprop`方法自定义反向。下面分别详述三种自定义方法。\n", "\n", "### 使用function接口构造神经网络层\n", "\n", "MindSpore提供大量基础的function接口,可以使用其构造复杂的Tensor操作,封装为神经网络层。下面以`Threshold`为例,其公式如下:\n", "\n", "$$\n", "y =\\begin{cases}\n", " x, &\\text{ if } x > \\text{threshold} \\\\\n", " \\text{value}, &\\text{ otherwise }\n", " \\end{cases}\n", "$$\n", "\n", "可以看到`Threshold`判断Tensor的值是否大于`threshold`值,保留判断结果为`True`的值,替换判断结果为`False`的值。因此,对应实现如下:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "class Threshold(nn.Cell):\n", " def __init__(self, threshold, value):\n", " super().__init__()\n", " self.threshold = threshold\n", " self.value = value\n", "\n", " def construct(self, inputs):\n", " cond = ops.gt(inputs, self.threshold)\n", " value = ops.fill(inputs.dtype, inputs.shape, self.value)\n", " return ops.select(cond, inputs, value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这里分别使用了`ops.gt`、`ops.fill`、`ops.select`来实现判断和替换。下面执行自定义的`Threshold`层:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Tensor(shape=[3], dtype=Float32, value= [ 2.00000000e+01, 2.00000003e-01, 3.00000012e-01])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = Threshold(0.1, 20)\n", "inputs = mindspore.Tensor([0.1, 0.2, 0.3], mindspore.float32)\n", "m(inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以看到`inputs[0] = threshold`, 因此被替换为`20`。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 自定义Cell反向\n", "\n", "在特殊场景下,我们不但需要自定义神经网络层的正向逻辑,也需要手动控制其反向的计算,此时我们可以通过`Cell.bprop`接口对其反向进行定义。在全新的神经网络结构设计、反向传播速度优化等场景下会用到该功能。下面我们以`Dropout2d`为例,介绍如何自定义Cell反向:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "class Dropout2d(nn.Cell):\n", " def __init__(self, keep_prob):\n", " super().__init__()\n", " self.keep_prob = keep_prob\n", " self.dropout2d = ops.Dropout2D(keep_prob)\n", "\n", " def construct(self, x):\n", " return self.dropout2d(x)\n", "\n", " def bprop(self, x, out, dout):\n", " _, mask = out\n", " dy, _ = dout\n", " if self.keep_prob != 0:\n", " dy = dy * (1 / self.keep_prob)\n", " dy = mask.astype(mindspore.float32) * dy\n", " return (dy.astype(x.dtype), )\n", "\n", "dropout_2d = Dropout2d(0.8)\n", "dropout_2d.bprop_debug = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`bprop`方法分别有三个入参:\n", "\n", "- *x*: 正向输入,当正向输入为多个时,需同样数量的入参。\n", "- *out*: 正向输出。\n", "- *dout*: 反向传播时,当前Cell执行之前的反向结果。\n", "\n", "一般我们需要根据正向输出和前层反向结果配合,根据反向求导公式计算反向结果,并将其返回。`Dropout2d`的反向计算需要根据正向输出的`mask`矩阵对前层反向结果进行mask,然后根据`keep_prob`进行缩放。最终可得到正确的计算结果。" ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 4 }