{ "cells": [ { "cell_type": "markdown", "id": "99803a48", "metadata": {}, "source": [ "[![在线运行](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_modelarts.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svcjIuMC4wLWFscGhhL3R1dG9yaWFscy96aF9jbi9hZHZhbmNlZC9taW5kc3BvcmVfZGVyaXZhdGlvbi5pcHluYg==&imageid=77ef960a-bd26-4de4-9695-5b85a786fb90) [![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_notebook.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r2.0.0-alpha/tutorials/zh_cn/advanced/mindspore_derivation.ipynb) \n", "[![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_download_code.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r2.0.0-alpha/tutorials/zh_cn/advanced/mindspore_derivation.py) \n", "[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r2.0.0-alpha/tutorials/source_zh_cn/advanced/derivation.ipynb)\n", "\n", "# 高级自动微分\n", "\n", "`mindspore.ops`模块提供的`grad`和`value_and_grad`接口可以生成网络模型的梯度。`grad`计算网络梯度,`value_and_grad`同时计算网络的正向输出和梯度。本文主要介绍如何使用`grad`接口的主要功能,包括一阶、二阶求导,单独对输入或网络权重求导,返回辅助变量,以及如何停止计算梯度。\n", "\n", "> 更多求导接口相关信息可参考[API文档](https://mindspore.cn/docs/zh-CN/r2.0.0-alpha/api_python/mindspore/mindspore.grad.html)。\n", "\n", "## 一阶求导\n", "\n", "计算一阶导数方法:`mindspore.grad`,其中参数使用方式为:\n", "\n", "+ `fn`:待求导的函数或网络。\n", "+ `grad_position`:指定求导输入位置的索引。若为int类型,表示对单个输入求导;若为tuple类型,表示对tuple内索引的位置求导,其中索引从0开始;若是None,表示不对输入求导,这种场景下,`weights`非None。默认值:0。\n", "+ `weights`:训练网络中需要返回梯度的网络变量。一般可通过`weights = net.trainable_params()`获取。默认值:None。\n", "+ `has_aux`:是否返回辅助参数的标志。若为True,`fn`输出数量必须超过一个,其中只有`fn`第一个输出参与求导,其他输出值将直接返回。默认值:False。\n", "\n", "下面先构建自定义网络模型`Net`,再对其进行一阶求导,通过这样一个例子对`grad`接口的使用方式做简单介绍,即公式:\n", "\n", "$$f(x, y)=x * x * y * z \\tag{1}$$\n", "\n", "首先定义网络模型`Net`、输入`x`和输入`y`:" ] }, { "cell_type": "code", "execution_count": 1, "id": "5c44c739", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from mindspore import ops, Tensor\n", "import mindspore.nn as nn\n", "import mindspore as ms\n", "\n", "# 定义输入x和y\n", "x = Tensor([3.0], dtype=ms.float32)\n", "y = Tensor([5.0], dtype=ms.float32)\n", "\n", "\n", "class Net(nn.Cell):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.z = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='z')\n", "\n", " def construct(self, x, y):\n", " out = x * x * y * self.z\n", " return out\n" ] }, { "cell_type": "markdown", "id": "8a2fb7c2", "metadata": {}, "source": [ "### 对输入求一阶导\n", "\n", "对输入`x`, `y`进行求导,需要将`grad_position`设置成(0, 1):\n", "\n", "$$\\frac{\\partial f}{\\partial x}=2 * x * y * z \\tag{2}$$\n", "\n", "$$\\frac{\\partial f}{\\partial y}=x * x * z \\tag{3}$$" ] }, { "cell_type": "code", "execution_count": 2, "id": "87b26056", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Tensor(shape=[1], dtype=Float32, value= [ 3.00000000e+01]), Tensor(shape=[1], dtype=Float32, value= [ 9.00000000e+00]))\n" ] } ], "source": [ "net = Net()\n", "grad_fn = ms.grad(net, grad_position=(0, 1))\n", "gradients = grad_fn(x, y)\n", "print(gradients)\n" ] }, { "cell_type": "markdown", "id": "e3817b27", "metadata": {}, "source": [ "### 对权重进行求导\n", "\n", "对权重`z`进行求导,这里不需要对输入求导,将`grad_position`设置成None:\n", "\n", "$$\\frac{\\partial f}{\\partial z}=x * x * y \\tag{4}$$" ] }, { "cell_type": "code", "execution_count": 3, "id": "a50c7998", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Tensor(shape=[1], dtype=Float32, value= [ 4.50000000e+01]),)\n" ] } ], "source": [ "params = ms.ParameterTuple(net.trainable_params())\n", "\n", "output = ms.grad(net, grad_position=None, weights=params)(x, y)\n", "print(output)\n" ] }, { "cell_type": "markdown", "id": "6541b867", "metadata": {}, "source": [ "### 返回辅助变量\n", "\n", "同时对输入和权重求导,其中只有第一个输出参与求导,示例代码如下:" ] }, { "cell_type": "code", "execution_count": 4, "id": "823aabcc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "16, (16, 1)\n" ] } ], "source": [ "net = nn.Dense(10, 1)\n", "loss_fn = nn.MSELoss()\n", "\n", "\n", "def forward(inputs, labels):\n", " logits = net(inputs)\n", " loss = loss_fn(logits, labels)\n", " return loss, logits\n", "\n", "\n", "inputs = Tensor(np.random.randn(16, 10).astype(np.float32))\n", "labels = Tensor(np.random.randn(16, 1).astype(np.float32))\n", "weights = net.trainable_params()\n", "\n", "# Aux value does not contribute to the gradient.\n", "grad_fn = ms.grad(forward, grad_position=0, weights=None, has_aux=True)\n", "inputs_gradient, (aux_logits,) = grad_fn(inputs, labels)\n", "print(len(inputs_gradient), aux_logits.shape)\n" ] }, { "cell_type": "markdown", "id": "f5447a00", "metadata": {}, "source": [ "### 停止计算梯度\n", "\n", "可以使用`stop_gradient`来停止计算指定算子的梯度,从而消除该算子对梯度的影响。\n", "\n", "在上面一阶求导使用的矩阵相乘网络模型的基础上,再增加一个算子`out2`并禁止计算其梯度,得到自定义网络`Net2`,然后看一下对输入的求导结果情况。\n", "\n", "示例代码如下:" ] }, { "cell_type": "code", "execution_count": 5, "id": "3c2c6388", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5.0]\n" ] } ], "source": [ "class Net(nn.Cell):\n", "\n", " def __init__(self):\n", " super(Net, self).__init__()\n", "\n", " def construct(self, x, y):\n", " out1 = x * y\n", " out2 = x * y\n", " out2 = ops.stop_gradient(out2) # 停止计算out2算子的梯度\n", " out = out1 + out2\n", " return out\n", "\n", "\n", "net = Net()\n", "grad_fn = ms.grad(net)\n", "output = grad_fn(x, y)\n", "print(output)\n" ] }, { "cell_type": "markdown", "id": "9895bafe", "metadata": {}, "source": [ "从上面的打印可以看出,由于对`out2`设置了`stop_gradient`,所以`out2`没有对梯度计算有任何的贡献,其输出结果与未加`out2`算子时一致。\n", "\n", "下面删除`out2 = stop_gradient(out2)`,再来看一下输出结果。示例代码为:" ] }, { "cell_type": "code", "execution_count": 6, "id": "873ed1e9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[10.0]\n" ] } ], "source": [ "class Net(nn.Cell):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", "\n", " def construct(self, x, y):\n", " out1 = x * y\n", " out2 = x * y\n", " # out2 = stop_gradient(out2)\n", " out = out1 + out2\n", " return out\n", "\n", "\n", "net = Net()\n", "grad_fn = ms.grad(net)\n", "output = grad_fn(x, y)\n", "print(output)\n" ] }, { "cell_type": "markdown", "id": "59d82d3a", "metadata": {}, "source": [ "打印结果可以看出,把`out2`算子的梯度也计算进去之后,由于`out2`和`out1`算子完全相同,因此它们产生的梯度也完全相同,所以可以看到,结果中每一项的值都变为了原来的两倍(存在精度误差)。" ] }, { "cell_type": "markdown", "id": "4e581bd8", "metadata": {}, "source": [ "## 高阶求导\n", "\n", "高阶微分在AI支持科学计算、二阶优化等领域均有应用。如分子动力学模拟中,利用神经网络训练势能时,损失函数中需计算神经网络输出对输入的导数,则反向传播便存在损失函数对输入、权重的二阶交叉导数。\n", "\n", "此外,AI求解微分方程(如PINNs方法)还会存在输出对输入的二阶导数。又如二阶优化中,为了能够让神经网络快速收敛,牛顿法等需计算损失函数对权重的二阶导数。\n", "\n", "MindSpore可通过多次求导的方式支持高阶导数,下面通过几类例子展开阐述。\n", "\n", "### 单输入单输出高阶导数\n", "\n", "例如Sin算子,其公式为:\n", "\n", "$$f(x) = sin(x) \\tag{1}$$\n", "\n", "其一阶导数是:\n", "\n", "$$f'(x) = cos(x) \\tag{2}$$\n", "\n", "其二阶导数为:\n", "\n", "$$f''(x) = cos'(x) = -sin(x) \\tag{3}$$\n", "\n", "其二阶导数(-Sin)实现如下:" ] }, { "cell_type": "code", "execution_count": 7, "id": "6f91569c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.]\n" ] } ], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "import mindspore.ops as ops\n", "import mindspore as ms\n", "\n", "\n", "class Net(nn.Cell):\n", " \"\"\"前向网络模型\"\"\"\n", "\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.sin = ops.Sin()\n", "\n", " def construct(self, x):\n", " out = self.sin(x)\n", " return out\n", "\n", "\n", "x_train = ms.Tensor(np.array([3.1415926]), dtype=ms.float32)\n", "\n", "net = Net()\n", "firstgrad = ms.grad(net)\n", "secondgrad = ms.grad(firstgrad)\n", "output = secondgrad(x_train)\n", "\n", "# 打印结果\n", "result = np.around(output.asnumpy(), decimals=2)\n", "print(result)\n" ] }, { "cell_type": "markdown", "id": "bb5cc189", "metadata": {}, "source": [ "从上面的打印结果可以看出,$-sin(3.1415926)$的值接近于$0$。" ] }, { "cell_type": "markdown", "id": "c324bb9b", "metadata": {}, "source": [ "### 单输入多输出高阶导数\n", "\n", "对如下公式求导:\n", "\n", "$$f(x) = (f_1(x), f_2(x)) \\tag{1}$$\n", "\n", "其中:\n", "\n", "$$f_1(x) = sin(x) \\tag{2}$$\n", "\n", "$$f_2(x) = cos(x) \\tag{3}$$\n", "\n", "梯度计算时由于MindSpore采用的是反向自动微分机制,会对输出结果求和后再对输入求导。因此其一阶导数是:\n", "\n", "$$f'(x) = cos(x) -sin(x) \\tag{4}$$\n", "\n", "其二阶导数为:\n", "\n", "$$f''(x) = -sin(x) - cos(x) \\tag{5}$$" ] }, { "cell_type": "code", "execution_count": 8, "id": "fb879cfd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.]\n" ] } ], "source": [ "import numpy as np\n", "from mindspore import ops, Tensor\n", "import mindspore.nn as nn\n", "import mindspore as ms\n", "\n", "\n", "class Net(nn.Cell):\n", " \"\"\"前向网络模型\"\"\"\n", "\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.sin = ops.Sin()\n", " self.cos = ops.Cos()\n", "\n", " def construct(self, x):\n", " out1 = self.sin(x)\n", " out2 = self.cos(x)\n", " return out1, out2\n", "\n", "\n", "x_train = Tensor(np.array([3.1415926]), dtype=ms.float32)\n", "\n", "net = Net()\n", "firstgrad = ms.grad(net)\n", "secondgrad = ms.grad(firstgrad)\n", "output = secondgrad(x_train)\n", "\n", "# 打印结果\n", "result = np.around(output.asnumpy(), decimals=2)\n", "print(result)\n" ] }, { "cell_type": "markdown", "id": "a1f77f32", "metadata": {}, "source": [ "从上面的打印结果可以看出,$-sin(3.1415926) - cos(3.1415926)$的值接近于$1$。" ] }, { "cell_type": "markdown", "id": "c2dd1c33", "metadata": {}, "source": [ "### 多输入多输出高阶导数\n", "\n", "对如下公式求导:\n", "\n", "$$f(x, y) = (f_1(x, y), f_2(x, y)) \\tag{1}$$\n", "\n", "其中:\n", "\n", "$$f_1(x, y) = sin(x) - cos(y) \\tag{2}$$\n", "\n", "$$f_2(x, y) = cos(x) - sin(y) \\tag{3}$$\n", "\n", "梯度计算时由于MindSpore采用的是反向自动微分机制, 会对输出结果求和后再对输入求导。\n", "\n", "求和:\n", "\n", "$$\\sum{output} = sin(x) + cos(x) - sin(y) - cos(y) \\tag{4}$$\n", "\n", "输出和关于输入$x$的一阶导数为:\n", "\n", "$$\\dfrac{\\mathrm{d}\\sum{output}}{\\mathrm{d}x} = cos(x) - sin(x) \\tag{5}$$\n", "\n", "输出和关于输入$x$的二阶导数为:\n", "\n", "$$\\dfrac{\\mathrm{d}\\sum{output}^{2}}{\\mathrm{d}^{2}x} = -sin(x) - cos(x) \\tag{6}$$\n", "\n", "输出和关于输入$y$的一阶导数为:\n", "\n", "$$\\dfrac{\\mathrm{d}\\sum{output}}{\\mathrm{d}y} = -cos(y) + sin(y) \\tag{7}$$\n", "\n", "输出和关于输入$y$的二阶导数为:\n", "\n", "$$\\dfrac{\\mathrm{d}\\sum{output}^{2}}{\\mathrm{d}^{2}y} = sin(y) + cos(y) \\tag{8}$$" ] }, { "cell_type": "code", "execution_count": 9, "id": "aa0dfd42", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.]\n", "[-1.]\n" ] } ], "source": [ "import numpy as np\n", "from mindspore import ops, Tensor\n", "import mindspore.nn as nn\n", "import mindspore as ms\n", "\n", "\n", "class Net(nn.Cell):\n", " \"\"\"前向网络模型\"\"\"\n", "\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.sin = ops.Sin()\n", " self.cos = ops.Cos()\n", "\n", " def construct(self, x, y):\n", " out1 = self.sin(x) - self.cos(y)\n", " out2 = self.cos(x) - self.sin(y)\n", " return out1, out2\n", "\n", "\n", "x_train = Tensor(np.array([3.1415926]), dtype=ms.float32)\n", "y_train = Tensor(np.array([3.1415926]), dtype=ms.float32)\n", "\n", "net = Net()\n", "firstgrad = ms.grad(net, grad_position=(0, 1))\n", "secondgrad = ms.grad(firstgrad, grad_position=(0, 1))\n", "output = secondgrad(x_train, y_train)\n", "\n", "# 打印结果\n", "print(np.around(output[0].asnumpy(), decimals=2))\n", "print(np.around(output[1].asnumpy(), decimals=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "从上面的打印结果可以看出,输出对输入$x$的二阶导数$-sin(3.1415926) - cos(3.1415926)$的值接近于$1$,输出对输入$y$的二阶导数$sin(3.1415926) + cos(3.1415926)$的值接近于$-1$。\n", "\n", "> 由于不同计算平台的精度可能存在差异,因此本章节中的代码在不同平台上的执行结果会存在微小的差别。" ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" }, "vscode": { "interpreter": { "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" } } }, "nbformat": 4, "nbformat_minor": 5 }