{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 自动微分\n", "\n", "[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.8/tutorials/zh_cn/beginner/mindspore_autograd.ipynb) [![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_download_code.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.8/tutorials/zh_cn/beginner/mindspore_autograd.py) [![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.8/tutorials/source_zh_cn/beginner/autograd.ipynb)\n", "\n", "自动微分能够计算可导函数在某点处的导数值,是反向传播算法的一般化。自动微分主要解决的问题是将一个复杂的数学运算分解为一系列简单的基本运算,该功能对用户屏蔽了大量的求导细节和过程,大大降低了框架的使用门槛。\n", "\n", "MindSpore使用`ops.GradOperation`计算一阶导数,`ops.GradOperation`属性如下:\n", "\n", "+ `get_all`:计算梯度,如果等于False,获得第一个输入的梯度,如果等于True,获得所有输入的梯度。默认值为False。\n", "+ `get_by_list`:是否对权重参数进行求导,默认值为False。\n", "+ `sens_param`:是否对网络的输出值做缩放以改变最终梯度,默认值为False。\n", "\n", "本章使用MindSpore中的`ops.GradOperation`对函数 $f(x)=wx+b$ 求一阶导数。\n", "\n", "## 对输入求一阶导\n", "\n", "对输入求导前需要先定义公式:\n", "\n", "$$f(x)=wx+b \\tag {1} $$\n", "\n", "下面示例代码是公式(1)的表达,由于MindSpore采用函数式编程,因此所有计算公式表达都采用函数进行表示。" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "import mindspore as ms\n", "\n", "class Net(nn.Cell):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')\n", " self.b = ms.Parameter(ms.Tensor(np.array([1], np.float32)), name='b')\n", "\n", " def construct(self, x):\n", " f = self.w * x + self.b\n", " return f" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "然后定义求导类`GradNet`,类的`__init__`函数中定义需要求导的网络`self.net`和`ops.GradOperation`操作,类的`construct`函数中对`self.net`的输入进行求导。其对应MindSpore内部会产生如下公式(2):\n", "\n", "$$f^{'}(x)=w\\tag {2}$$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import mindspore as ms\n", "import mindspore.ops as ops\n", "\n", "class GradNet(nn.Cell):\n", " def __init__(self, net):\n", " super(GradNet, self).__init__()\n", " self.net = net\n", " self.grad_op = ops.GradOperation()\n", "\n", " def construct(self, x):\n", " gradient_function = self.grad_op(self.net)\n", " return gradient_function(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "最后定义权重参数为w,并对输入公式(1)中的输入参数x求一阶导数。从运行结果来看,公式(1)中的输入为6,即:\n", "\n", "$$f(x)=wx+b=6*x+1 \\tag {3}$$\n", "\n", "对上式进行求导,有:\n", "\n", "$$f^{'}(x)=w=6 \\tag {4}$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[6.]\n" ] } ], "source": [ "x = ms.Tensor([100], dtype=ms.float32)\n", "output = GradNet(Net())(x)\n", "\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MindSpore计算一阶导数方法`ops.GradOperation(get_all=False, get_by_list=False, sens_param=False)`,其中`get_all`为`False`时,只会对第一个输入求导,为`True`时,会对所有输入求导。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 对权重求一阶导\n", "\n", "对权重参数求一阶导,需要将`ops.GradOperation`中的`get_by_list`设置为`True`。" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import mindspore as ms\n", "\n", "class GradNet(nn.Cell):\n", " def __init__(self, net):\n", " super(GradNet, self).__init__()\n", " self.net = net\n", " self.params = ms.ParameterTuple(net.trainable_params())\n", " self.grad_op = ops.GradOperation(get_by_list=True) # 设置对权重参数进行一阶求导\n", "\n", " def construct(self, x):\n", " gradient_function = self.grad_op(self.net, self.params)\n", " return gradient_function(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "接下来,对函数进行求导:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wgrad: [100.]\n", "bgrad: [1.]\n" ] } ], "source": [ "# 对函数进行求导计算\n", "x = ms.Tensor([100], dtype=ms.float32)\n", "fx = GradNet(Net())(x)\n", "\n", "# 打印结果\n", "print(f\"wgrad: {fx[0]}\\nbgrad: {fx[1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "若某些权重不需要进行求导,则在定义求导网络时,相应的权重参数声明定义的时候,将其属性`requires_grad`需设置为`False`。" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Tensor(shape=[1], dtype=Float32, value= [ 5.00000000e+00]),)\n" ] } ], "source": [ "class Net(nn.Cell):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')\n", " self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b', requires_grad=False)\n", "\n", " def construct(self, x):\n", " out = x * self.w + self.b\n", " return out\n", "\n", "class GradNet(nn.Cell):\n", " def __init__(self, net):\n", " super(GradNet, self).__init__()\n", " self.net = net\n", " self.params = ms.ParameterTuple(net.trainable_params())\n", " self.grad_op = ops.GradOperation(get_by_list=True)\n", "\n", " def construct(self, x):\n", " gradient_function = self.grad_op(self.net, self.params)\n", " return gradient_function(x)\n", "\n", "# 构建求导网络\n", "x = ms.Tensor([5], dtype=ms.float32)\n", "fw = GradNet(Net())(x)\n", "\n", "print(fw)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 梯度值缩放\n", "\n", "通过`sens_param`参数对网络的输出值做缩放以改变最终梯度。首先将`ops.GradOperation`中的`sens_param`设置为`True`,并确定缩放指数,其维度与输出维度保持一致。" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.6]\n" ] } ], "source": [ "class GradNet(nn.Cell):\n", " def __init__(self, net):\n", " super(GradNet, self).__init__()\n", " self.net = net\n", " # 求导操作\n", " self.grad_op = ops.GradOperation(sens_param=True)\n", " # 缩放指数\n", " self.grad_wrt_output = ms.Tensor([0.1], dtype=ms.float32)\n", "\n", " def construct(self, x):\n", " gradient_function = self.grad_op(self.net)\n", " return gradient_function(x, self.grad_wrt_output)\n", "\n", "x = ms.Tensor([6], dtype=ms.float32)\n", "output = GradNet(Net())(x)\n", "\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 停止计算梯度\n", "\n", "使用`ops.stop_gradient`可以停止计算梯度,示例如下:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wgrad: [0.]\n", "bgrad: [0.]\n" ] } ], "source": [ "from mindspore.ops import stop_gradient\n", "\n", "class Net(nn.Cell):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')\n", " self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b')\n", "\n", " def construct(self, x):\n", " out = x * self.w + self.b\n", " # 停止梯度更新,out对梯度计算无贡献\n", " out = stop_gradient(out)\n", " return out\n", "\n", "class GradNet(nn.Cell):\n", " def __init__(self, net):\n", " super(GradNet, self).__init__()\n", " self.net = net\n", " self.params = ms.ParameterTuple(net.trainable_params())\n", " self.grad_op = ops.GradOperation(get_by_list=True)\n", "\n", " def construct(self, x):\n", " gradient_function = self.grad_op(self.net, self.params)\n", " return gradient_function(x)\n", "\n", "x = ms.Tensor([100], dtype=ms.float32)\n", "output = GradNet(Net())(x)\n", "\n", "print(f\"wgrad: {output[0]}\\nbgrad: {output[1]}\")" ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 4 }