{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 网络参数\n", "\n", "`Ascend` `GPU` `CPU` `模型开发`\n", "\n", "[![在线运行](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_modelarts.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9taW5kc3BvcmUtd2Vic2l0ZS5vYnMuY24tbm9ydGgtNC5teWh1YXdlaWNsb3VkLmNvbS9ub3RlYm9vay9tb2RlbGFydHMvcHJvZ3JhbW1pbmdfZ3VpZGUvbWluZHNwb3JlX3BhcmFtZXRlci5pcHluYg==&imageid=65f636a0-56cf-49df-b941-7d2a07ba8c8c) [![下载Notebook](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.6/programming_guide/zh_cn/mindspore_parameter.ipynb) [![下载样例代码](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_download_code.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.6/programming_guide/zh_cn/mindspore_parameter.py) [![查看源文件](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.6/docs/mindspore/programming_guide/source_zh_cn/parameter.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 概述\n", "\n", "MindSpore提供了网络参数初始化模块,用户可以通过封装算子来调用字符串、Initializer子类或自定义Tensor等方式完成对网络参数进行初始化。本章主要介绍了`Parameter`的初始化以及属性和方法的使用,同时介绍了`ParameterTuple`和参数的依赖控制。\n", "\n", "## Parameter\n", "\n", "`Parameter`是变量张量,代表在训练网络时,需要被更新的参数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 初始化\n", "\n", "```python\n", "mindspore.Parameter(default_input, name, requires_grad=True, layerwise_parallel=False)\n", "```\n", "\n", "- `default_input`: 初始化一个`Parameter`对象,传入的数据支持`Tensor`、`Initializer`、`int`和`float`四种类型。`Initializer`是初始化器,可调用`initializer`接口生成`Initializer`对象。当使用`init`去初始化`Tensor`时,`Tensor`仅保存张量的形状和类型,而不保存实际数据,所以不会占用任何内存,可调用`init_data`接口将`Parameter`里保存的`Tensor`转化为数据。\n", "\n", "- `name`: 
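A name can be specified for each `Parameter` to make later operations and updates easier. When a `Parameter` is initialized as an attribute of a Cell, the default value None is recommended; otherwise the name of the `Parameter` may differ from what is expected.\n", "\n", "- `requires_grad`: Set `requires_grad` to `True` when the parameter needs to be updated.\n", "\n", "- `layerwise_parallel`: When `layerwise_parallel` (hybrid parallelism) is set to `True`, this parameter is filtered out during parameter broadcast and parameter gradient aggregation.\n", "\n", "For the configuration of distributed parallelism, see https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.6/auto_parallel.html .\n", "\n", "The following example constructs `Parameter` objects from three different data types. All three need to be updated, and none of them uses layerwise parallelism.\n", "\n", "Sample code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True) \n", "\n", " Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True) \n", "\n", " Parameter (name=z, shape=(), dtype=Float32, requires_grad=True)\n" ] } ], "source": [ "import numpy as np\n", "from mindspore import Tensor, Parameter\n", "from mindspore import dtype as mstype\n", "from mindspore.common.initializer import initializer\n", "\n", "# Three ways to supply default_input: a Tensor, an Initializer, and a Python float.\n", "x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name=\"x\")\n", "y = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32), name='y')\n", "z = Parameter(default_input=2.0, name='z')\n", "\n", "print(x, \"\\n\\n\", y, \"\\n\\n\", z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Attributes\n", "\n", "- `inited_param`: returns the `Parameter` that holds the actual data.\n", "\n", "- `name`: the name given to the `Parameter` when it is instantiated.\n", "\n", "- `sliced`: used in auto-parallel scenarios; indicates whether the data held by the `Parameter` is already sliced.\n", "\n", " If it is, the data is not sliced again; if it is not, whether to slice it is decided by the network's parallel strategy.\n", "\n", "- `is_init`: the initialization status of the `Parameter`. On the GE backend, a `Parameter` needs an `init graph` to synchronize data from the host to the device, and this flag indicates whether the data has been synchronized to the device. The flag takes effect only on the GE backend; on other backends it is set to False.\n", "\n", "- `layerwise_parallel`: whether the `Parameter` supports layerwise parallelism. If it does, the parameter is excluded from broadcast and gradient aggregation; otherwise it takes part in both.\n", "\n", "- `requires_grad`: whether the gradient of the parameter needs to be computed. A parameter that is to be trained needs its gradient; otherwise it does not.\n", "\n", "- `data`: the `Parameter` itself.\n", "\n", "The following example initializes a `Parameter` from a `Tensor` and reads its attributes:" ] }, { "cell_type": 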
"code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " name: x \n", " sliced: False \n", " is_init: False \n", " inited_param: None \n", " requires_grad: True \n", " layerwise_parallel: False \n", " data: Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True)\n" ] } ], "source": [ "import numpy as np\n", "\n", "from mindspore import Tensor, Parameter\n", "\n", "x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name=\"x\")\n", "\n", "print(\"name: \", x.name, \"\\n\",\n", " \"sliced: \", x.sliced, \"\\n\",\n", " \"is_init: \", x.is_init, \"\\n\",\n", " \"inited_param: \", x.inited_param, \"\\n\",\n", " \"requires_grad: \", x.requires_grad, \"\\n\",\n", " \"layerwise_parallel: \", x.layerwise_parallel, \"\\n\",\n", " \"data: \", x.data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 方法\n", "\n", "- `init_data`:在网络采用半自动或者全自动并行策略的场景下, 当初始化`Parameter`传入的数据是`Initializer`时,可调用该接口将`Parameter`保存的数据转换为`Tensor`。\n", "\n", "- `set_data`:设置`Parameter`保存的数据,支持传入`Tensor`、`Initializer`、`int`和`float`进行设置, 将方法的入参`slice_shape`设置为True时,可改变`Parameter`的shape,反之,设置的数据shape必须与`Parameter`原来的shape保持一致。\n", "\n", "- `set_param_ps`:控制训练参数是否通过[Parameter Server](https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.6/apply_parameter_server_training.html)进行训练。\n", "\n", "- `clone`:克隆`Parameter`,克隆完成后可以给新Parameter指定新的名字。\n", "\n", "下例通过`Initializer`来初始化`Tensor`,调用了`Parameter`的相关方法。如下:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)\n", "Parameter (name=x_clone, shape=(1, 2, 3), dtype=Float32, requires_grad=True)\n", "Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)\n", "Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)\n" ] } ], "source": [ "import numpy as np\n", 
"from mindspore import Tensor, Parameter\n", "from mindspore import dtype as mstype\n", "from mindspore.common.initializer import initializer\n", "\n", "x = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32))\n", "\n", "print(x)\n", "x_clone = x.clone()\n", "x_clone.name = \"x_clone\"\n", "print(x_clone)\n", "\n", "print(x.init_data())\n", "print(x.set_data(data=Tensor(np.arange(2*3).reshape((1, 2, 3)))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ParameterTuple\n", "\n", "继承于`tuple`,用于保存多个`Parameter`,通过`__new__(cls, iterable)`传入一个存放`Parameter`的迭代器进行构造,提供`clone`接口进行克隆。\n", "\n", "下例构造了一个`ParameterTuple`对象,并进行了克隆。如下:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True)) \n", "\n", "(Parameter (name=params_copy.x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True))\n" ] } ], "source": [ "import numpy as np\n", "from mindspore import Tensor, Parameter, ParameterTuple\n", "from mindspore import dtype as mstype\n", "from mindspore.common.initializer import initializer\n", "\n", "x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name=\"x\")\n", "y = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32), name='y')\n", "z = Parameter(default_input=2.0, name='z')\n", "params = ParameterTuple((x, y, z))\n", "params_copy = params.clone(\"params_copy\")\n", "print(params, \"\\n\")\n", "print(params_copy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 使用封装算子对参数初始化\n", "\n", 
"MindSpore提供了多种参数初始化的方式,并在部分算子中封装了参数初始化的功能。本节将介绍带有参数初始化功能的算子对参数进行初始化的方法,以`Conv2d`算子为例,分别介绍以字符串,`Initializer`子类和自定义`Tensor`等方式对网络中的参数进行初始化,以下代码示例中均以`Initializer`的子类`Normal`为例,代码示例中`Normal`均可替换成`Initializer`子类中任何一个。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 字符串\n", "\n", "使用字符串对网络参数进行初始化,字符串的内容需要与`Initializer`子类的名称保持一致(字母不区分大小写),使用字符串方式进行初始化将使用`Initializer`子类中的默认参数,例如使用字符串`Normal`等同于使用`Initializer`的子类`Normal()`,代码样例如下:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[[ 3.10382620e-02 4.38603461e-02 4.38603461e-02 ... 4.38603461e-02\n", " 4.38603461e-02 1.38719045e-02]\n", " [ 3.26051228e-02 3.54298912e-02 3.54298912e-02 ... 3.54298912e-02\n", " 3.54298912e-02 -5.54019120e-03]\n", " [ 3.26051228e-02 3.54298912e-02 3.54298912e-02 ... 3.54298912e-02\n", " 3.54298912e-02 -5.54019120e-03]\n", " ...\n", " [ 3.26051228e-02 3.54298912e-02 3.54298912e-02 ... 3.54298912e-02\n", " 3.54298912e-02 -5.54019120e-03]\n", " [ 3.26051228e-02 3.54298912e-02 3.54298912e-02 ... 3.54298912e-02\n", " 3.54298912e-02 -5.54019120e-03]\n", " [ 9.66199022e-03 1.24104535e-02 1.24104535e-02 ... 1.24104535e-02\n", " 1.24104535e-02 -1.38977719e-02]]\n", "\n", " ...\n", "\n", " [[ 3.98553275e-02 -1.35465711e-03 -1.35465711e-03 ... -1.35465711e-03\n", " -1.35465711e-03 -1.00310734e-02]\n", " [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02\n", " -3.60766202e-02 -2.95619294e-02]\n", " [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02\n", " -3.60766202e-02 -2.95619294e-02]\n", " ...\n", " [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02\n", " -3.60766202e-02 -2.95619294e-02]\n", " [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02\n", " -3.60766202e-02 -2.95619294e-02]\n", " [ 1.33139016e-02 6.74417242e-05 6.74417242e-05 ... 
6.74417242e-05\n", " 6.74417242e-05 -2.27325838e-02]]]]\n" ] } ], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "from mindspore import Tensor\n", "from mindspore import set_seed\n", "\n", "# Fix the global seed so that the random initialization is reproducible.\n", "set_seed(1)\n", "\n", "input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))\n", "# The string 'Normal' selects the Normal initializer with its default arguments.\n", "net = nn.Conv2d(3, 64, 3, weight_init='Normal')\n", "output = net(input_data)\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initializer Subclass\n", "\n", "Initializing network parameters with an `Initializer` subclass works much like using a string. The difference is that a string always uses the default arguments of the `Initializer` subclass; to set the arguments of the subclass, the parameters must be initialized with the `Initializer` subclass itself. Taking `Normal(0.2)` as an example, sample code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[[ 6.2076533e-01 8.7720710e-01 8.7720710e-01 ... 8.7720710e-01\n", " 8.7720710e-01 2.7743810e-01]\n", " [ 6.5210247e-01 7.0859784e-01 7.0859784e-01 ... 7.0859784e-01\n", " 7.0859784e-01 -1.1080378e-01]\n", " [ 6.5210247e-01 7.0859784e-01 7.0859784e-01 ... 7.0859784e-01\n", " 7.0859784e-01 -1.1080378e-01]\n", " ...\n", " [ 6.5210247e-01 7.0859784e-01 7.0859784e-01 ... 7.0859784e-01\n", " 7.0859784e-01 -1.1080378e-01]\n", " [ 6.5210247e-01 7.0859784e-01 7.0859784e-01 ... 7.0859784e-01\n", " 7.0859784e-01 -1.1080378e-01]\n", " [ 1.9323981e-01 2.4820906e-01 2.4820906e-01 ... 2.4820906e-01\n", " 2.4820906e-01 -2.7795550e-01]]\n", "\n", " ...\n", "\n", " [[ 7.9710668e-01 -2.7093157e-02 -2.7093157e-02 ... -2.7093157e-02\n", " -2.7093157e-02 -2.0062150e-01]\n", " [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01\n", " -7.2153252e-01 -5.9123868e-01]\n", " [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01\n", " -7.2153252e-01 -5.9123868e-01]\n", " ...\n", " [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01\n", " -7.2153252e-01 -5.9123868e-01]\n", " [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... 
-7.2153252e-01\n", " -7.2153252e-01 -5.9123868e-01]\n", " [ 2.6627803e-01 1.3488382e-03 1.3488382e-03 ... 1.3488382e-03\n", " 1.3488382e-03 -4.5465171e-01]]]]\n" ] } ], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "from mindspore import Tensor\n", "from mindspore import set_seed\n", "from mindspore.common.initializer import Normal\n", "\n", "set_seed(1)\n", "\n", "input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))\n", "# Normal(0.2) sets the first argument (sigma) to 0.2 instead of the default.\n", "net = nn.Conv2d(3, 64, 3, weight_init=Normal(0.2))\n", "output = net(input_data)\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Custom Tensor\n", "\n", "Besides the two methods above, when a network needs to initialize parameters with data that MindSpore does not provide, users can initialize the parameters with a custom `Tensor`. Sample code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[[12. 18. 18. ... 18. 18. 12.]\n", " [18. 27. 27. ... 27. 27. 18.]\n", " [18. 27. 27. ... 27. 27. 18.]\n", " ...\n", " [18. 27. 27. ... 27. 27. 18.]\n", " [18. 27. 27. ... 27. 27. 18.]\n", " [12. 18. 18. ... 18. 18. 12.]]\n", "\n", " ...\n", "\n", " [[12. 18. 18. ... 18. 18. 12.]\n", " [18. 27. 27. ... 27. 27. 18.]\n", " [18. 27. 27. ... 27. 27. 18.]\n", " ...\n", " [18. 27. 27. ... 27. 27. 18.]\n", " [18. 27. 27. ... 27. 27. 18.]\n", " [12. 18. 18. ... 18. 18. 
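12.]]]]\n" ] } ], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "from mindspore import Tensor\n", "from mindspore import dtype as mstype\n", "\n", "# The weight shape is (out_channels, in_channels, kernel_height, kernel_width).\n", "weight = Tensor(np.ones([64, 3, 3, 3]), dtype=mstype.float32)\n", "input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))\n", "net = nn.Conv2d(3, 64, 3, weight_init=weight)\n", "output = net(input_data)\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dependency Control\n", "\n", "If the result of a function depends on or affects external state, the function is said to have side effects; for example, it modifies an external global variable, or its result depends on the value of a global variable. Likewise, an operator that changes the value of its input parameters, or whose output depends on the value of global parameters, is an operator with side effects.\n", "\n", "Side effects are divided into memory side effects and IO side effects according to memory attributes and IO status. Currently, operators with memory side effects mainly include Assign and the optimizer operators, and operators with IO side effects mainly include Print. See the operator definitions for details: operators with memory side effects have the side_effect_mem attribute in their definition, and operators with IO side effects have the side_effect_io attribute.\n", "\n", "Depend is used to handle dependency operations.\n", "In most cases, operators with IO side effects or memory side effects are executed according to the user's semantics, and no extra Depend operator is needed to guarantee the execution order. In some cases, if two operators A and B have no order dependency but A must be executed before B, we recommend using Depend to specify their execution order. Usage:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```py\n", "a = A(x) ---> a = A(x)\n", "b = B(y) ---> y = Depend(y, a)\n", " ---> b = B(y)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the group of special operators used to detect floating-point overflow status has implicit side effects that are neither IO side effects nor memory side effects. These operators also have strict ordering requirements: NPUAllocFloatStatus must have been executed before NPUClearFloatStatus is used, and NPUClearFloatStatus must have been executed before NPUGetFloatStatus is used. Because these operators are rarely used, the current approach keeps their definitions free of side effects and relies on Depend to ensure the execution order. These operators are supported only on Ascend. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[5. 5. 5.]\n", " [5. 5. 5.]\n", " [5. 5. 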
5.]]" ] } ], "source": [ "import numpy as np\n", "from mindspore import Tensor\n", "from mindspore import ops, context\n", "\n", "# The float-status operators are supported only on Ascend.\n", "context.set_context(device_target=\"Ascend\")\n", "\n", "npu_alloc_status = ops.NPUAllocFloatStatus()\n", "npu_get_status = ops.NPUGetFloatStatus()\n", "npu_clear_status = ops.NPUClearFloatStatus()\n", "x = Tensor(np.ones([3, 3]).astype(np.float32))\n", "y = Tensor(np.ones([3, 3]).astype(np.float32))\n", "init = npu_alloc_status()\n", "sum_ = ops.Add()(x, y)\n", "product = ops.MatMul()(x, y)\n", "# ops.depend(value, expr) returns value and makes it depend on expr being computed first.\n", "init = ops.depend(init, sum_)\n", "init = ops.depend(init, product)\n", "get_status = npu_get_status(init)\n", "sum_ = ops.depend(sum_, get_status)\n", "product = ops.depend(product, get_status)\n", "out = ops.Add()(sum_, product)\n", "init = ops.depend(init, out)\n", "clear = npu_clear_status(init)\n", "out = ops.depend(out, clear)\n", "print(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For concrete usage, refer to the implementation of the [start_overflow_check function](https://gitee.com/mindspore/mindspore/blob/r1.6/mindspore/python/mindspore/nn/wrap/loss_scale.py) in the overflow detection logic." ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }