{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 模型训练\n", "\n", "[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.8/tutorials/zh_cn/beginner/mindspore_train.ipynb) [![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_download_code.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.8/tutorials/zh_cn/beginner/mindspore_train.py) [![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.8/tutorials/source_zh_cn/beginner/train.ipynb)\n", "\n", "通过上面章节的学习,我们已经学会如何创建模型和构建数据集,现在开始学习如何设置超参和优化模型参数。\n", "\n", "## 超参(Hyper-parametric)\n", "\n", "超参是可以调整的参数,可以控制模型训练优化的过程,不同的超参数值可能会影响模型训练和收敛速度。目前深度学习模型多采用批量随机梯度下降算法进行优化,随机梯度下降算法的原理如下:\n", "\n", "$$w_{t+1}=w_{t}-\\eta \\frac{1}{n} \\sum_{x \\in \\mathcal{B}} \\nabla l\\left(x, w_{t}\\right)$$\n", "\n", "式中,$n$是批量大小(batch size),$η$是学习率(learning rate);另外,$w_{t}$为训练轮次t中权重参数,$\\nabla l$为损失函数的导数。可知道除了梯度本身,这两个因子直接决定了模型的权重更新,从优化本身来看它们是影响模型性能收敛最重要的参数。一般会定义以下超参用于训练:\n", "\n", "训练轮次(epoch):训练时遍历数据集的次数。\n", "\n", "批次大小(batch size):数据集进行分批读取训练,设定每个批次数据的大小。batch size过小,花费时间多,同时梯度震荡严重,不利于收敛;batch size过大,不同batch的梯度方向没有任何变化,容易陷入局部极小值,因此需要选择合适的batch size,可以有效提高模型精度、全局收敛。\n", "\n", "学习率(learning rate):如果学习率偏小,会导致收敛的速度变慢,如果学习率偏大则可能会导致训练不收敛等不可预测的结果。梯度下降法是一个广泛被用来最小化模型误差的参数优化算法。梯度下降法通过多次迭代,并在每一步中最小化损失函数来估计模型的参数。学习率就是在迭代过程中,会控制模型的学习进度。\n", "\n", "![learning-rate](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/tutorials/source_zh_cn/beginner/images/learning_rate.png)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "epochs = 10\n", "batch_size = 32\n", "momentum = 0.9\n", "learning_rate = 1e-2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 损失函数\n", "\n", "**损失函数**用来评价模型的**预测值**和**目标值**之间的误差,在这里,使用绝对误差损失函数`L1Loss`:\n", "\n", "$$\\text { L1 Loss Function }=\\sum_{i=1}^{n}\\left|y_{true}-y_{predicted}\\right|$$\n", "\n", "`mindspore.nn.loss`也提供了许多其他常用的损失函数,如`SoftmaxCrossEntropyWithLogits`、`MSELoss`、`SmoothL1Loss`等。\n", "\n", "我们给定预测值和目标值,通过损失函数计算预测值和目标值之间的误差(损失值),使用方法如下所示:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.5\n" ] } ], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "import mindspore as ms\n", "\n", "loss = nn.L1Loss()\n", "output_data = ms.Tensor(np.array([[1, 2, 3], [2, 3, 4]]).astype(np.float32))\n", "target_data = ms.Tensor(np.array([[0, 2, 5], [3, 1, 1]]).astype(np.float32))\n", "\n", "print(loss(output_data, target_data))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 优化器函数\n", "\n", "优化器函数用于计算和更新梯度,模型优化算法的选择直接关系到最终模型的性能。有时候最终模型效果不好,未必是特征或者模型设计的问题,很有可能是优化算法的问题。\n", "\n", "MindSpore所有优化逻辑都封装在`Optimizer`对象中,在这里,我们使用`Momentum`优化器。`mindspore.nn`也提供了许多其他常用的优化器函数,如`Adam`、`SGD`、`RMSProp`等。\n", "\n", "我们需要构建一个`Optimizer`对象,这个对象能够基于计算得到的梯度对参数进行更新。为了构建一个`Optimizer`,需要给它一个包含可优化的参数,如网络中所有可以训练的`parameter`,即设置优化器的入参为`net.trainable_params()`。\n", "\n", "然后,设置`Optimizer`的参数选项,比如学习率、权重衰减等。代码样例如下:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from mindspore import nn\n", "from mindvision.classification.models import lenet\n", "\n", 
"net = lenet(num_classes=10, pretrained=False)\n", "optim = nn.Momentum(net.trainable_params(), learning_rate, momentum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 模型训练\n", "\n", "模型训练一般分为四个步骤:\n", "\n", "1. 构建数据集。\n", "2. 定义神经网络。\n", "3. 定义超参、损失函数及优化器。\n", "4. 输入训练轮次和数据集进行训练。\n", "\n", "模型训练示例代码如下:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch:[ 0/ 10], step:[ 1875/ 1875], loss:[0.189/1.176], time:2.254 ms, lr:0.01000\n", "Epoch time: 4286.163 ms, per step time: 2.286 ms, avg loss: 1.176\n", "Epoch:[ 1/ 10], step:[ 1875/ 1875], loss:[0.085/0.080], time:1.895 ms, lr:0.01000\n", "Epoch time: 4064.532 ms, per step time: 2.168 ms, avg loss: 0.080\n", "Epoch:[ 2/ 10], step:[ 1875/ 1875], loss:[0.021/0.054], time:1.901 ms, lr:0.01000\n", "Epoch time: 4194.333 ms, per step time: 2.237 ms, avg loss: 0.054\n", "Epoch:[ 3/ 10], step:[ 1875/ 1875], loss:[0.284/0.041], time:2.130 ms, lr:0.01000\n", "Epoch time: 4252.222 ms, per step time: 2.268 ms, avg loss: 0.041\n", "Epoch:[ 4/ 10], step:[ 1875/ 1875], loss:[0.003/0.032], time:2.176 ms, lr:0.01000\n", "Epoch time: 4216.039 ms, per step time: 2.249 ms, avg loss: 0.032\n", "Epoch:[ 5/ 10], step:[ 1875/ 1875], loss:[0.003/0.027], time:2.205 ms, lr:0.01000\n", "Epoch time: 4400.771 ms, per step time: 2.347 ms, avg loss: 0.027\n", "Epoch:[ 6/ 10], step:[ 1875/ 1875], loss:[0.000/0.024], time:1.973 ms, lr:0.01000\n", "Epoch time: 4554.252 ms, per step time: 2.429 ms, avg loss: 0.024\n", "Epoch:[ 7/ 10], step:[ 1875/ 1875], loss:[0.008/0.022], time:2.048 ms, lr:0.01000\n", "Epoch time: 4361.135 ms, per step time: 2.326 ms, avg loss: 0.022\n", "Epoch:[ 8/ 10], step:[ 1875/ 1875], loss:[0.000/0.018], time:2.130 ms, lr:0.01000\n", "Epoch time: 4547.597 ms, per step time: 2.425 ms, avg loss: 0.018\n", "Epoch:[ 9/ 10], step:[ 1875/ 1875], loss:[0.008/0.017], time:2.135 ms, lr:0.01000\n", "Epoch time: 4601.861 ms, per step time: 2.454 ms, avg loss: 0.017\n" ] } ], "source": [ "import mindspore.nn as nn\n", "import mindspore as ms\n", "\n", "from mindvision.classification.dataset import Mnist\n", "from mindvision.classification.models import lenet\n", "from mindvision.engine.callback import LossMonitor\n", "\n", "# 1. 构建数据集\n", "download_train = Mnist(path=\"./mnist\", split=\"train\", batch_size=batch_size, repeat_num=1, shuffle=True, resize=32, download=True)\n", "dataset_train = download_train.run()\n", "\n", "# 2. 定义神经网络\n", "network = lenet(num_classes=10, pretrained=False)\n", "# 3.1 定义损失函数\n", "net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')\n", "# 3.2 定义优化器函数\n", "net_opt = nn.Momentum(network.trainable_params(), learning_rate=learning_rate, momentum=momentum)\n", "# 3.3 初始化模型参数\n", "model = ms.Model(network, loss_fn=net_loss, optimizer=net_opt, metrics={'acc'})\n", "\n", "# 4. 对神经网络执行训练\n", "model.train(epochs, dataset_train, callbacks=[LossMonitor(learning_rate, 1875)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "训练过程中会打印loss值,loss值会波动,但总体来说loss值会逐步减小,精度逐步提高。每个人运行的loss值有一定随机性,不一定完全相同。" ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }