{ "cells": [ { "cell_type": "markdown", "id": "fa7e3e52", "metadata": {}, "source": [ "# ResNet50 Image Classification\n", "\n", "[![Run online](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_modelarts.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svcjIuMC4wLWFscGhhL3R1dG9yaWFscy9hcHBsaWNhdGlvbi96aF9jbi9jdi9taW5kc3BvcmVfcmVzbmV0NTAuaXB5bmI=&imageid=77ef960a-bd26-4de4-9695-5b85a786fb90) [![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r2.0.0-alpha/tutorials/application/zh_cn/cv/mindspore_resnet50.ipynb) [![Download sample code](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_download_code.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r2.0.0-alpha/tutorials/application/zh_cn/cv/mindspore_resnet50.py) [![View source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r2.0.0-alpha/tutorials/application/source_zh_cn/cv/resnet50.ipynb)\n", "\n", "Image classification is the most fundamental computer vision application and belongs to supervised learning: given an image (a cat, dog, airplane, car, and so on), the task is to determine the category it belongs to. This chapter describes how to classify the CIFAR-10 dataset with the ResNet50 network.\n", "\n", "## Introduction to ResNet\n", "\n", "ResNet50 was proposed in 2015 by Kaiming He et al. at Microsoft Research and won first place in the ILSVRC 2015 image classification competition. Before ResNet, conventional convolutional neural networks were built by stacking a series of convolutional and pooling layers, but once the stack reaches a certain depth, a degradation problem appears. The figure below shows the training and test error of a 56-layer network and a 20-layer network on the CIFAR-10 dataset. Both the training error and the test error of the 56-layer network are higher than those of the 20-layer network: the error does not decrease with depth as expected.\n", "\n", "![resnet-1](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/application/source_zh_cn/cv/images/resnet_1.png)\n", "\n", "ResNet introduces the residual network structure (Residual Network) to alleviate the degradation problem, which makes it possible to build much deeper networks (more than 1000 layers). The figure below shows the training and test error of ResNet on the CIFAR-10 dataset as reported in the paper; dashed lines denote training error and solid lines denote test error. The deeper the ResNet, the lower its training and test error.\n", "\n", "![resnet-4](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/application/source_zh_cn/cv/images/resnet_4.png)\n", "\n", "> For more details about ResNet, see the [ResNet paper](https://arxiv.org/pdf/1512.03385.pdf)." ] }, { "cell_type": "markdown", "id": "a987ee48", "metadata": {}, "source": [ "## Dataset Preparation and Loading\n", "\n", "The [CIFAR-10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html) contains 60,000 color images of size 32*32 in 10 classes, with 6,000 images per class; 50,000 images are used for training and 10,000 for evaluation. The example below first uses the `download` interface to download and extract the dataset. Currently only the binary version of the CIFAR-10 files (CIFAR-10 binary version) can be parsed." ] }, { "cell_type": "code", "execution_count": 3, "id": "1f9b81fb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating data folder...\n", "Downloading data from https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz (162.2 MB)\n", "\n", "file_sizes: 100%|████████████████████████████| 170M/170M [00:08<00:00, 20.6MB/s]\n", "Extracting tar.gz file...\n", "Successfully downloaded / unzipped to ./datasets-cifar10-bin\n" ] }, { "data": { "text/plain": [ "'./datasets-cifar10-bin'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from download import download\n", "\n", "url = \"https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz\"\n", "\n", "download(url, \"./datasets-cifar10-bin\", kind=\"tar.gz\")" ] }, { "cell_type": "markdown", "id": "7e9020ba", "metadata": {}, "source": [ "The directory structure of the downloaded dataset is as follows:\n", "\n", "```Text\n", "datasets-cifar10-bin/cifar-10-batches-bin\n", "├── batches.meta.txt\n", "├── data_batch_1.bin\n", "├── data_batch_2.bin\n", "├── data_batch_3.bin\n", "├── data_batch_4.bin\n", "├── data_batch_5.bin\n", "├── readme.html\n", "└── test_batch.bin\n", "\n", "```\n", "\n", 
"然后,使用`mindspore.dataset.Cifar10Dataset`接口来加载数据集,并进行相关图像增强操作。" ] }, { "cell_type": "code", "execution_count": 8, "id": "df7fb621", "metadata": {}, "outputs": [], "source": [ "import mindspore.dataset as ds\n", "import mindspore.dataset.vision as vision\n", "import mindspore.dataset.transforms as transforms\n", "import mindspore as ms\n", "import numpy as np\n", "\n", "from mindspore import dtype as mstype\n", "from mindspore import nn\n", "\n", "\n", "data_dir = \"./datasets-cifar10-bin/cifar-10-batches-bin\" # 数据集根目录\n", "batch_size = 256 # 批量大小\n", "image_size = 32 # 训练图像空间大小\n", "workers = 4 # 并行线程个数\n", "num_classes = 10 # 分类数量\n", "\n", "def create_dataset_cifar10(dataset_dir, usage, resize, batch_size, workers):\n", "\n", " data_set = ds.Cifar10Dataset(dataset_dir=dataset_dir,\n", " usage=usage,\n", " num_parallel_workers=workers,\n", " shuffle=True)\n", "\n", " trans = []\n", " if usage == \"train\":\n", " trans += [\n", " vision.RandomCrop((32, 32), (4, 4, 4, 4)),\n", " vision.RandomHorizontalFlip(prob=0.5)\n", " ]\n", "\n", " trans += [\n", " vision.Resize(resize),\n", " vision.Rescale(1.0 / 255.0, 0.0),\n", " vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),\n", " vision.HWC2CHW()\n", " ]\n", "\n", " target_trans = transforms.TypeCast(mstype.int32)\n", "\n", " # 数据映射操作\n", " data_set = data_set.map(\n", " operations=trans,\n", " input_columns='image',\n", " num_parallel_workers=workers)\n", "\n", " data_set = data_set.map(\n", " operations=target_trans,\n", " input_columns='label',\n", " num_parallel_workers=workers)\n", "\n", " # 批量操作\n", " data_set = data_set.batch(batch_size)\n", "\n", "\n", " return data_set\n", "\n", "\n", "# 获取处理后的训练与测试数据集\n", "\n", "dataset_train = create_dataset_cifar10(dataset_dir=data_dir,\n", " usage=\"train\",\n", " resize=image_size,\n", " batch_size=batch_size,\n", " workers=workers)\n", "step_size_train = dataset_train.get_dataset_size()\n", "index_label_dict = dataset_train.get_class_indexing()\n", 
"\n", "dataset_val = create_dataset_cifar10(dataset_dir=data_dir,\n", " usage=\"test\",\n", " resize=image_size,\n", " batch_size=batch_size,\n", " workers=workers)\n", "step_size_val = dataset_val.get_dataset_size()" ] }, { "cell_type": "markdown", "id": "21e86f95", "metadata": {}, "source": [ "对CIFAR-10训练数据集进行可视化。" ] }, { "cell_type": "code", "execution_count": 9, "id": "c3ffabb3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Image shape: (6, 3, 32, 32), Label: [9 8 6 0 8 5]\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "data_iter = next(dataset_train.create_dict_iterator())\n", "\n", "images = data_iter[\"image\"].asnumpy()\n", "labels = data_iter[\"label\"].asnumpy()\n", "print(f\"Image shape: {images.shape}, Label: {labels}\")\n", "\n", "classes = []\n", "\n", "with open(data_dir+\"/batches.meta.txt\", \"r\") as f:\n", " for line in f:\n", " line = line.rstrip()\n", " if line != '':\n", " classes.append(line)\n", "\n", "plt.figure()\n", "for i in range(6):\n", " plt.subplot(2, 3, i+1)\n", " image_trans = np.transpose(images[i], (1, 2, 0))\n", " mean = np.array([0.4914, 0.4822, 0.4465])\n", " std = np.array([0.2023, 0.1994, 0.2010])\n", " image_trans = std * image_trans + mean\n", " image_trans = np.clip(image_trans, 0, 1)\n", " plt.title(f\"{classes[labels[i]]}\")\n", " plt.imshow(image_trans)\n", " plt.axis(\"off\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "76c96f76", "metadata": {}, "source": [ "## 构建网络\n", "\n", "残差网络结构(Residual Network)是ResNet网络的主要亮点,ResNet使用残差网络结构后可有效地减轻退化问题,实现更深的网络结构设计,提高网络的训练精度。本节首先讲述如何构建残差网络结构,然后通过堆叠残差网络来构建ResNet50网络。\n", "\n", "### 构建残差网络结构\n", "\n", "残差网络结构图如下图所示,残差网络由两个分支构成:一个主分支,一个shortcuts(图中弧线表示)。主分支通过堆叠一系列的卷积操作得到,shotcuts从输入直接到输出,主分支输出的特征矩阵$F(x)$加上shortcuts输出的特征矩阵$x$得到$F(x)+x$,通过Relu激活函数后即为残差网络最后的输出。\n", "\n", "![residual](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/application/source_zh_cn/cv/images/resnet_3.png)\n", "\n", "残差网络结构主要由两种,一种是Building Block,适用于较浅的ResNet网络,如ResNet18和ResNet34;另一种是Bottleneck,适用于层数较深的ResNet网络,如ResNet50、ResNet101和ResNet152。\n", "\n", "#### Building Block\n", "\n", "Building Block结构图如下图所示,主分支有两层卷积网络结构:\n", "\n", "+ 主分支第一层网络以输入channel为64为例,首先通过一个$3\\times3$的卷积层,然后通过Batch Normalization层,最后通过Relu激活函数层,输出channel为64;\n", "+ 主分支第二层网络的输入channel为64,首先通过一个$3\\times3$的卷积层,然后通过Batch Normalization层,输出channel为64。\n", "\n", 
"最后将主分支输出的特征矩阵与shortcuts输出的特征矩阵相加,通过Relu激活函数即为Building Block最后的输出。\n", "\n", "![building-block-5](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/application/source_zh_cn/cv/images/resnet_5.png)\n", "\n", "主分支与shortcuts输出的特征矩阵相加时,需要保证主分支与shortcuts输出的特征矩阵shape相同。如果主分支与shortcuts输出的特征矩阵shape不相同,如输出channel是输入channel的一倍时,shortcuts上需要使用数量与输出channel相等,大小为$1\\times1$的卷积核进行卷积操作;若输出的图像较输入图像缩小一倍,则要设置shortcuts中卷积操作中的`stride`为2,主分支第一层卷积操作的`stride`也需设置为2。\n", "\n", "如下代码定义`ResidualBlockBase`类实现Building Block结构。" ] }, { "cell_type": "code", "execution_count": 5, "id": "c7ac0e2d", "metadata": {}, "outputs": [], "source": [ "from typing import Type, Union, List, Optional\n", "from mindspore import nn, train\n", "from mindspore.common.initializer import Normal\n", "\n", "weight_init = Normal(mean=0, sigma=0.02)\n", "gamma_init = Normal(mean=1, sigma=0.02)\n", "\n", "class ResidualBlockBase(nn.Cell):\n", " expansion: int = 1 # 最后一个卷积核数量与第一个卷积核数量相等\n", "\n", " def __init__(self, in_channel: int, out_channel: int,\n", " stride: int = 1, norm: Optional[nn.Cell] = None,\n", " down_sample: Optional[nn.Cell] = None) -> None:\n", " super(ResidualBlockBase, self).__init__()\n", " if not norm:\n", " self.norm = nn.BatchNorm2d(out_channel)\n", " else:\n", " self.norm = norm\n", "\n", " self.conv1 = nn.Conv2d(in_channel, out_channel,\n", " kernel_size=3, stride=stride,\n", " weight_init=weight_init)\n", " self.conv2 = nn.Conv2d(in_channel, out_channel,\n", " kernel_size=3, weight_init=weight_init)\n", " self.relu = nn.ReLU()\n", " self.down_sample = down_sample\n", "\n", " def construct(self, x):\n", " \"\"\"ResidualBlockBase construct.\"\"\"\n", " identity = x # shortcuts分支\n", "\n", " out = self.conv1(x) # 主分支第一层:3*3卷积层\n", " out = self.norm(out)\n", " out = self.relu(out)\n", " out = self.conv2(out) # 主分支第二层:3*3卷积层\n", " out = self.norm(out)\n", "\n", " if self.down_sample is not None:\n", " identity = self.down_sample(x)\n", " out += identity 
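# the output is the sum of the main branch and the shortcuts\n", "        out = self.relu(out)\n", "\n", "        return out" ] }, { "cell_type": "markdown", "id": "aaa15d3c", "metadata": {}, "source": [ "#### Bottleneck\n", "\n", "The figure below shows the Bottleneck structure. For the same input, the Bottleneck has fewer parameters than the Building Block and is better suited to deeper networks; it is the residual structure used by ResNet50. Its main branch has three convolutional layers: a $1\\times1$ convolution, a $3\\times3$ convolution, and a $1\\times1$ convolution, where the two $1\\times1$ convolutions reduce and then restore the dimensionality, respectively.\n", "\n", "+ First layer of the main branch, taking 256 input channels as an example: 64 kernels of size $1\\times1$ reduce the dimensionality, followed by a Batch Normalization layer and a ReLU activation layer; the output has 64 channels.\n", "+ Second layer of the main branch: 64 kernels of size $3\\times3$ extract features, followed by a Batch Normalization layer and a ReLU activation layer; the output has 64 channels.\n", "+ Third layer of the main branch: 256 kernels of size $1\\times1$ restore the dimensionality, followed by a Batch Normalization layer; the output has 256 channels.\n", "\n", "Finally, the feature matrix output by the main branch is added to the feature matrix output by the shortcuts, and the sum is passed through the ReLU activation function to give the final output of the Bottleneck.\n", "\n", "![building-block-6](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/application/source_zh_cn/cv/images/resnet_6.png)\n", "\n", "When the feature matrices output by the main branch and the shortcuts are added, they must have the same shape. If the shapes differ, for example when the number of output channels is twice the number of input channels, the shortcuts must apply a convolution with $1\\times1$ kernels whose count equals the number of output channels. If the output feature map is half the size of the input, the `stride` of the convolution in the shortcuts must be set to 2, and the `stride` of the second convolution in the main branch must also be set to 2.\n", "\n", "The following code defines the `ResidualBlock` class to implement the Bottleneck structure." ] }, { "cell_type": "code", "execution_count": 6, "id": "0d46f98e", "metadata": {}, "outputs": [], "source": [ "class ResidualBlock(nn.Cell):\n", "    expansion = 4  # the last convolution has 4 times as many kernels as the first\n", "\n", "    def __init__(self, in_channel: int, out_channel: int,\n", "                 stride: int = 1, down_sample: Optional[nn.Cell] = None) -> None:\n", "        super(ResidualBlock, self).__init__()\n", "\n", "        self.conv1 = nn.Conv2d(in_channel, out_channel,\n", "                               kernel_size=1, weight_init=weight_init)\n", "        self.norm1 = nn.BatchNorm2d(out_channel)\n", "        self.conv2 = nn.Conv2d(out_channel, out_channel,\n", "                               kernel_size=3, stride=stride,\n", "                               weight_init=weight_init)\n", "        self.norm2 = nn.BatchNorm2d(out_channel)\n", "        self.conv3 = nn.Conv2d(out_channel, out_channel * 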
self.expansion,\n", "                               kernel_size=1, weight_init=weight_init)\n", "        self.norm3 = nn.BatchNorm2d(out_channel * self.expansion)\n", "\n", "        self.relu = nn.ReLU()\n", "        self.down_sample = down_sample\n", "\n", "    def construct(self, x):\n", "\n", "        identity = x  # shortcuts branch\n", "\n", "        out = self.conv1(x)  # main branch, first layer: 1*1 convolution\n", "        out = self.norm1(out)\n", "        out = self.relu(out)\n", "        out = self.conv2(out)  # main branch, second layer: 3*3 convolution\n", "        out = self.norm2(out)\n", "        out = self.relu(out)\n", "        out = self.conv3(out)  # main branch, third layer: 1*1 convolution\n", "        out = self.norm3(out)\n", "\n", "        if self.down_sample is not None:\n", "            identity = self.down_sample(x)\n", "\n", "        out += identity  # the output is the sum of the main branch and the shortcuts\n", "        out = self.relu(out)\n", "\n", "        return out" ] }, { "cell_type": "markdown", "id": "d1d8dfc9", "metadata": {}, "source": [ "### Building the ResNet50 Network\n", "\n", "The figure below shows the layer structure of ResNet. Taking a $224\\times224$ color image as input: the image first passes through conv1, a convolutional layer with 64 kernels of size $7\\times7$ and a stride of 2, which outputs a $112\\times112$ feature map with 64 channels. It then passes through a $3\\times3$ max pooling layer that downsamples it to $56\\times56$ with 64 channels. Four residual stages (conv2_x, conv3_x, conv4_x, and conv5_x) follow, after which the feature map is $7\\times7$ with 2048 channels. Finally, an average pooling layer, a fully connected layer, and softmax produce the classification probabilities.\n", "\n", "![resnet-layer](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/application/source_zh_cn/cv/images/resnet_2.png)\n", "\n", "Each residual stage is a stack of blocks. For example, conv2_x in ResNet50 is a stack of 3 Bottleneck structures: the first Bottleneck receives 64 input channels and the following ones receive 256, and each outputs 256 channels.\n", "\n", "The following example defines `make_layer` to build a residual stage. Its parameters are:\n", "\n", "+ `last_out_channel`: number of channels output by the previous residual stage.\n", "+ `block`: type of residual block, either `ResidualBlockBase` or `ResidualBlock`.\n", "+ `channel`: base number of channels of the block, i.e. the number of kernels in its first convolution; the block outputs `channel * block.expansion` channels.\n", "+ `block_nums`: number of residual blocks to stack.\n", "+ `stride`: stride of the convolution." ] }, { "cell_type": "code", "execution_count": 7, "id": "3dfa40a1", "metadata": {}, "outputs": [], "source": [ "def make_layer(last_out_channel, block: Type[Union[ResidualBlockBase, ResidualBlock]],\n", "               channel: int, block_nums: int, stride: int = 1):\n", "    down_sample = None  # shortcuts branch\n", "\n", "\n", "    if stride != 1 or 
last_out_channel != channel * block.expansion:\n", "\n", "        down_sample = nn.SequentialCell([\n", "            nn.Conv2d(last_out_channel, channel * block.expansion,\n", "                      kernel_size=1, stride=stride, weight_init=weight_init),\n", "            nn.BatchNorm2d(channel * block.expansion, gamma_init=gamma_init)\n", "        ])\n", "\n", "    layers = []\n", "    layers.append(block(last_out_channel, channel, stride=stride, down_sample=down_sample))\n", "\n", "    in_channel = channel * block.expansion\n", "    # Stack the remaining residual blocks\n", "    for _ in range(1, block_nums):\n", "\n", "        layers.append(block(in_channel, channel))\n", "\n", "    return nn.SequentialCell(layers)" ] }, { "cell_type": "markdown", "id": "67dae353", "metadata": {}, "source": [ "ResNet50 has 5 convolutional stages, an average pooling layer, and a fully connected layer. Taking the CIFAR-10 dataset as an example:\n", "\n", "+ **conv1**: the input image is $32\\times32$ with 3 channels. It first passes through a convolutional layer with 64 kernels of size $7\\times7$ and a stride of 2, then a Batch Normalization layer, and finally the ReLU activation function. The output feature map is $16\\times16$ with 64 channels.\n", "+ **conv2_x**: the input feature map is $16\\times16$ with 64 channels. It first passes through a max pooling operation with a $3\\times3$ kernel and a stride of 2, then through a stack of 3 Bottlenecks of structure $[1\\times1,64;3\\times3,64;1\\times1,256]$. The output feature map is $8\\times8$ with 256 channels.\n", "+ **conv3_x**: the input feature map is $8\\times8$ with 256 channels. This stage stacks 4 Bottlenecks of structure $[1\\times1,128;3\\times3,128;1\\times1,512]$. The output feature map is $4\\times4$ with 512 channels.\n", "+ **conv4_x**: the input feature map is $4\\times4$ with 512 channels. This stage stacks 6 Bottlenecks of structure $[1\\times1,256;3\\times3,256;1\\times1,1024]$. The output feature map is $2\\times2$ with 1024 channels.\n", "+ **conv5_x**: the input feature map is $2\\times2$ with 1024 channels. This stage stacks 3 Bottlenecks of structure $[1\\times1,512;3\\times3,512;1\\times1,2048]$. The output feature map is $1\\times1$ with 2048 channels.\n", "+ **average pool & fc**: the input has 2048 channels, and the number of outputs equals the number of classes.\n", "\n", "The following sample code builds the ResNet50 model. Calling the function `resnet50` builds a ResNet50 model; its parameters are:\n", "\n", "+ `num_classes`: number of classes; defaults to 1000.\n", "+ `pretrained`: whether to download the pretrained model and load its parameters into the network." ] }, { "cell_type": "code", "execution_count": 8, "id": "1ebef3d0", "metadata": {}, "outputs": [], "source": [ "from mindspore import load_checkpoint, 
load_param_into_net\n", "\n", "\n", "class ResNet(nn.Cell):\n", "    def __init__(self, block: Type[Union[ResidualBlockBase, ResidualBlock]],\n", "                 layer_nums: List[int], num_classes: int, input_channel: int) -> None:\n", "        super(ResNet, self).__init__()\n", "\n", "        self.relu = nn.ReLU()\n", "        # First convolutional layer: 3 input channels (color image), 64 output channels\n", "        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, weight_init=weight_init)\n", "        self.norm = nn.BatchNorm2d(64)\n", "        # Max pooling layer, reduces the size of the feature map\n", "        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')\n", "        # Residual stages\n", "        self.layer1 = make_layer(64, block, 64, layer_nums[0])\n", "        self.layer2 = make_layer(64 * block.expansion, block, 128, layer_nums[1], stride=2)\n", "        self.layer3 = make_layer(128 * block.expansion, block, 256, layer_nums[2], stride=2)\n", "        self.layer4 = make_layer(256 * block.expansion, block, 512, layer_nums[3], stride=2)\n", "        # Average pooling layer\n", "        self.avg_pool = nn.AvgPool2d()\n", "        # Flatten layer\n", "        self.flatten = nn.Flatten()\n", "        # Fully connected layer; input_channel is its number of input features\n", "        self.fc = nn.Dense(in_channels=input_channel, out_channels=num_classes)\n", "\n", "    def construct(self, x):\n", "\n", "        x = self.conv1(x)\n", "        x = self.norm(x)\n", "        x = self.relu(x)\n", "        x = self.max_pool(x)\n", "\n", "        x = self.layer1(x)\n", "        x = self.layer2(x)\n", "        x = self.layer3(x)\n", "        x = self.layer4(x)\n", "\n", "        x = self.avg_pool(x)\n", "        x = self.flatten(x)\n", "        x = self.fc(x)\n", "\n", "        return x" ] }, { "cell_type": "code", "execution_count": 9, "id": "d16e658e", "metadata": {}, "outputs": [], "source": [ "def _resnet(model_url: str, block: Type[Union[ResidualBlockBase, ResidualBlock]],\n", "            layers: List[int], num_classes: int, pretrained: bool, pretrained_ckpt: str,\n", "            input_channel: int):\n", "    model = ResNet(block, layers, num_classes, input_channel)\n", "\n", "    if pretrained:\n", "        # Download and load the pretrained model\n", "        download(url=model_url, path=pretrained_ckpt)\n", "        param_dict = load_checkpoint(pretrained_ckpt)\n", "        load_param_into_net(model, 
param_dict)\n", "\n", "    return model\n", "\n", "\n", "def resnet50(num_classes: int = 1000, pretrained: bool = False):\n", "    \"ResNet50 model.\"\n", "    resnet50_url = \"https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/models/application/resnet50_224_new.ckpt\"\n", "    resnet50_ckpt = \"./LoadPretrainedModel/resnet50_224_new.ckpt\"\n", "    return _resnet(resnet50_url, ResidualBlock, [3, 4, 6, 3], num_classes,\n", "                   pretrained, resnet50_ckpt, 2048)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d40bd05a", "metadata": {}, "source": [ "## Model Training and Evaluation\n", "\n", "This section fine-tunes the [ResNet50 pretrained model](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/models/application/resnet50_224_new.ckpt). Calling `resnet50` with the `pretrained` parameter set to True automatically downloads the ResNet50 pretrained model and loads its parameters into the network. An optimizer and a loss function are then defined, the training loss and evaluation accuracy are printed for each epoch, and the ckpt file with the highest evaluation accuracy (resnet50-best.ckpt) is saved under ./BestCheckpoint in the current directory.\n", "\n", "> Only 5 epochs of training are shown here. To reach good results, training for 80 epochs is recommended." ] }, { "cell_type": "code", "execution_count": 11, "id": "9cf10c03", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Replace is False and data exists, so doing nothing. 
Use replace=True to re-download the data.\n" ] } ], "source": [ "import mindspore as ms\n", "# Define the ResNet50 network\n", "network = resnet50(pretrained=True)\n", "\n", "# Number of input features of the fully connected layer\n", "in_channel = network.fc.in_channels\n", "fc = nn.Dense(in_channels=in_channel, out_channels=10)\n", "# Replace the fully connected layer\n", "network.fc = fc\n", "\n", "for param in network.get_parameters():\n", "    param.requires_grad = True" ] }, { "cell_type": "code", "execution_count": 14, "id": "e1c632ff", "metadata": {}, "outputs": [], "source": [ "# Set the learning rate\n", "num_epochs = 5\n", "lr = nn.cosine_decay_lr(min_lr=0.00001, max_lr=0.001, total_step=step_size_train * num_epochs,\n", "                        step_per_epoch=step_size_train, decay_epoch=num_epochs)\n", "# Define the optimizer and the loss function\n", "opt = nn.Momentum(params=network.trainable_params(), learning_rate=lr, momentum=0.9)\n", "loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')\n", "\n", "\n", "def forward_fn(inputs, targets):\n", "    logits = network(inputs)\n", "    loss = loss_fn(logits, targets)\n", "\n", "    return loss\n", "\n", "grad_fn = ms.value_and_grad(forward_fn, None, opt.parameters)\n", "\n", "def train_step(inputs, targets):\n", "    loss, grads = grad_fn(inputs, targets)\n", "    opt(grads)\n", "    return loss\n", "\n", "# Instantiate the model\n", "model = ms.Model(network, loss_fn, opt, metrics={\"Accuracy\": train.Accuracy()})" ] }, { "cell_type": "code", "execution_count": 15, "id": "b627e30c", "metadata": {}, "outputs": [], "source": [ "# Create the iterators\n", "data_loader_train = dataset_train.create_tuple_iterator(num_epochs=num_epochs)\n", "data_loader_val = dataset_val.create_tuple_iterator(num_epochs=num_epochs)\n", "\n", "# Path for saving the best model\n", "best_acc = 0\n", "best_ckpt_dir = \"./BestCheckpoint\"\n", "best_ckpt_path = \"./BestCheckpoint/resnet50-best.ckpt\"" ] }, { "cell_type": "code", "execution_count": null, "id": "562a04ca", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Start the training loop\n", "print(\"Start Training Loop ...\")\n", "\n", "for epoch in range(num_epochs):\n", "    losses = []\n", "    network.set_train()\n", "\n", "    # Read in the data for each training epoch\n", "\n", "    for i, (images, labels) in enumerate(data_loader_train):\n", "        loss = train_step(images, labels)\n", "        if i%100 == 0 or i == step_size_train -1:\n", "            print('Epoch: [%3d/%3d], Steps: [%3d/%3d], Train Loss: [%5.3f]'%(\n", "                epoch+1, num_epochs, i+1, step_size_train, loss))\n", "        losses.append(loss)\n", "\n", "    # Validate the accuracy after each epoch\n", "\n", "    acc = model.eval(dataset_val)['Accuracy']\n", "\n", "    print(\"-\" * 50)\n", "    print(\"Epoch: [%3d/%3d], Average Train Loss: [%5.3f], Accuracy: [%5.3f]\" % (\n", "        epoch+1, num_epochs, sum(losses)/len(losses), acc\n", "    ))\n", "    print(\"-\" * 50)\n", "\n", "    if acc > best_acc:\n", "        best_acc = acc\n", "        if not os.path.exists(best_ckpt_dir):\n", "            os.mkdir(best_ckpt_dir)\n", "        ms.save_checkpoint(network, best_ckpt_path)\n", "\n", "print(\"=\" * 80)\n", "print(f\"End of validation the best Accuracy is: {best_acc: 5.3f}, \"\n", "      f\"save the best ckpt file in {best_ckpt_path}\", flush=True)" ] }, { "cell_type": "markdown", "id": "6ca392ab", "metadata": {}, "source": [ "```Text\n", "Epoch: [ 1/ 5], Steps: [ 1/8334], Train Loss: [2.438]\n", "Epoch: [ 1/ 5], Steps: [101/8334], Train Loss: [2.371]\n", "\n", "......\n", "\n", "Epoch: [ 1/ 5], Steps: [8334/8334], Train Loss: [2.292]\n", "--------------------------------------------------\n", "Epoch: [ 1/ 5], Average Train Loss: [2.007], Accuracy: [0.240]\n", "--------------------------------------------------\n", "\n", "......\n", "\n", "\n", "Epoch: [ 5/ 5], Steps: [8334/8334], Train Loss: [3.519]\n", "--------------------------------------------------\n", "Epoch: [ 5/ 5], Average Train Loss: [1.621], Accuracy: [0.498]\n", "--------------------------------------------------\n", "================================================================================\n", "End of validation the best Accuracy is: 0.498, save the best ckpt file in ./BestCheckpoint/resnet50-best.ckpt\n", "\n", "```" ] }, { "attachments": {}, 
"cell_type": "markdown", "id": "46e28f6f", "metadata": {}, "source": [ "## 可视化模型预测\n", "\n", "定义`visualize_model`函数,使用上述验证精度最高的模型对CIFAR-10测试数据集进行预测,并将预测结果可视化。若预测字体颜色为蓝色表示为预测正确,预测字体颜色为红色则表示预测错误。\n", "\n", "> 由上面的结果可知,5个epochs下模型在验证数据集的预测准确率不到50%,即仅可以正确预测不到一半数量的图片分类,实际预测的准确率可能会更低。下图我们展示了训练40个epochs后较好的预测结果,但此结果有随机性,一般情况下6张图片中会有1-2张预测失败。如果想要达到理想的训练效果,建议训练80个epochs。" ] }, { "cell_type": "code", "execution_count": 9, "id": "6ba2fa94", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "\n", "def visualize_model(best_ckpt_path, dataset_val):\n", " num_class = 10 # 对狼和狗图像进行二分类\n", " net = resnet50(num_class)\n", " # 加载模型参数\n", " param_dict = ms.load_checkpoint(best_ckpt_path)\n", " ms.load_param_into_net(net, param_dict)\n", " model = ms.Model(net)\n", " # 加载验证集的数据进行验证\n", " data = next(dataset_val.create_dict_iterator())\n", " images = data[\"image\"].asnumpy()\n", " labels = data[\"label\"].asnumpy()\n", " # 预测图像类别\n", " output = model.predict(ms.Tensor(data['image']))\n", " pred = np.argmax(output.asnumpy(), axis=1)\n", "\n", " # 图像分类\n", " classes = []\n", "\n", " with open(data_dir+\"/batches.meta.txt\", \"r\") as f:\n", " for line in f:\n", " line = line.rstrip()\n", " if line != '':\n", " classes.append(line)\n", "\n", " # 显示图像及图像的预测值\n", " plt.figure()\n", " for i in range(6):\n", " plt.subplot(2, 3, i+1)\n", " # 若预测正确,显示为蓝色;若预测错误,显示为红色\n", " color = 'blue' if pred[i] == labels[i] else 'red'\n", " plt.title('predict:{}'.format(classes[pred[i]]), color=color)\n", " picture_show = np.transpose(images[i], (1, 2, 0))\n", " mean = np.array([0.4914, 0.4822, 0.4465])\n", " std = np.array([0.2023, 0.1994, 0.2010])\n", " picture_show = std * picture_show + mean\n", " picture_show = np.clip(picture_show, 0, 1)\n", " plt.imshow(picture_show)\n", " plt.axis('off')\n", "\n", " plt.show()\n", "\n", "# 使用测试数据集进行验证\n", "visualize_model(best_ckpt_path=best_ckpt_path, dataset_val=dataset_val)" ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "vscode": { "interpreter": { "hash": 
"61b352d89025746abfd3d4fa7053c22c36b9d81e9898372aef9407193f0acc45" } } }, "nbformat": 4, "nbformat_minor": 5 }