{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading and Processing Data\n", "\n", "`Ascend` `GPU` `CPU` `Beginner` `Data Preparation`\n", "\n", "[![](https://gitee.com/mindspore/docs/raw/r1.5/resource/_static/logo_modelarts.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9taW5kc3BvcmUtd2Vic2l0ZS5vYnMuY24tbm9ydGgtNC5teWh1YXdlaWNsb3VkLmNvbS9ub3RlYm9vay9tb2RlbGFydHMvcXVpY2tfc3RhcnQvbWluZHNwb3JlX2RhdGFzZXQuaXB5bmI=&imageid=65f636a0-56cf-49df-b941-7d2a07ba8c8c) [![](https://gitee.com/mindspore/docs/raw/r1.5/resource/_static/logo_notebook.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r1.5/tutorials/zh_cn/mindspore_dataset.ipynb) [![](https://gitee.com/mindspore/docs/raw/r1.5/resource/_static/logo_download_code.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r1.5/tutorials/zh_cn/mindspore_dataset.py) [![](https://gitee.com/mindspore/docs/raw/r1.5/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.5/tutorials/source_zh_cn/dataset.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MindSpore provides loading interfaces for a number of commonly used datasets and for datasets in standard formats. You can load data directly with the corresponding dataset class in `mindspore.dataset`. The dataset classes also expose common data processing interfaces, so that processing operations can be applied quickly.\n", "\n", "## Data Preparation\n", "\n", "Run the following commands in the notebook to download the datasets and extract them to the specified location." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "!mkdir -p ./datasets\n", "!wget -N https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz --no-check-certificate\n", "!wget -N https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/MNIST_Data.zip --no-check-certificate\n", "!unzip -d ./datasets -o MNIST_Data.zip\n", "!tar -zxvf cifar-10-binary.tar.gz -C ./datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the Dataset\n", "\n", "The following example loads the CIFAR-10 dataset through the `Cifar10Dataset` interface and uses a sequential sampler to fetch the first 5 samples." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [],
"source": [ "import mindspore.dataset as ds\n", "\n", "DATA_DIR = \"./datasets/cifar-10-batches-bin\"\n", "sampler = ds.SequentialSampler(num_samples=5)\n", "dataset = ds.Cifar10Dataset(DATA_DIR, sampler=sampler)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Iterating over the Dataset\n", "\n", "You can create a data iterator with `create_dict_iterator` and iterate over the data. The following shows the shape and label of each image." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Image shape: (32, 32, 3) , Label: 6\n", "Image shape: (32, 32, 3) , Label: 9\n", "Image shape: (32, 32, 3) , Label: 9\n", "Image shape: (32, 32, 3) , Label: 4\n", "Image shape: (32, 32, 3) , Label: 1\n" ] } ], "source": [ "for data in dataset.create_dict_iterator():\n", "    print(\"Image shape: {}\".format(data['image'].shape), \", Label: {}\".format(data['label']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Customizing a Dataset\n", "\n", "For datasets that MindSpore cannot currently load directly, you can define a custom dataset class and then load the data in a custom way through the `GeneratorDataset` interface.\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "np.random.seed(58)\n", "\n", "class DatasetGenerator:\n", "    def __init__(self):\n", "        self.data = np.random.sample((5, 2))\n", "        self.label = np.random.sample((5, 1))\n", "\n", "    def __getitem__(self, index):\n", "        return self.data[index], self.label[index]\n", "\n", "    def __len__(self):\n", "        return len(self.data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The methods that you need to define in the class are as follows:\n", "\n", "- **\_\_init\_\_**\n", "\n", "    `__init__` is called when the dataset object is instantiated. You can perform operations such as data initialization here.\n", "\n", "    ```python\n", "    def __init__(self):\n", "        self.data = np.random.sample((5, 2))\n", "        self.label = np.random.sample((5, 1))\n", "    ```\n", "\n", "- **\_\_getitem\_\_**\n", "\n", "    Define the `__getitem__` method of the dataset class so that it supports random access: given an index value `index`, it fetches the corresponding data from the dataset and returns it.\n", "\n", "    The return value of `__getitem__` must be a tuple of NumPy arrays. To return a single NumPy array, write `return 
(np_array_1,)`.\n", "\n", "    ```python\n", "    def __getitem__(self, index):\n", "        return self.data[index], self.label[index]\n", "    ```\n", "\n", "- **\_\_len\_\_**\n", "\n", "    Define the `__len__` method of the dataset class to return the number of samples in the dataset.\n", "\n", "    ```python\n", "    def __len__(self):\n", "        return len(self.data)\n", "    ```\n", "\n", "After the dataset class is defined, you can use the `GeneratorDataset` interface to load and access the dataset samples in the user-defined way." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.36510558 0.45120592] [0.78888122]\n", "[0.49606035 0.07562207] [0.38068183]\n", "[0.57176158 0.28963401] [0.16271622]\n", "[0.30880446 0.37487617] [0.54738768]\n", "[0.81585667 0.96883469] [0.77994068]\n" ] } ], "source": [ "dataset_generator = DatasetGenerator()\n", "dataset = ds.GeneratorDataset(dataset_generator, [\"data\", \"label\"], shuffle=False)\n", "\n", "for data in dataset.create_dict_iterator():\n", "    print('{}'.format(data[\"data\"]), '{}'.format(data[\"label\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Data Processing and Augmentation\n", "\n", "### Data Processing\n", "\n", "The dataset interfaces provided by MindSpore include common data processing methods. You only need to call the corresponding function to process data quickly.\n", "\n", "The following example first shuffles the dataset randomly and then groups the samples into batches of two." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data: [[0.36510558 0.45120592]\n", " [0.57176158 0.28963401]]\n", "label: [[0.78888122]\n", " [0.16271622]]\n", "data: [[0.30880446 0.37487617]\n", " [0.49606035 0.07562207]]\n", "label: [[0.54738768]\n", " [0.38068183]]\n", "data: [[0.81585667 0.96883469]]\n", "label: [[0.77994068]]\n" ] } ], "source": [ "ds.config.set_seed(58)\n", "\n", "# Randomly shuffle the data order\n", "dataset = dataset.shuffle(buffer_size=10)\n", "# Split the dataset into batches\n", "dataset = dataset.batch(batch_size=2)\n", "\n", "for data in dataset.create_dict_iterator():\n", "    print(\"data: {}\".format(data[\"data\"]))\n", "    print(\"label: {}\".format(data[\"label\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"其中,\n", "\n", "`buffer_size`:数据集中进行shuffle操作的缓存区的大小。\n", "\n", "`batch_size`:每组包含的数据个数,现设置每组包含2个数据。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 数据增强\n", "\n", "数据量过小或是样本场景单一等问题会影响模型的训练效果,用户可以通过数据增强操作扩充样本多样性,从而提升模型的泛化能力。\n", "\n", "下面的样例使用`mindspore.dataset.vision.c_transforms`模块中的算子对MNIST数据集进行数据增强。\n", "\n", "导入`c_transforms`模块,加载MNIST数据集。" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAARYAAAExCAYAAAC55I3BAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAU3klEQVR4nO3dbWhU6fnH8d/oJlMfMpONmoyDUePabthKLYjarIvdYupDwTa7eVG2feFCcVHHLbr0AV+oXSikdaHQbQULS5VC1SI0igsVNGqkkFhqFbEuwahds5vMuEpzRqNOUnP/X2x3+p815sFck3Mmfj9wgXPOnZkr95gfJ+c+cxJyzjkBgKEJfjcAYPwhWACYI1gAmCNYAJgjWACYI1gAmCNYAJgjWACYI1gAmCNYAJgjWGAmFAoNq15++WW/W0WeESwAzD3jdwMYfzZu3KhNmzY9dv+UKVPGsBv4gWCBufLyci1YsMDvNuAjfhUCYI5gAWCOYAFgLsQd5GAlFApJkl544QU55/Svf/1LEydOVCwW04svvqjXX39d3/jGN3zuEmOBYIGZz4JlMHV1ddq3b5+i0egYdAS/ECwwM2XKFH3729/WihUrVF1dralTp+qTTz5Rc3Oz9uzZo9u3b0uSvv71r+v48eMqKiryuWPkC8ECM93d3SotLR1wXyqV0po1a3T+/HlJ0q9//Wv98Ic/HMPuMJYIFoyZa9euqbq6Wn19fZo/f76uXLnid0vIE1aFMGbmzZunb37zm5Kk9vZ2dXZ2+twR8oVgwZh64YUXsv/++OOPfewE+USwYEwNZ+UIhY9gwZi6fPly9t/xeNzHTpBPnLzFmLl+/bqqq6vV29ur5557Tu3t7X63hDzhiAUmjh49qv/85z+P3Z9KpVRfX6/e3l5JGvS2Cih8HLHAxNy5c9XX16f6+nrV1NRo7ty5mjRpkm7duqXTp0/rd7/7nW7duiVJeumll3TixAmFw2Gfu0a+ECwwMXfuXH344YdDjquvr9d777332AvpMD4QLDDR3Nys5uZmtbS06Nq1a7p165bS6bSmTp2qyspKvfjii1q3bp1qamr8bhVjgGABYI6TtwDMESwAzBEsAMwRLADMESwAzBEsAMwF7g+W9ff3q7OzUyUlJXwSFggQ55zu3LmjeDyuCROGOCZxefLb3/7WzZkzx4XDYbdkyRJ39uzZYX1dR0eHk0RRVECro6NjyJ/jvATLwYMHXXFxsfv973/v/vnPf7r169e70tJSl0qlhvza7u5u3yeOoqjHV3d395A/x3kJliVLlrhEIpF9/PDhQxePx11DQ8OQX+t5nu8TR1HU48vzvCF/js1P3vb29urcuXOqra3NbpswYYJqa2vV0tLyyPhMJqN0Op1TAAqbebDcunVLDx8+VEVFRc72iooKJZPJR8Y3NDQoGo1mq7Ky0rolAGPM9+Xmbdu2yfO8bHV0dPjdEoBRMl9unj59uiZOnKhUKpWzPZVKKRaLPTI+HA5zwx9gnDE/YikuLtaiRYvU1NS
U3dbf36+mpibuxQE8LUa1/PMYBw8edOFw2O3bt89dvnzZvfHGG660tNQlk8khv5ZVIYoKdg1nVSgvV95+97vf1SeffKIdO3YomUzqq1/9qo4dO/bICV0A41Pg7iCXTqcVjUb9bgPAY3iep0gkMugY31eFAIw/BAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHPmwfKzn/1MoVAop6qrq61fBmPEOUcFpArJM/l40i9/+cs6ceLE/17kmby8DICAystP/DPPPKNYLJaPpwZQAPJyjuXKlSuKx+OaN2+evv/97+vGjRv5eBkAARVyxr+8/eUvf9Hdu3f1/PPPq6urS2+//bY+/vhjXbp0SSUlJY+Mz2QyymQy2cfpdFqVlZWWLWEUCu13+/EsFAr53YIkyfM8RSKRwQe5PPv3v//tIpGIe++99wbcv3PnTieJCmghOPz+v/BZeZ43ZK95X24uLS3Vl770JbW3tw+4f9u2bfI8L1sdHR35bglAnuU9WO7evaurV69q5syZA+4Ph8OKRCI5BaCwma8K/ehHP9LatWs1Z84cdXZ2aufOnZo4caJee+0165ca9xznN1CgzIPlo48+0muvvabbt29rxowZeumll9Ta2qoZM2ZYvxSAgDJfFRqtdDqtaDTqdxuBELC3Bj4rpFUhPisEwBzBAsAcwQLAHMECwBzBAsAcwQLAHDdK8RHLyfj/grKcbIEjFgDmCBYA5ggWAOYIFgDmCBYA5ggWAOYIFgDmuI4F4954uj6kUHDEAsAcwQLAHMECwBzBAsAcwQLAHMECwBzBAsAc17H4aKjrK56W+7UwD+MPRywAzBEsAMwRLADMESwAzBEsAMwRLADMESwAzI04WM6cOaO1a9cqHo8rFArp8OHDOfudc9qxY4dmzpypSZMmqba2VleuXLHq96kSCoUGrUIw1PcwnO9jOM9R6PM03ow4WHp6erRw4ULt3r17wP27du3Su+++qz179ujs2bOaMmWKVq1apQcPHoy6WQAFwo2CJNfY2Jh93N/f72KxmHvnnXey27q7u104HHYHDhwY1nN6nuckUcOoQuD3HFH25XnekO+76TmW69evK5lMqra2NrstGo1q6dKlamlpsXwpAAFm+lmhZDIpSaqoqMjZXlFRkd33eZlMRplMJvs4nU5btgTAB76vCjU0NCgajWarsrLS75YAjJJpsMRiMUlSKpXK2Z5KpbL7Pm/btm3yPC9bHR0dli0B8IFpsFRVVSkWi6mpqSm7LZ1O6+zZs6qpqRnwa8LhsCKRSE4BKGwjPsdy9+5dtbe3Zx9fv35dFy5cUFlZmWbPnq0tW7bo5z//ub74xS+qqqpK27dvVzweV11dnWXfAIJspMuHp06dGnAJat26dc65T5ect2/f7ioqKlw4HHYrVqxwbW1tw35+lpvHtoLA7zmgRlbDWW4O/feNDYx0Oq1oNOp3G0+NILz9XB1bWDzPG/KUhe+rQgDGH4IFgDmCBYA5ggWAOYIFgDmCBYA5ggWAOYIFgDmCBYA5ggWAOYIFgDmCBYA5ggWAOYIFgDnTm2mj8Ax1y4KxuK1Cvl+D2zKMPY5YAJgjWACYI1gAmCNYAJgjWACYI1gAmCNYAJjjOhYMKgjXuYzWcHrkWhdbHLEAMEewADBHsAAwR7AAMEewADBHsAAwR7AAMEewADA34mA5c+aM1q5dq3g8rlAopMOHD+fsf/311xUKhXJq9erVVv0iYD7/Xn++CoVzbtDCyIw4WHp6erRw4ULt3r37sWNWr16trq6ubB04cGBUTQIoLCO+pH/NmjVas2bNoGPC4bBisdgTNwWgsOXlHMvp06dVXl6u559/Xhs3btTt27cfOzaTySidTucUgMJmHiyrV6/WH/7
wBzU1NemXv/ylmpubtWbNGj18+HDA8Q0NDYpGo9mqrKy0bgnAGAu5UZyZCoVCamxsVF1d3WPHXLt2Tc8995xOnDihFStWPLI/k8kok8lkH6fTacJlHBkvJz4L6UR0vnmep0gkMuiYvC83z5s3T9OnT1d7e/uA+8PhsCKRSE4BKGx5D5aPPvpIt2/f1syZM/P9UgACYsSrQnfv3s05+rh+/bouXLigsrIylZWV6e2331Z9fb1isZiuXr2qn/zkJ5o/f75WrVpl2jgKw3B+hRgvvy7h/3EjdOrUKSfpkVq3bp27d++eW7lypZsxY4YrKipyc+bMcevXr3fJZHLYz+953oDPT43fKgR+z1GQyvO8IedrVCdv8yGdTisajfrdBsZQwP4LDoiTt/8TiJO3AJ4+BAsAcwQLAHMECwBzBAsAc/zBMvhutCsuY7GqNNRrsGqUiyMWAOYIFgDmCBYA5ggWAOYIFgDmCBYA5ggWAOa4jgV5VQifXIY9jlgAmCNYAJgjWACYI1gAmCNYAJgjWACYI1gAmCNYAJjjAjkMigvcPsWNnEaGIxYA5ggWAOYIFgDmCBYA5ggWAOYIFgDmCBYA5kYULA0NDVq8eLFKSkpUXl6uuro6tbW15Yx58OCBEomEpk2bpqlTp6q+vl6pVMq0aQyPc27U9bQIhUKDFkZmRMHS3NysRCKh1tZWHT9+XH19fVq5cqV6enqyY7Zu3aqjR4/q0KFDam5uVmdnp1599VXzxgEEmBuFmzdvOkmuubnZOedcd3e3KyoqcocOHcqO+eCDD5wk19LSMqzn9DzPSaIMCsPn93tVSOV53pDzOapzLJ7nSZLKysokSefOnVNfX59qa2uzY6qrqzV79my1tLQM+ByZTEbpdDqnABS2Jw6W/v5+bdmyRcuWLdOCBQskSclkUsXFxSotLc0ZW1FRoWQyOeDzNDQ0KBqNZquysvJJWwIQEE8cLIlEQpcuXdLBgwdH1cC2bdvkeV62Ojo6RvV8APz3RJ9u3rx5s95//32dOXNGs2bNym6PxWLq7e1Vd3d3zlFLKpVSLBYb8LnC4bDC4fCTtAEgoEZ0xOKc0+bNm9XY2KiTJ0+qqqoqZ/+iRYtUVFSkpqam7La2tjbduHFDNTU1Nh0DCLwRHbEkEgnt379fR44cUUlJSfa8STQa1aRJkxSNRvWDH/xAb731lsrKyhSJRPTmm2+qpqZGX/va1/LyDYxn7im6jiSfuA7FBxZLcnv37s2OuX//vtu0aZN79tln3eTJk90rr7ziurq6hv0aLDf/r2DD7/dxvNVwlptD/534wEin04pGo363EQgBe2sKFkcstjzPUyQSGXQMnxUCYI5gAWCOYAFgjmABYI5gAWCOvyuUJ6zo2GFVp/BwxALAHMECwBzBAsAcwQLAHMECwBzBAsAcwQLAHMECwBwXyD0GF7jZ4OK2pxNHLADMESwAzBEsAMwRLADMESwAzBEsAMwRLADMcR0LBsV1KHgSHLEAMEewADBHsAAwR7AAMEewADBHsAAwR7AAMDeiYGloaNDixYtVUlKi8vJy1dXVqa2tLWfMyy+/rFAolFMbNmwwbXosfP57eFoLeBIjCpbm5mYlEgm1trbq+PHj6uvr08qVK9XT05Mzbv369erq6srWrl27TJsGEGwjuvL22LFjOY/37dun8vJynTt3TsuXL89unzx5smKxmE2HAArOqM6xeJ4nSSorK8vZ/sc//lHTp0/XggULtG3bNt27d280LwOgwDzxZ4X6+/u1ZcsWLVu2TAsWLMhu/973vqc5c+YoHo/r4sWL+ulPf6q2tjb9+c9/HvB5MpmMMplM9nE6nX7SlgAEhXtCGzZscHPmzHEdHR2DjmtqanKSXHt7+4D7d+7c6SRRFFUg5XnekPnwRMGSSCTcrFmz3LVr14Yce/fuXSfJHTt2bMD9Dx48cJ7nZaujo8P3iaMo6vE1nGAZ0a9Czjm9+eabamxs1OnTp1VVVTXk11y4cEG
SNHPmzAH3h8NhhcPhkbQBIOBGFCyJREL79+/XkSNHVFJSomQyKUmKRqOaNGmSrl69qv379+tb3/qWpk2bposXL2rr1q1avny5vvKVr+TlGwAQQCP5FUiPOTTau3evc865GzduuOXLl7uysjIXDofd/Pnz3Y9//ONhHTp9xvM83w/1KIp6fA3n5zn038AIjHQ6rWg06ncbAB7D8zxFIpFBx/BZIQDmCBYA5ggWAOYIFgDmCBYA5ggWAOYIFgDmCBYA5ggWAOYIFgDmCBYA5ggWAOYIFgDmAhcsAfuwNYDPGc7PaOCC5c6dO363AGAQw/kZDdz9WPr7+9XZ2amSkhKFQiGl02lVVlaqo6NjyHtAYHDMpY2ndR6dc7pz547i8bgmTBj8mOSJ//xHvkyYMEGzZs16ZHskEnmq3sR8Yi5tPI3zONybsAXuVyEAhY9gAWAu8MESDoe1c+dO/kSIAebSBvM4tMCdvAVQ+AJ/xAKg8BAsAMwRLADMESwAzAU+WHbv3q25c+fqC1/4gpYuXaq//e1vfrcUeGfOnNHatWsVj8cVCoV0+PDhnP3OOe3YsUMzZ87UpEmTVFtbqytXrvjTbIA1NDRo8eLFKikpUXl5uerq6tTW1pYz5sGDB0okEpo2bZqmTp2q+vp6pVIpnzoOjkAHy5/+9Ce99dZb2rlzp/7xj39o4cKFWrVqlW7evOl3a4HW09OjhQsXavfu3QPu37Vrl959913t2bNHZ8+e1ZQpU7Rq1So9ePBgjDsNtubmZiUSCbW2tur48ePq6+vTypUr1dPTkx2zdetWHT16VIcOHVJzc7M6Ozv16quv+th1QIzkj8KPtSVLlrhEIpF9/PDhQxePx11DQ4OPXRUWSa6xsTH7uL+/38ViMffOO+9kt3V3d7twOOwOHDjgQ4eF4+bNm06Sa25uds59Om9FRUXu0KFD2TEffPCBk+RaWlr8ajMQAnvE0tvbq3Pnzqm2tja7bcKECaqtrVVLS4uPnRW269evK5lM5sxrNBrV0qVLmdcheJ4nSSorK5MknTt3Tn19fTlzWV1drdmzZz/1cxnYYLl165YePnyoioqKnO0VFRVKJpM+dVX4Pps75nVk+vv7tWXLFi1btkwLFiyQ9OlcFhcXq7S0NGcscxnATzcDQZRIJHTp0iX99a9/9buVghDYI5bp06dr4sSJj5xhT6VSisViPnVV+D6bO+Z1+DZv3qz3339fp06dyrmlRywWU29vr7q7u3PGM5cBDpbi4mItWrRITU1N2W39/f1qampSTU2Nj50VtqqqKsVisZx5TafTOnv2LPP6Oc45bd68WY2NjTp58qSqqqpy9i9atEhFRUU5c9nW1qYbN24wl36fPR7MwYMHXTgcdvv27XOXL192b7zxhistLXXJZNLv1gLtzp077vz58+78+fNOkvvVr37lzp8/7z788EPnnHO/+MUvXGlpqTty5Ii7ePGi+853vuOqqqrc/fv3fe48WDZu3Oii0ag7ffq06+rqyta9e/eyYzZs2OBmz57tTp486f7+97+7mpoaV1NT42PXwRDoYHHOud/85jdu9uzZrri42C1ZssS1trb63VLgnTp1ykl6pNatW+ec+3TJefv27a6iosKFw2G3YsUK19bW5m/TATTQHEpye/fuzY65f/++27Rpk3v22Wfd5MmT3SuvvOK6urr8azoguG0CAHOBPccCoHARLADMESwAzBEsAMwRLADMESwAzBEsAMwRLADMESwAzBEsAMwRLADMESwAzP0f+94UhCU19OkAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "from mindspore.dataset.vision import Inter\n", "import mindspore.dataset.vision.c_transforms as c_vision\n", "\n", "DATA_DIR = './datasets/MNIST_Data/train'\n", "\n", "mnist_dataset = ds.MnistDataset(DATA_DIR, num_samples=6, shuffle=False)\n", "\n", "# View the original image\n", "mnist_it = mnist_dataset.create_dict_iterator()\n", "data = next(mnist_it)\n", "plt.figure(figsize=(3,3))\n", "plt.imshow(data['image'].asnumpy().squeeze(), cmap=plt.cm.gray)\n", "plt.title(data['label'].asnumpy(), fontsize=20)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Define the data augmentation operators that apply `Resize` and `RandomCrop` to the dataset, then insert them into the data processing pipeline through `map`.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "resize_op = c_vision.Resize(size=(200,200), interpolation=Inter.LINEAR)\n", "crop_op = c_vision.RandomCrop(150)\n", "transforms_list = [resize_op, crop_op]\n", "mnist_dataset = mnist_dataset.map(operations=transforms_list, input_columns=[\"image\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "View the result of the data augmentation." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mnist_it = mnist_dataset.create_dict_iterator()\n", "data = next(mnist_it)\n", "plt.figure(figsize=(3,3))\n", "plt.imshow(data['image'].asnumpy().squeeze(), cmap=plt.cm.gray)\n", "plt.title(data['label'].asnumpy(), fontsize=20)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more information, see the [Data Augmentation](https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.5/augmentation.html) section in the programming guide." ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }