mindvision.classification

mindvision.classification.dataset

Init dataset

class mindvision.classification.dataset.Cifar10(path: str, split: str = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, batch_size: int = 32, repeat_num: int = 1, shuffle: Optional[bool] = None, num_parallel_workers: int = 1, num_shards: Optional[int] = None, shard_id: Optional[int] = None, resize: Union[int, Tuple[int, int]] = 224, download: bool = False)[source]

A source dataset that downloads, reads, parses and augments the CIFAR-10 dataset.

The generated dataset has two columns [image, label]. The tensor of column image is a matrix of the float32 type. The tensor of column label is a scalar of the int32 type.

Parameters
  • path (str) – The root directory of the Cifar10 dataset or inference image.

  • split (str) – The dataset split, which supports “train”, “test” or “infer”. Default: “train”.

  • transform (callable, optional) – A function transform that takes in an image. Default: None.

  • target_transform (callable, optional) – A function transform that takes in a label. Default: None.

  • batch_size (int) – The batch size of dataset. Default: 32.

  • repeat_num (int) – The number of times the dataset is repeated. Default: 1.

  • shuffle (bool, optional) – Whether or not to perform shuffle on the dataset. Default: None.

  • num_parallel_workers (int) – The number of subprocesses used to fetch the dataset in parallel. Default: 1.

  • num_shards (int, optional) – The number of shards that the dataset will be divided into. Default: None.

  • shard_id (int, optional) – The shard ID within num_shards. Default: None.

  • resize (Union[int, tuple]) – The output size of the resized image. If size is an integer, the smaller edge of the image will be resized to this value with the same image aspect ratio. If size is a sequence of length 2, it should be (height, width). Default: 224.

  • download (bool) – Whether to download the dataset. Default: False.

Raises

ValueError – If split is not ‘train’, ‘test’ or ‘infer’.

Examples

>>> from mindvision.classification.dataset import Cifar10
>>> dataset = Cifar10("./data/", "train")
>>> dataset = dataset.run()
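
A fuller configuration sketch using only the parameters documented above; it assumes network access for download=True and a writable target directory.

>>> from mindvision.classification.dataset import Cifar10
>>> dataset = Cifar10(path="./data/", split="train", batch_size=64,
...                   shuffle=True, resize=(224, 224), download=True)
>>> dataset = dataset.run()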

About CIFAR-10 dataset:

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

Here is the original CIFAR-10 dataset structure. You can unzip the dataset files into the following directory structure and read them with MindSpore Vision’s API.

.
└── cifar-10-batches-py
     ├── data_batch_1
     ├── data_batch_2
     ├── data_batch_3
     ├── data_batch_4
     ├── data_batch_5
     ├── test_batch
     ├── readme.html
     └── batches.meta

Citation:

@techreport{Krizhevsky09,
author       = {Alex Krizhevsky},
title        = {Learning multiple layers of features from tiny images},
institution  = {},
year         = {2009},
howpublished = {http://www.cs.toronto.edu/~kriz/cifar.html}
}
default_transform()[source]

Set the default transform for Cifar10 dataset.

download_dataset()[source]

Download the Cifar10 data if it doesn’t exist.

property index2label

Get the mapping of indexes and labels.
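
A minimal usage sketch for this property, assuming the mapping is keyed by the integer class index:

>>> dataset = Cifar10("./data/", "train")
>>> mapping = dataset.index2label
>>> print(mapping[0])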

class mindvision.classification.dataset.Cifar100(path: str, split: str = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, batch_size: int = 32, repeat_num: int = 1, shuffle: Optional[bool] = None, num_parallel_workers: int = 1, num_shards: Optional[int] = None, shard_id: Optional[int] = None, resize: Union[int, Tuple[int, int]] = 224, download: bool = False)[source]

A source dataset that downloads, reads, parses and augments the CIFAR-100 dataset.

The generated dataset has two columns [image, label]. The tensor of column image is a matrix of the float32 type. The tensor of column label is a scalar of the int32 type.

Parameters
  • path (str) – The root directory of the CIFAR-100 dataset or inference image.

  • split (str) – The dataset split, which supports “train”, “test” or “infer”. Default: “train”.

  • transform (callable, optional) – A function transform that takes in an image. Default: None.

  • target_transform (callable, optional) – A function transform that takes in a label. Default: None.

  • batch_size (int) – The batch size of dataset. Default: 32.

  • repeat_num (int) – The number of times the dataset is repeated. Default: 1.

  • shuffle (bool, optional) – Whether or not to perform shuffle on the dataset. Default: None.

  • num_parallel_workers (int) – The number of subprocesses used to fetch the dataset in parallel. Default: 1.

  • num_shards (int, optional) – The number of shards that the dataset will be divided into. Default: None.

  • shard_id (int, optional) – The shard ID within num_shards. Default: None.

  • resize (Union[int, tuple]) – The output size of the resized image. If size is an integer, the smaller edge of the image will be resized to this value with the same image aspect ratio. If size is a sequence of length 2, it should be (height, width). Default: 224.

  • download (bool) – Whether to download the dataset. Default: False.

Raises

ValueError – If split is not ‘train’, ‘test’ or ‘infer’.

Examples

>>> from mindvision.classification.dataset import Cifar100
>>> dataset = Cifar100("./data/", "train")
>>> dataset = dataset.run()

About CIFAR-100 dataset:

This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses.

Here is the original CIFAR-100 dataset structure. You can unzip the dataset files into the following directory structure and read them with MindSpore Vision’s API.

.
└── cifar-100-python
     ├── train
     ├── test
     ├── meta
     └── file.txt~

Citation:

@techreport{Krizhevsky09,
author       = {Alex Krizhevsky},
title        = {Learning multiple layers of features from tiny images},
institution  = {},
year         = {2009},
howpublished = {http://www.cs.toronto.edu/~kriz/cifar.html}
}
default_transform()[source]

Set the default transform for Cifar100 dataset.

download_dataset()[source]

Download the Cifar100 data if it doesn’t exist.

property index2label

Get the mapping of indexes and labels.

class mindvision.classification.dataset.FashionMnist(path: str, split: str = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, batch_size: int = 32, repeat_num: int = 1, shuffle: Optional[bool] = None, num_parallel_workers: int = 1, num_shards: Optional[int] = None, shard_id: Optional[int] = None, resize: Union[int, Tuple[int, int]] = 32, download: bool = False)[source]

A source dataset that downloads, reads, parses and augments the Fashion-MNIST dataset.

The generated dataset has two columns [image, label]. The tensor of column image is a matrix of the float32 type. The tensor of column label is a scalar of the int32 type.

Parameters
  • path (str) – The root directory of the Fashion-MNIST dataset or inference image.

  • split (str) – The dataset split, which supports “train”, “test” or “infer”. Default: “train”.

  • transform (callable, optional) – A function transform that takes in an image. Default: None.

  • target_transform (callable, optional) – A function transform that takes in a label. Default: None.

  • batch_size (int) – The batch size of dataset. Default: 32.

  • repeat_num (int) – The number of times the dataset is repeated. Default: 1.

  • shuffle (bool, optional) – Whether or not to perform shuffle on the dataset. Default: None.

  • num_parallel_workers (int, optional) – The number of subprocesses used to fetch the dataset in parallel. Default: None.

  • num_shards (int, optional) – The number of shards that the dataset will be divided into. Default: None.

  • shard_id (int, optional) – The shard ID within num_shards. Default: None.

  • resize (Union[int, tuple]) – The output size of the resized image. If size is an integer, the smaller edge of the image will be resized to this value with the same image aspect ratio. If size is a sequence of length 2, it should be (height, width). Default: 32.

  • download (bool) – Whether to download the dataset. Default: False.

Raises

ValueError – If split is not ‘train’, ‘test’ or ‘infer’.

Examples

>>> from mindvision.classification.dataset import FashionMnist
>>> dataset = FashionMnist("./data/fashion_mnist", "train")
>>> dataset = dataset.run()
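
A hedged sketch of sharded loading with the num_shards and shard_id parameters documented above; in real distributed training shard_id would come from the communication rank rather than being hardcoded.

>>> from mindvision.classification.dataset import FashionMnist
>>> dataset = FashionMnist("./data/fashion_mnist", "train",
...                        num_shards=2, shard_id=0)
>>> dataset = dataset.run()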

About Fashion-MNIST dataset:

Fashion-MNIST is a dataset of Zalando’s article images that consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Fashion-MNIST is served as a direct drop-in replacement for the original MNIST dataset and benchmarks. It shares the same image size and structure of training and testing splits.

You can unzip the dataset files into this directory structure and read them with MindSpore Vision’s API.

./fashion_mnist/
├── test
│   ├── t10k-images-idx3-ubyte
│   └── t10k-labels-idx1-ubyte
└── train
    ├── train-images-idx3-ubyte
    └── train-labels-idx1-ubyte

Citation:

@online{xiao2017/online,
  author       = {Han Xiao and Kashif Rasul and Roland Vollgraf},
  title        = {Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},
  date         = {2017-08-28},
  year         = {2017},
  eprintclass  = {cs.LG},
  eprinttype   = {arXiv},
  eprint       = {cs.LG/1708.07747},
}
default_transform()[source]

Set the default transform for Fashion Mnist dataset.

download_dataset()[source]

Download the Fashion MNIST data if it doesn’t exist.

property index2label

Get the mapping of indexes and labels.

class mindvision.classification.dataset.ImageNet(path: str, split: str = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, batch_size: int = 64, resize: Union[Tuple[int, int], int] = 224, repeat_num: int = 1, shuffle: Optional[bool] = None, download: bool = False, num_parallel_workers: int = 1, num_shards: Optional[int] = None, shard_id: Optional[int] = None)[source]

A source dataset that reads, parses and augments the IMAGENET dataset.

The generated dataset has two columns [image, label]. The tensor of column image is a matrix of the float32 type. The tensor of column label is a scalar of the int32 type.

Parameters
  • path (str) – The root directory of the IMAGENET dataset or inference image.

  • split (str) – The dataset split, which supports “train”, “val” or “infer”. Default: “train”.

  • num_parallel_workers (int, optional) – The number of subprocesses used to fetch the dataset in parallel. Default: None.

  • transform (callable, optional) – A function transform that takes in an image. Default: None.

  • target_transform (callable, optional) – A function transform that takes in a label. Default: None.

  • batch_size (int) – The batch size of dataset. Default: 64.

  • repeat_num (int) – The number of times the dataset is repeated. Default: 1.

  • shuffle (bool, optional) – Whether or not to perform shuffle on the dataset. Default: None.

  • num_shards (int, optional) – The number of shards that the dataset will be divided into. Default: None.

  • shard_id (int, optional) – The shard ID within num_shards. Default: None.

  • resize (Union[int, tuple]) – The output size of the resized image. If size is an integer, the smaller edge of the image will be resized to this value with the same image aspect ratio. If size is a sequence of length 2, it should be (height, width). Default: 224.

  • download (bool) – Whether to download the dataset. Default: False.

Raises

ValueError – If split is not ‘train’, ‘val’ or ‘infer’.

Examples

>>> from mindvision.classification.dataset import ImageNet
>>> dataset = ImageNet("./data/imagenet/", "train")
>>> dataset = dataset.run()
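
Reading the validation split works the same way, per the split values documented above:

>>> dataset = ImageNet("./data/imagenet/", "val")
>>> dataset = dataset.run()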

About IMAGENET dataset:

IMAGENET is an image dataset that spans 1000 object classes and contains 1,281,167 training images, 50,000 validation images and 100,000 test images. Images of each object are quality-controlled and human-annotated.

You can unzip the dataset files into this directory structure and read them with MindSpore Vision’s API.

.imagenet/
├── train/  (1000 directories and 1281167 images)
│  ├── n04347754/
│  │   ├── 000001.jpg
│  │   ├── 000002.jpg
│  │   └── ....
│  └── n04347756/
│      ├── 000001.jpg
│      ├── 000002.jpg
│      └── ....
└── val/   (1000 directories and 50000 images)
    ├── n04347754/
    │   ├── 000001.jpg
    │   ├── 000002.jpg
    │   └── ....
    └── n04347756/
        ├── 000001.jpg
        ├── 000002.jpg
        └── ....

Citation

@inproceedings{deng2009imagenet,
title        = {Imagenet: A large-scale hierarchical image database},
author       = {Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li},
booktitle    = {2009 IEEE conference on computer vision and pattern recognition},
pages        = {248--255},
year         = {2009},
organization = {IEEE}
}
default_transform()[source]

Set the default transform for ImageNet dataset.

property index2label

Get the mapping of indexes and labels.

read_dataset()[source]

Read each image and its corresponding label from directory.

class mindvision.classification.dataset.Mnist(path: str, split: str = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, batch_size: int = 32, repeat_num: int = 1, shuffle: Optional[bool] = None, num_parallel_workers: int = 1, num_shards: Optional[int] = None, shard_id: Optional[int] = None, resize: Union[int, Tuple[int, int]] = 32, download: bool = False)[source]

A source dataset that downloads, reads, parses and augments the MNIST dataset.

The generated dataset has two columns [image, label]. The tensor of column image is a matrix of the float32 type. The tensor of column label is a scalar of the int32 type.

Parameters
  • path (str) – The root directory of the MNIST dataset or inference image.

  • split (str) – The dataset split, which supports “train”, “test” or “infer”. Default: “train”.

  • transform (callable, optional) – A function transform that takes in an image. Default: None.

  • target_transform (callable, optional) – A function transform that takes in a label. Default: None.

  • batch_size (int) – The batch size of dataset. Default: 32.

  • repeat_num (int) – The number of times the dataset is repeated. Default: 1.

  • shuffle (bool, optional) – Whether or not to perform shuffle on the dataset. Default: None.

  • num_parallel_workers (int, optional) – The number of subprocesses used to fetch the dataset in parallel. Default: None.

  • num_shards (int, optional) – The number of shards that the dataset will be divided into. Default: None.

  • shard_id (int, optional) – The shard ID within num_shards. Default: None.

  • resize (Union[int, tuple]) – The output size of the resized image. If size is an integer, the smaller edge of the image will be resized to this value with the same image aspect ratio. If size is a sequence of length 2, it should be (height, width). Default: 32.

  • download (bool) – Whether to download the dataset. Default: False.

Examples

>>> from mindvision.classification.dataset import Mnist
>>> dataset = Mnist("./data/mnist", "train")
>>> dataset = dataset.run()
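
A sketch of passing a custom transform; it assumes the raw image arrives as a numpy array, which the callable normalizes to [0, 1]:

>>> import numpy as np
>>>
>>> def normalize(image):
...     return image.astype(np.float32) / 255.0
...
>>> dataset = Mnist("./data/mnist", "train", transform=normalize)
>>> dataset = dataset.run()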

About MNIST dataset:

The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

Here is the original MNIST dataset structure. You can unzip the dataset files into this directory structure and read them with MindSpore Vision’s API.

./mnist
├── test
│   ├── t10k-images-idx3-ubyte
│   └── t10k-labels-idx1-ubyte
└── train
    ├── train-images-idx3-ubyte
    └── train-labels-idx1-ubyte

Citation:

@article{lecun2010mnist,
title        = {MNIST handwritten digit database},
author       = {LeCun, Yann and Cortes, Corinna and Burges, CJ},
journal      = {ATT Labs [Online]},
volume       = {2},
year         = {2010},
howpublished = {http://yann.lecun.com/exdb/mnist}
}
default_transform()[source]

Set the default transform for Mnist dataset.

download_dataset()[source]

Download the MNIST data if it doesn’t exist.

property index2label

Get the mapping of indexes and labels.

class mindvision.classification.dataset.ParseCifar10(path: str)[source]

Download and parse Cifar10 dataset.

Parameters

path (str) – The root path of the Cifar10 dataset joined with train or test.

Examples

>>> parse_data = ParseCifar10("./cifar10/train")
download_and_extract_archive()[source]

Download the Cifar10 dataset if it doesn’t exist.

parse_dataset()[source]

Parse data from Cifar10 dataset file.

class mindvision.classification.dataset.ParseCifar100(path: str)[source]

Download and parse Cifar100 dataset.

Parameters

path (str) – The root path of the Cifar100 dataset joined with train or test.

Examples

>>> parse_data = ParseCifar100("./cifar100/train")
class mindvision.classification.dataset.ParseFashionMnist(path: str)[source]

Download and parse FashionMnist dataset.

Parameters

path (str) – The root path of the FashionMnist dataset joined with train or test.

Examples

>>> parse_data = ParseFashionMnist("./fashion_mnist/train")
class mindvision.classification.dataset.ParseImageNet(path: str)[source]

Parse ImageNet dataset and generate the json file (file name: imagenet_meta.json). The ImageNet dataset looks like:

.imagenet/
├── ILSVRC2012_devkit_t12.tar.gz
├── ILSVRC2012_img_train.tar
└── ILSVRC2012_img_val.tar

or:

.imagenet/
├── ILSVRC2012_devkit_t12.tar.gz
├── train/
└── val/

Parameters

path (str) – The root path of the ImageNet2012 dataset, which must include ILSVRC2012_devkit_t12.tar.gz and the train/val compressed packages or directories.

Examples

>>> parse_data = ParseImageNet("./imagenet/")
parse_dataset()[source]

Parse the devkit archives of the ImageNet dataset.

parse_devkit()[source]

Parse the devkit archive of the ImageNet2012 classification dataset and save the meta info in a json file.
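
A usage sketch; where exactly imagenet_meta.json is written is not stated above, so the comment is an assumption.

>>> parse_data = ParseImageNet("./imagenet/")
>>> parse_data.parse_dataset()  # assumed to write imagenet_meta.json under the dataset root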

class mindvision.classification.dataset.ParseMnist(path: str)[source]

Download and parse Mnist dataset.

Parameters

path (str) – The root path of the Mnist dataset joined with train or test.

Examples

>>> parse_data = ParseMnist("./mnist/train")
download_and_extract_archive()[source]

Download the MNIST dataset if it doesn’t exist.

parse_dataset()[source]

Parse data from Mnist dataset file.

mindvision.classification.models

Init classification models.

class mindvision.classification.models.EfficientNet(width_mult: float = 1, depth_mult: float = 1, inverted_residual_setting: Optional[List[MBConvConfig]] = None, keep_prob: float = 0.2, block: Optional[nn.Cell] = None, norm_layer: Optional[nn.Cell] = None)[source]

EfficientNet architecture.

Parameters
  • width_mult (float) – The ratio of the channel. Default: 1.0.

  • depth_mult (float) – The ratio of num_layers. Default: 1.0.

  • inverted_residual_setting (List[MBConvConfig], optional) – The settings of block. Default: None.

  • keep_prob (float) – The dropout rate of MBConv. Default: 0.2.

  • block (nn.Cell, optional) – The basic block of the model. Default: None.

  • norm_layer (nn.Cell, optional) – The normalization layer. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, 1280)\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindvision.classification.models.backbones import EfficientNet
>>> net = EfficientNet(1, 1)
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1280)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.
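
For orientation, the compound coefficients in the EfficientNet paper pair a width multiplier with a depth multiplier, e.g. (1.0, 1.0) for B0 and (1.0, 1.1) for B1; those values come from the paper, not from this API. A B1-like backbone can therefore be sketched as:

>>> import numpy as np
>>> import mindspore as ms
>>> from mindvision.classification.models.backbones import EfficientNet
>>> net = EfficientNet(width_mult=1.0, depth_mult=1.1)
>>> x = ms.Tensor(np.ones([1, 3, 240, 240]), ms.float32)
>>> output = net(x)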

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
construct(x)[source]

Efficientnet construct.

class mindvision.classification.models.InvertedResidual(in_channel: int, out_channel: int, stride: int, expand_ratio: int, norm: Optional[nn.Cell] = None)[source]

MobileNetV2 residual block definition.

Parameters
  • in_channel (int) – The input channel.

  • out_channel (int) – The output channel.

  • stride (int) – The stride size for the first convolutional layer. Default: 1.

  • expand_ratio (int) – The expand ratio of the input channel.

  • norm (nn.Cell, optional) – The norm layer that will be stacked on top of the convolution layer. Default: None.

Returns

Tensor, output tensor.

Examples

>>> from mindvision.classification.models.backbones import InvertedResidual
>>> InvertedResidual(3, 256, 1, 1)
class mindvision.classification.models.LeNet5(num_classes=10, num_channel=1, include_top=True)[source]

LeNet backbone.

Parameters
  • num_classes (int) – The number of classes. Default: 10.

  • num_channel (int) – The number of channels. Default: 1.

  • include_top (bool) – Whether to use the TOP architecture. Default: True.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\)

Outputs:

Tensor of shape \((N, 10)\)

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models.backbones import LeNet5
>>>
>>> net = LeNet5()
>>> x = ms.Tensor(np.ones([1, 1, 32, 32]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 10)

About LeNet5:

LeNet5 trained with the back-propagation algorithm constitutes the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing.

Citation:

@article{1998Gradient,
  title={Gradient-based learning applied to document recognition},
  author={ Lecun, Y.  and  Bottou, L. },
  journal={Proceedings of the IEEE},
  volume={86},
  number={11},
  pages={2278-2324},
  year={1998}
}
construct(x)[source]

LeNet5 construct.

class mindvision.classification.models.MBConv(cnf: MBConvConfig, keep_prob: float, norm: Optional[nn.Cell] = None, se_layer: Callable[..., nn.Cell] = SqueezeExcite)[source]

MBConv Module.

Parameters
  • cnf (MBConvConfig) – The class which contains the parameters (in_channels, out_channels, num_layers) and the functions which help calculate the parameters after multiplying by the expand_ratio.

  • keep_prob (float) – The dropout rate in MBConv.

  • norm (nn.Cell, optional) – The BatchNorm method. Default: None.

  • se_layer (nn.Cell) – The squeeze-excite module. Default: SqueezeExcite.

Returns

Tensor

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindvision.classification.models import MBConv, MBConvConfig
>>> cnf = MBConvConfig(1, 3, 1, 32, 16, 1)
>>> x = ms.Tensor(np.ones((1, 32, 32, 32)), ms.float32)
>>> out = MBConv(cnf, 0.2, None)(x)
construct(x)[source]

MBConv construct.

class mindvision.classification.models.MBConvConfig(expand_ratio: float, kernel_size: int, stride: int, in_chs: int, out_chs: int, num_layers: int, width_cnf: float, depth_cnf: float)[source]

The parameters of MBConv which need to be multiplied by the expand_ratio.

Parameters
  • expand_ratio (float) – The expansion multiple of out_channels with respect to in_channels.

  • kernel_size (int) – The kernel size of the depthwise conv.

  • stride (int) – The stride of the depthwise conv.

  • in_chs (int) – The input_channels of the MBConv Module.

  • out_chs (int) – The output_channels of the MBConv Module.

  • num_layers (int) – The number of MBConv modules.

  • width_cnf (float) – The ratio of the channel.

  • depth_cnf (float) – The ratio of num_layers.

Returns

None

Examples

>>> from mindvision.classification.models import MBConvConfig
>>> cnf = MBConvConfig(1, 3, 1, 32, 16, 1)
>>> print(cnf.input_channels)
static adjust_channels(channels: int, width_cnf: float, min_value: Optional[int] = None)[source]

Calculate the width of MBConv.

Parameters
  • channels (int) – The number of channels.

  • width_cnf (float) – The ratio of the channel.

  • min_value (int, optional) – The minimum number of channels. Default: None.

Returns

int, the width of MBConv.

static adjust_depth(num_layers: int, depth_cnf: float)[source]

Calculate the depth of MBConv.

Parameters
  • num_layers (int) – The number of MBConv modules.

  • depth_cnf (float) – The ratio of num_layers.

Returns

int, the depth of MBConv.
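
A short worked sketch of the two helpers; the exact rounding rule of adjust_channels is not specified above, so no results are asserted.

>>> from mindvision.classification.models import MBConvConfig
>>> width = MBConvConfig.adjust_channels(32, 1.1)  # 32 channels scaled by width ratio 1.1
>>> depth = MBConvConfig.adjust_depth(2, 1.2)      # 2 layers scaled by depth ratio 1.2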

class mindvision.classification.models.MobileNetV2(alpha: float = 1.0, inverted_residual_setting: Optional[List[List[int]]] = None, round_nearest: int = 8, block: Optional[nn.Cell] = None, norm: Optional[nn.Cell] = None)[source]

MobileNetV2 architecture.

Parameters
  • alpha (float) – The channels multiplier, used when rounding channel numbers to multiples of 8/16. Default: 1.0.

  • inverted_residual_setting (list, optional) – Inverted residual settings. Default: None.

  • round_nearest (int) – Round the number of channels in each layer to be a multiple of this number; set to 1 to turn off rounding. Default: 8.

  • block (nn.Cell, optional) – Module specifying inverted residual building block for mobilenet. Default: None.

  • norm (nn.Cell, optional) – Norm layer that will be stacked on top of the convolution layer. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, 1280, 7, 7)\)

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindvision.classification.models.backbones import MobileNetV2
>>> net = MobileNetV2()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1280, 7, 7)

About MobileNetV2:

The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer.

Citation:

@article{2018MobileNetV2,
title={MobileNetV2: Inverted Residuals and Linear Bottlenecks},
author={ Sandler, M.  and  Howard, A.  and  Zhu, M.  and  Zhmoginov, A.  and  Chen, L. C. },
journal={IEEE},
year={2018},
}
class mindvision.classification.models.ResNet(block: Type[Union[ResidualBlockBase, ResidualBlock]], layer_nums: List[int], group: int = 1, base_width: int = 64, norm: Optional[nn.Cell] = None)[source]

ResNet architecture.

Parameters
  • block (Type[Union[ResidualBlockBase, ResidualBlock]]) – The block for the network.

  • layer_nums (list) – The number of blocks in different layers.

  • group (int) – The number of Group convolutions. Default: 1.

  • base_width (int) – The width per group. Default: 64.

  • norm (nn.Cell, optional) – The module specifying the normalization layer to use. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, 2048, 7, 7)\)

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindvision.classification.models.backbones import ResNet, ResidualBlock
>>> net = ResNet(ResidualBlock, [3, 4, 23, 3])
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 2048, 7, 7)

About ResNet:

ResNet is designed to ease the training of networks that are substantially deeper than those used previously. The model explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

Citation:

@article{2016Deep,
title={Deep Residual Learning for Image Recognition},
author={ He, K.  and  Zhang, X.  and  Ren, S.  and  Sun, J. },
journal={IEEE},
year={2016},
}
class mindvision.classification.models.ResNet101(**kwargs)[source]

The class of ResNet101 uses the registration mechanism to register; it needs to be called through a yaml configuration file.

class mindvision.classification.models.ResNet152(**kwargs)[source]

The class of ResNet152 uses the registration mechanism to register; it needs to be called through a yaml configuration file.

class mindvision.classification.models.ResNet18(**kwargs)[source]

The class of ResNet18 uses the registration mechanism to register; it needs to be called through a yaml configuration file.

class mindvision.classification.models.ResNet34(**kwargs)[source]

The class of ResNet34 uses the registration mechanism to register; it needs to be called through a yaml configuration file.

class mindvision.classification.models.ResNet50(**kwargs)[source]

The class of ResNet50 uses the registration mechanism to register; it needs to be called through a yaml configuration file.

class mindvision.classification.models.ResidualBlock(in_channel: int, out_channel: int, stride: int = 1, group: int = 1, base_width: int = 64, norm: Optional[nn.Cell] = None, down_sample: Optional[nn.Cell] = None)[source]

ResNet residual block definition.

Parameters
  • in_channel (int) – Input channel.

  • out_channel (int) – Output channel.

  • stride (int) – Stride size for the second convolutional layer. Default: 1.

  • group (int) – Group convolutions. Default: 1.

  • base_width (int) – Width per group. Default: 64.

  • norm (nn.Cell, optional) – Module specifying the normalization layer to use. Default: None.

  • down_sample (nn.Cell, optional) – Downsample structure. Default: None.

Returns

Tensor, output tensor.

Examples

>>> from mindvision.classification.models.backbones import ResidualBlock
>>> ResidualBlock(3, 256, stride=2)
construct(x)[source]

ResidualBlock construct.

class mindvision.classification.models.ResidualBlockBase(in_channel: int, out_channel: int, stride: int = 1, group: int = 1, base_width: int = 64, norm: Optional[nn.Cell] = None, down_sample: Optional[nn.Cell] = None)[source]

ResNet residual block base definition.

Parameters
  • in_channel (int) – Input channel.

  • out_channel (int) – Output channel.

  • stride (int) – Stride size for the first convolutional layer. Default: 1.

  • group (int) – Group convolutions. Default: 1.

  • base_width (int) – Width per group. Default: 64.

  • norm (nn.Cell, optional) – Module specifying the normalization layer to use. Default: None.

  • down_sample (nn.Cell, optional) – Downsample structure. Default: None.

Returns

Tensor, output tensor.

Examples

>>> from mindvision.classification.models.backbones import ResidualBlockBase
>>> ResidualBlockBase(3, 256, stride=2)
construct(x)[source]

ResidualBlockBase construct.

class mindvision.classification.models.ViT(image_size: int = 224, input_channels: int = 3, patch_size: int = 16, embed_dim: int = 768, num_layers: int = 12, num_heads: int = 12, mlp_dim: int = 3072, keep_prob: float = 1.0, attention_keep_prob: float = 1.0, drop_path_keep_prob: float = 1.0, activation: nn.Cell = nn.GELU, norm: Optional[nn.Cell] = nn.LayerNorm, pool: str = 'cls')[source]

Vision Transformer architecture implementation.

Parameters
  • image_size (int) – Input image size. Default: 224.

  • input_channels (int) – The number of input channels. Default: 3.

  • patch_size (int) – Patch size of image. Default: 16.

  • embed_dim (int) – The dimension of embedding. Default: 768.

  • num_layers (int) – The depth of transformer. Default: 12.

  • num_heads (int) – The number of attention heads. Default: 12.

  • mlp_dim (int) – The dimension of MLP hidden layer. Default: 3072.

  • keep_prob (float) – The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.

  • attention_keep_prob (float) – The keep rate for attention layer. Default: 1.0.

  • drop_path_keep_prob (float) – The keep rate for drop path. Default: 1.0.

  • activation (nn.Cell) – Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.

  • norm (nn.Cell, optional) – Norm layer that will be stacked on top of the convolution layer. Default: nn.LayerNorm.

  • pool (str) – The method of pooling. Default: ‘cls’.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, 768)\)

Raises

ValueError – If split is not ‘train’, ‘test’ or ‘infer’.

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>> import mindspore as ms
>>> from mindvision.classification.models.backbones import ViT
>>> net = ViT()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 768)

About ViT:

Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

Citation:

@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}
construct(x)[source]

ViT construct.

mindvision.classification.models.efficientnet_b0(num_classes: int = 1000, pretrained: bool = False)[source]

Constructs an EfficientNet B0 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import efficientnet_b0
>>>
>>> net = efficientnet_b0(1000, False)
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
mindvision.classification.models.efficientnet_b1(num_classes: int = 1000, pretrained: bool = False)[source]

Constructs an EfficientNet B1 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import efficientnet_b1
>>>
>>> net = efficientnet_b1(1000, False)
>>> x = ms.Tensor(np.ones([1, 3, 240, 240]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
mindvision.classification.models.efficientnet_b2(num_classes: int = 1000, pretrained: bool = False)[source]

Constructs an EfficientNet B2 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import efficientnet_b2
>>>
>>> net = efficientnet_b2(1000, False)
>>> x = ms.Tensor(np.ones([1, 3, 260, 260]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
mindvision.classification.models.efficientnet_b3(num_classes: int = 1000, pretrained: bool = False)[source]

Constructs an EfficientNet B3 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import efficientnet_b3
>>>
>>> net = efficientnet_b3(1000, False)
>>> x = ms.Tensor(np.ones([1, 3, 300, 300]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
mindvision.classification.models.efficientnet_b4(num_classes: int = 1000, pretrained: bool = False)[source]

Constructs an EfficientNet B4 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import efficientnet_b4
>>>
>>> net = efficientnet_b4(1000, False)
>>> x = ms.Tensor(np.ones([1, 3, 380, 380]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
mindvision.classification.models.efficientnet_b5(num_classes: int = 1000, pretrained: bool = False)[source]

Constructs an EfficientNet B5 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import efficientnet_b5
>>>
>>> net = efficientnet_b5(1000, False)
>>> x = ms.Tensor(np.ones([1, 3, 456, 456]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
mindvision.classification.models.efficientnet_b6(num_classes: int = 1000, pretrained: bool = False)[source]

Constructs an EfficientNet B6 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import efficientnet_b6
>>>
>>> net = efficientnet_b6(1000, False)
>>> x = ms.Tensor(np.ones([1, 3, 528, 528]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
mindvision.classification.models.efficientnet_b7(num_classes: int = 1000, pretrained: bool = False)[source]

Constructs an EfficientNet B7 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import efficientnet_b7
>>>
>>> net = efficientnet_b7(1000, False)
>>> x = ms.Tensor(np.ones([1, 3, 600, 600]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About EfficientNet:

EfficientNet systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, the model proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. This model demonstrates the effectiveness of this method on scaling up MobileNets and ResNet.

Citation:

@misc{tan2020efficientnet,
    title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
    author={Mingxing Tan and Quoc V. Le},
    year={2020},
    eprint={1905.11946},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
mindvision.classification.models.lenet(num_classes: int = 10, num_channel: int = 1, pretrained: bool = False, include_top: bool = True, ckpt_file: Optional[str] = None)[source]

Constructs a LeNet architecture from Gradient-based learning applied to document recognition.

Parameters
  • num_classes (int) – The number of classes. Default: 10.

  • num_channel (int) – The number of channels. Default: 1.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

  • include_top (bool) – Whether to use the TOP architecture. Default: True.

  • ckpt_file (str, optional) – The path of checkpoint files. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\)

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import lenet
>>>
>>> net = lenet()
>>> x = ms.Tensor(np.ones([1, 1, 32, 32]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 10)

About LeNet5:

LeNet5 trained with the back-propagation algorithm constitutes the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing.

Citation:

@article{1998Gradient,
  title={Gradient-based learning applied to document recognition},
  author={ Lecun, Y.  and  Bottou, L. },
  journal={Proceedings of the IEEE},
  volume={86},
  number={11},
  pages={2278-2324},
  year={1998}
}
mindvision.classification.models.mobilenet_v2(num_classes: int = 1001, alpha: float = 1.0, round_nearest: int = 8, pretrained: bool = False, resize: int = 224, block: Optional[nn.Cell] = None, norm: Optional[nn.Cell] = None)[source]

Constructs a MobileNetV2 architecture from MobileNetV2: Inverted Residuals and Linear Bottlenecks.

Parameters
  • num_classes (int) – The number of classes. Default: 1001.

  • alpha (float) – The channels multiplier, used when rounding channel numbers to multiples of 8/16. Default: 1.0.

  • round_nearest (int) – Round the number of channels in each layer to be a multiple of this number; set to 1 to turn off rounding. Default: 8.

  • pretrained (bool) – If True, returns a model pre-trained on IMAGENET. Default: False.

  • resize (int) – The output size of the resized image. Default: 224.

  • block (nn.Cell, optional) – Module specifying inverted residual building block for mobilenet. Default: None.

  • norm (nn.Cell, optional) – Norm layer that will be stacked on top of the convolution layer. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import mobilenet_v2
>>>
>>> net = mobilenet_v2()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1001)
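
Continuing the example above, a hedged sketch of the alpha channels multiplier: only the internal widths shrink, while the classifier output size still tracks num_classes.

>>> net = mobilenet_v2(alpha=0.75)
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)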

About MobileNetV2:

The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer.

Citation:

@article{2018MobileNetV2,
title={MobileNetV2: Inverted Residuals and Linear Bottlenecks},
author={ Sandler, M.  and  Howard, A.  and  Zhu, M.  and  Zhmoginov, A.  and  Chen, L. C. },
journal={IEEE},
year={2018},
}
mindvision.classification.models.resnet101(num_classes: int = 1000, pretrained: bool = False, group: int = 1, base_width: int = 64, norm: Optional[nn.Cell] = None)[source]

Constructs a ResNet-101 architecture from Deep Residual Learning for Image Recognition.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • group (int) – The number of group convolutions. Default: 1.

  • base_width (int) – The width per group. Default: 64.

  • norm (nn.Cell, optional) – The module specifying the normalization layer to use. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import resnet101
>>>
>>> net = resnet101()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ResNet:

ResNet is designed to ease the training of networks that are substantially deeper than those used previously. The model explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

Citation:

@article{2016Deep,
title={Deep Residual Learning for Image Recognition},
author={ He, K.  and  Zhang, X.  and  Ren, S.  and  Sun, J. },
journal={IEEE},
year={2016},
}
mindvision.classification.models.resnet152(num_classes: int = 1000, pretrained: bool = False, group: int = 1, base_width: int = 64, norm: Optional[nn.Cell] = None)[source]

Constructs a ResNet-152 architecture from Deep Residual Learning for Image Recognition.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • group (int) – The number of group convolutions. Default: 1.

  • base_width (int) – The width per group. Default: 64.

  • norm (nn.Cell, optional) – The module specifying the normalization layer to use. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import resnet152
>>>
>>> net = resnet152()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ResNet:

ResNet is designed to ease the training of networks that are substantially deeper than those used previously. The model explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

Citation:

@article{2016Deep,
title={Deep Residual Learning for Image Recognition},
author={ He, K.  and  Zhang, X.  and  Ren, S.  and  Sun, J. },
journal={IEEE},
year={2016},
}
mindvision.classification.models.resnet18(num_classes: int = 1000, pretrained: bool = False, group: int = 1, base_width: int = 64, norm: Optional[nn.Cell] = None)[source]

Constructs a ResNet-18 architecture from Deep Residual Learning for Image Recognition.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • group (int) – The number of group convolutions. Default: 1.

  • base_width (int) – The width per group. Default: 64.

  • norm (nn.Cell, optional) – The module specifying the normalization layer to use. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import resnet18
>>>
>>> net = resnet18()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ResNet:

ResNet is designed to ease the training of networks that are substantially deeper than those used previously. The model explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

Citation:

@article{2016Deep,
title={Deep Residual Learning for Image Recognition},
author={ He, K.  and  Zhang, X.  and  Ren, S.  and  Sun, J. },
journal={IEEE},
year={2016},
}
mindvision.classification.models.resnet34(num_classes: int = 1000, pretrained: bool = False, group: int = 1, base_width: int = 64, norm: Optional[nn.Cell] = None)[source]

Constructs a ResNet-34 architecture from Deep Residual Learning for Image Recognition.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • group (int) – The number of group convolutions. Default: 1.

  • base_width (int) – The width per group. Default: 64.

  • norm (nn.Cell, optional) – The module specifying the normalization layer to use. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import resnet34
>>>
>>> net = resnet34()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ResNet:

ResNet is designed to ease the training of networks that are substantially deeper than those used previously. The model explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

Citation:

@article{2016Deep,
title={Deep Residual Learning for Image Recognition},
author={ He, K.  and  Zhang, X.  and  Ren, S.  and  Sun, J. },
journal={IEEE},
year={2016},
}
mindvision.classification.models.resnet50(num_classes: int = 1000, pretrained: bool = False, group: int = 1, base_width: int = 64, norm: Optional[nn.Cell] = None)[source]

Constructs a ResNet-50 architecture from Deep Residual Learning for Image Recognition.

Parameters
  • num_classes (int) – The number of classes. Default: 1000.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • group (int) – The number of group convolutions. Default: 1.

  • base_width (int) – The width per group. Default: 64.

  • norm (nn.Cell, optional) – The module specifying the normalization layer to use. Default: None.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import resnet50
>>>
>>> net = resnet50()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ResNet:

ResNet was designed to ease the training of networks substantially deeper than those used previously. The model explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

Citation:

@inproceedings{2016Deep,
title={Deep Residual Learning for Image Recognition},
author={He, K. and Zhang, X. and Ren, S. and Sun, J.},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2016},
}
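Together, group and base_width select a ResNeXt-style grouped bottleneck; for instance, group=32 with base_width=4 corresponds to the 32x4d configuration from the ResNeXt paper. The sketch below simply exercises the documented constructor arguments; the specific values are illustrative, not a recommendation.

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import resnet50
>>>
>>> # Standard ResNet-50: one group, 64 channels per group.
>>> net = resnet50()
>>> # ResNeXt-style 32x4d variant: 32 groups of width 4 in each bottleneck.
>>> net_grouped = resnet50(group=32, base_width=4)
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> print(net_grouped(x).shape)
(1, 1000)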
mindvision.classification.models.vit_b_16(num_classes: int = 1000, image_size: int = 224, has_logits: bool = False, pretrained: bool = False, drop_out: float = 0.0, attention_dropout: float = 0.0, drop_path_dropout: float = 0.0)[source]

Constructs a vit_b_16 architecture from An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Parameters
  • image_size (int) – The input image size. Default: 224 for ImageNet.

  • num_classes (int) – The number of classes. Default: 1000.

  • has_logits (bool) – Whether the model has a logits layer. Default: False.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • drop_out (float) – The dropout rate. Default: 0.0.

  • attention_dropout (float) – The attention dropout rate. Default: 0.0.

  • drop_path_dropout (float) – The stochastic depth rate. Default: 0.0.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import vit_b_16
>>>
>>> net = vit_b_16()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ViT:

Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), ViT attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train. The patch-grid arithmetic for this variant is sketched after the citation below.

Citation:

@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}
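The "16" in vit_b_16 is the patch size: at the default image_size of 224, the input is split into non-overlapping 16x16 patches, giving a 14x14 grid of 196 patch tokens (plus one class token in the standard ViT formulation). A quick check of that arithmetic:

>>> image_size, patch_size = 224, 16
>>> grid = image_size // patch_size   # patches per side
>>> num_patches = grid * grid
>>> seq_len = num_patches + 1         # + 1 class token
>>> print(grid, num_patches, seq_len)
14 196 197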
mindvision.classification.models.vit_b_32(num_classes: int = 1000, image_size: int = 224, has_logits: bool = False, pretrained: bool = False, drop_out: float = 0.0, attention_dropout: float = 0.0, drop_path_dropout: float = 0.0)[source]

Constructs a vit_b_32 architecture from An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Parameters
  • image_size (int) – The input image size. Default: 224 for ImageNet.

  • num_classes (int) – The number of classes. Default: 1000.

  • has_logits (bool) – Whether the model has a logits layer. Default: False.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • drop_out (float) – The dropout rate. Default: 0.0.

  • attention_dropout (float) – The attention dropout rate. Default: 0.0.

  • drop_path_dropout (float) – The stochastic depth rate (a minimal sketch of the mechanism follows this entry's citation). Default: 0.0.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import vit_b_32
>>>
>>> net = vit_b_32()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ViT:

Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), ViT attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

Citation:

@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}
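drop_path_dropout is described above as a stochastic depth rate: during training, each residual branch is dropped for a random subset of samples, and the surviving activations are rescaled so the expected value matches inference. The NumPy sketch below illustrates the mechanism only; it is not the routine mindvision uses internally.

>>> import numpy as np
>>>
>>> def drop_path(branch_output, drop_prob, training):
...     """Stochastic depth: zero the whole residual branch per sample."""
...     if not training or drop_prob == 0.0:
...         return branch_output
...     keep_prob = 1.0 - drop_prob
...     # One Bernoulli draw per sample, broadcast over the remaining axes.
...     shape = (branch_output.shape[0],) + (1,) * (branch_output.ndim - 1)
...     mask = (np.random.rand(*shape) < keep_prob).astype(branch_output.dtype)
...     # Rescale survivors so the expectation matches inference behaviour.
...     return branch_output * mask / keep_prob
...
>>> branch = np.ones((4, 197, 768), dtype=np.float32)  # (batch, tokens, dim)
>>> print(drop_path(branch, 0.1, True).shape)
(4, 197, 768)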
mindvision.classification.models.vit_l_16(num_classes: int = 1000, image_size: int = 224, has_logits: bool = False, pretrained: bool = False, drop_out: float = 0.0, attention_dropout: float = 0.0, drop_path_dropout: float = 0.0)[source]

Constructs a vit_l_16 architecture from An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Parameters
  • image_size (int) – The input image size. Default: 224 for ImageNet.

  • num_classes (int) – The number of classes. Default: 1000.

  • has_logits (bool) – Whether the model has a logits layer. Default: False.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • drop_out (float) – The dropout rate. Default: 0.0.

  • attention_dropout (float) – The attention dropout rate. Default: 0.0.

  • drop_path_dropout (float) – The stochastic depth rate. Default: 0.0.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import vit_l_16
>>>
>>> net = vit_l_16()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ViT:

Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), ViT attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

Citation:

@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}
mindvision.classification.models.vit_l_32(num_classes: int = 1000, image_size: int = 224, has_logits: bool = False, pretrained: bool = False, drop_out: float = 0.0, attention_dropout: float = 0.0, drop_path_dropout: float = 0.0)[source]

Constructs a vit_l_32 architecture from An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Parameters
  • image_size (int) – The input image size. Default: 224 for ImageNet.

  • num_classes (int) – The number of classes. Default: 1000.

  • has_logits (bool) – Whether the model has a logits layer. Default: False.

  • pretrained (bool) – Whether to download and load the pre-trained model. Default: False.

  • drop_out (float) – The dropout rate. Default: 0.0.

  • attention_dropout (float) – The attention dropout rate. Default: 0.0.

  • drop_path_dropout (float) – The stochastic depth rate. Default: 0.0.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, CLASSES_{out})\).

Supported Platforms:

GPU

Examples

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import vit_l_32
>>>
>>> net = vit_l_32()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)

About ViT:

Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), ViT attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train. A usage sketch covering all four ViT variants follows the citation below.

Citation:

@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}
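The four ViT constructors above differ only in backbone capacity (Base vs. Large) and patch size (16 vs. 32); all share the documented signature and, at the default image_size of 224, map a (N, 3, 224, 224) batch to (N, num_classes) logits. The sketch below exercises that shared signature, including the regularization arguments; the dropout values are illustrative, not recommendations.

>>> import numpy as np
>>>
>>> import mindspore as ms
>>> from mindvision.classification.models import vit_b_16, vit_b_32, vit_l_16, vit_l_32
>>>
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> for builder in (vit_b_16, vit_b_32, vit_l_16, vit_l_32):
...     net = builder(num_classes=1000, drop_out=0.1,
...                   attention_dropout=0.1, drop_path_dropout=0.1)
...     print(builder.__name__, net(x).shape)
vit_b_16 (1, 1000)
vit_b_32 (1, 1000)
vit_l_16 (1, 1000)
vit_l_32 (1, 1000)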