mindspore.compression

mindspore.compression.quant

Quantization module, including base class of the quantizer, the quantization aware training algorithm, and quantization utils.

class mindspore.compression.quant.OptimizeOption(value)[source]

An enum for the model quantization optimization options; currently only QAT and LEARNED_SCALE are supported.
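
As a minimal illustration, using only the two members named above:

>>> from mindspore.compression.quant import OptimizeOption
>>> option = OptimizeOption.LEARNED_SCALE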

class mindspore.compression.quant.QuantizationAwareTraining(bn_fold=True, freeze_bn=10000000, quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), optimize_option=OptimizeOption.QAT, one_conv_fold=True)[source]

Quantizer for quantization aware training.

Parameters
  • bn_fold (bool) – Whether to use BatchNorm fold ops to simulate the inference operation. Default: True.

  • freeze_bn (int) – Number of steps after which the BatchNorm OP parameters are fixed to the global mean and variance. Default: 1e7.

  • quant_delay (Union[int, list, tuple]) – Number of steps after which weights and activations are quantized during training and evaluation. The first element represents weights and the second element represents data flow. Default: (0, 0).

  • quant_dtype (Union[QuantDtype, list, tuple]) – Datatype used to quantize weights and activations. The first element represents weights and the second element represents data flow. The precision supported by the target hardware must be considered in a practical quantized inference scenario. Default: (QuantDtype.INT8, QuantDtype.INT8).

  • per_channel (Union[bool, list, tuple]) – Quantization granularity, per layer or per channel. If True, quantization is applied per channel; otherwise, per layer. The first element represents weights and the second element represents data flow; the second element must currently be False. Default: (False, False).

  • symmetric (Union[bool, list, tuple]) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise, asymmetric. The first element represents weights and the second element represents data flow. Default: (False, False).

  • narrow_range (Union[bool, list, tuple]) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False).

  • optimize_option (Union[OptimizeOption, list, tuple]) – Specifies the quantization algorithm and options; currently only QAT and LEARNED_SCALE are supported. If both QAT and LEARNED_SCALE are configured, LEARNED_SCALE takes priority. LEARNED_SCALE currently only works under certain constraints: freeze_bn=0, quant_delay=0, symmetric=True and narrow_range=True. More specifically, for operators such as ReLU and ReLU6, which only produce positive values, a negative truncation is added to optimize this scenario, and narrow_range then automatically matches to False. A configuration sketch follows this parameter list. Default: OptimizeOption.QAT.

  • one_conv_fold (bool) – Whether to use the one-conv BatchNorm fold ops to simulate the inference operation. Default: True.
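
As noted under optimize_option, LEARNED_SCALE only works under certain constraints. The following sketch shows one constructor call that satisfies them; the exact argument combination is an illustration derived from those constraints, not a prescribed recipe:

>>> from mindspore.compression.quant import QuantizationAwareTraining, OptimizeOption
>>> quantizer = QuantizationAwareTraining(freeze_bn=0, quant_delay=(0, 0),
...                                       symmetric=(True, True), narrow_range=(True, True),
...                                       optimize_option=OptimizeOption.LEARNED_SCALE)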

Raises
  • TypeError – If the element of quant_delay or freeze_bn is not int.

  • TypeError – If bn_fold, one_conv_fold or the element of per_channel, symmetric, narrow_range is not bool.

  • TypeError – If the element of quant_dtype is not QuantDtype.

  • ValueError – If the length of quant_delay, quant_dtype, per_channel, symmetric or narrow_range is greater than 2.

  • ValueError – If the optimize_option is LEARNED_SCALE and freeze_bn is not equal to 0.

  • ValueError – If the optimize_option is LEARNED_SCALE and symmetric is not (True, True).

  • ValueError – If the optimize_option is LEARNED_SCALE and narrow_range is not (True, True).

  • ValueError – If the optimize_option is LEARNED_SCALE and quant_delay is not (0, 0).

Examples

>>> import mindspore.nn as nn
>>> from mindspore.compression.quant import QuantizationAwareTraining
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False, per_channel=[True, False], symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)

quantize(network)[source]

API to convert the input network to a quantization aware training network.

Note

Please refer to the Examples of the class mindspore.compression.quant.QuantizationAwareTraining.

Parameters

network (Cell) – Network to be quantized.

Returns

Cell, a quantization aware training network.

Raises

KeyError – If the device_target set in context is not in support_device.

mindspore.compression.quant.create_quant_config(quant_observer=(nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver), quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), mode='DEFAULT')[source]

Configure the observer types of weights and data flow with quantization parameters.

Parameters
  • quant_observer (Union[Observer, list, tuple]) – The types of observer for quantization. The first element applies to weights and the second applies to data flow. Currently, only FakeQuantWithMinMaxObserver is supported. Default: (nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver).

  • quant_delay (Union[int, list, tuple]) – Number of steps after which weights and activations are quantized during training and evaluation. The first element represents weights and the second element represents data flow. Default: (0, 0).

  • quant_dtype (Union[QuantDtype, list, tuple]) – Datatype used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).

  • per_channel (Union[bool, list, tuple]) – Quantization granularity, per layer or per channel. If True, quantization is applied per channel; otherwise, per layer. The first element represents weights and the second element represents data flow; the second element must currently be False. Default: (False, False).

  • symmetric (Union[bool, list, tuple]) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise, asymmetric. The first element represents weights and the second element represents data flow. Default: (False, False).

  • narrow_range (Union[bool, list, tuple]) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False).

  • mode (str) – Optional quantization mode; currently only "DEFAULT" (QAT) and "LEARNED_SCALE" are supported. Default: "DEFAULT".

Returns

QuantConfig, contains the observer type of weight and activation.

Raises

ValueError – If the second element of per_channel is not False.
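
This function has no Examples block; a minimal sketch based on the parameters above, with illustrative argument values:

>>> from mindspore.compression.quant import create_quant_config
>>> quant_config = create_quant_config(per_channel=(True, False), symmetric=(True, False))

The returned QuantConfig bundles the weight and activation observers; it is typically passed on to quantized cells that accept a quant_config argument.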

mindspore.compression.quant.load_nonquant_param_into_quant_net(quant_model, params_dict, quant_new_params=None)[source]

Load fp32 model parameters into a quantization model.

Parameters
  • quant_model (Cell) – Quantization model.

  • params_dict (dict) – Parameter dict that stores fp32 parameters.

  • quant_new_params (list) – Parameters that exist in quantization network but not in non-quantization network. Default: None.

Raises
  • TypeError – If quant_new_params is not None and is not a list.

  • ValueError – If there are parameters in the quant_model that are neither in params_dict nor in quant_new_params.
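
A hedged end-to-end sketch; the checkpoint file name is a placeholder, and LeNet5 refers to the fusion network defined in the QuantizationAwareTraining example above:

>>> from mindspore import load_checkpoint
>>> from mindspore.compression.quant import QuantizationAwareTraining, load_nonquant_param_into_quant_net
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False)
>>> net_qat = quantizer.quantize(net)
>>> params_dict = load_checkpoint("lenet_fp32.ckpt")  # placeholder path to an fp32 checkpoint
>>> load_nonquant_param_into_quant_net(net_qat, params_dict)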

mindspore.compression.quant.query_quant_layers(network)[source]

Query the quantization strategy of each quantized layer in the network and print it to the screen. Note that all quantized layers are queried before graph compile optimization in graph mode; therefore, some redundant quantized layers, which do not exist in the practical execution, may appear.

Parameters

network (Cell) – Input network.
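
A minimal sketch, reusing the net_qat produced by QuantizationAwareTraining.quantize in the example above (the printed layout of the strategy report is implementation-defined):

>>> from mindspore.compression.quant import query_quant_layers
>>> query_quant_layers(net_qat)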

mindspore.compression.common

Common module for various compression algorithms, now only including datatype definition for quantization.

class mindspore.compression.common.QuantDtype(value)[source]

An enum for quantization datatypes, containing INT2 ~ INT8 and UINT2 ~ UINT8.
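
A minimal selection sketch:

>>> from mindspore.compression.common import QuantDtype
>>> dtype = QuantDtype.INT4
>>> unsigned_dtype = QuantDtype.UINT4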

static is_signed(dtype)[source]

Get whether the quant datatype is signed.

Parameters

dtype (QuantDtype) – quant datatype.

Returns

bool, whether the input quant datatype is signed.

Examples

>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> is_signed = QuantDtype.is_signed(quant_dtype)

num_bits

Get the number of bits of the QuantDtype member.

Returns

int, the number of bits of the QuantDtype member.

Examples

>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> num_bits = quant_dtype.num_bits
>>> print(num_bits)
8

static switch_signed(dtype)[source]

Switch the signed state of the input quant datatype.

Parameters

dtype (QuantDtype) – quant datatype.

Returns

QuantDtype, quant datatype with opposite signed state as the input.

Examples

>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> quant_dtype = QuantDtype.switch_signed(quant_dtype)