mindspore.compression
mindspore.compression.quant
Quantization module, including the base class of the quantizer, the quantization aware training algorithm, and quantization utils.
- class mindspore.compression.quant.OptimizeOption(value)[source]
An enum for the model quantization optimize option; currently only QAT and LEARNED_SCALE are supported.
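Examples
A minimal sketch of selecting an option to pass to QuantizationAwareTraining via optimize_option:
>>> from mindspore.compression.quant import OptimizeOption
>>> optimize_option = OptimizeOption.LEARNED_SCALE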
- class mindspore.compression.quant.QuantizationAwareTraining(bn_fold=True, freeze_bn=10000000, quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), optimize_option=OptimizeOption.QAT, one_conv_fold=True)[source]
Quantizer for quantization aware training.
- Parameters
bn_fold (bool) – Whether to use BatchNorm fold ops to simulate the inference operation. Default: True.
freeze_bn (int) – Number of steps after which the BatchNorm OP parameters are fixed to the global mean and variance. Default: 1e7.
quant_delay (Union[int, list, tuple]) – Number of steps after which weights and activations are quantized during train and eval. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (Union[QuantDtype, list, tuple]) – Datatype used to quantize weights and activations. The first element represents weights and the second element represents data flow. It is necessary to consider the precision support of hardware devices in the practical quantization infer scenario. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (Union[bool, list, tuple]) – Quantization granularity, per layer or per channel. If True, quantization is applied per channel; otherwise it is applied per layer. The first element represents weights and the second element represents data flow; the second element must currently be False. Default: (False, False).
symmetric (Union[bool, list, tuple]) – Whether the quantization algorithm is symmetric or not. If True, symmetric quantization is used; otherwise asymmetric quantization is used. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (Union[bool, list, tuple]) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False).
optimize_option (Union[OptimizeOption, list, tuple]) – Specifies the quantization algorithm and options; currently only QAT and LEARNED_SCALE are supported. If both QAT and LEARNED_SCALE are configured, LEARNED_SCALE takes priority. LEARNED_SCALE currently only works under the following constraints: freeze_bn=0, quant_delay=0, symmetric=True and narrow_range=True. More specifically, for operators such as Relu and Relu6, which produce only positive values, a negative truncation is added to optimize this scenario, and narrow_range automatically falls back to False. Default: OptimizeOption.QAT.
one_conv_fold (bool) – Whether to use one-conv BatchNorm fold ops to simulate the inference operation. Default: True.
- Raises
TypeError – If the element of quant_delay or freeze_bn is not int.
TypeError – If bn_fold, one_conv_fold or the element of per_channel, symmetric, narrow_range is not bool.
TypeError – If the element of quant_dtype is not QuantDtype.
ValueError – If the length of quant_delay, quant_dtype, per_channel, symmetric or narrow_range is greater than 2.
ValueError – If the optimize_option is LEARNED_SCALE and freeze_bn is not equal to 0.
ValueError – If the optimize_option is LEARNED_SCALE and symmetric is not (True, True).
ValueError – If the optimize_option is LEARNED_SCALE and narrow_range is not (True, True).
ValueError – If the optimize_option is LEARNED_SCALE and quant_delay is not (0, 0).
Examples
>>> from mindspore import nn
>>> from mindspore.compression.quant import QuantizationAwareTraining
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False, per_channel=[True, False], symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)
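Under the LEARNED_SCALE option, the constraints listed above (freeze_bn=0, quant_delay=0, symmetric and narrow_range set to True) translate into a configuration like the following sketch, reusing the net defined in the example:
>>> from mindspore.compression.quant import OptimizeOption
>>> quantizer = QuantizationAwareTraining(freeze_bn=0, quant_delay=0,
...                                       symmetric=[True, True], narrow_range=[True, True],
...                                       optimize_option=OptimizeOption.LEARNED_SCALE)
>>> net_qat = quantizer.quantize(net)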
- mindspore.compression.quant.create_quant_config(quant_observer=(nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver), quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), mode='DEFAULT')[source]
Configure the observer types of weights and data flow with the given quantization parameters.
- Parameters
quant_observer (Union[Observer, list, tuple]) – The types of observer for quantization. The first element applies to weights and the second applies to data flow. Currently, only FakeQuantWithMinMaxObserver is supported. Default: (nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver).
quant_delay (Union[int, list, tuple]) – Number of steps after which weights and activations are quantized during train and eval. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (Union[QuantDtype, list, tuple]) – Datatype used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (Union[bool, list, tuple]) – Quantization granularity, per layer or per channel. If True, quantization is applied per channel; otherwise it is applied per layer. The first element represents weights and the second element represents data flow; the second element must currently be False. Default: (False, False).
symmetric (Union[bool, list, tuple]) – Whether the quantization algorithm is symmetric or not. If True, symmetric quantization is used; otherwise asymmetric quantization is used. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (Union[bool, list, tuple]) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False).
mode (str) – Optional quantization mode; currently only "DEFAULT" (QAT) and "LEARNED_SCALE" are supported. Default: "DEFAULT".
- Returns
QuantConfig, containing the observer types for weights and activations.
- Raises
ValueError – If the second element of per_channel is not False.
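Examples
A minimal usage sketch (the argument values below are illustrative, not required):
>>> from mindspore.compression.quant import create_quant_config
>>> quant_config = create_quant_config(per_channel=(True, False), symmetric=(True, False))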
- mindspore.compression.quant.load_nonquant_param_into_quant_net(quant_model, params_dict, quant_new_params=None)[source]
Load fp32 model parameters into a quantization model.
- Parameters
quant_model (Cell) – The quantization model to load the parameters into.
params_dict (dict) – Parameter dict of the fp32 model.
quant_new_params (list) – Parameters that exist in the quantization model but not in the fp32 model. Default: None.
- Raises
TypeError – If quant_new_params is not None and is not a list.
ValueError – If there are parameters in the quant_model that are neither in params_dict nor in quant_new_params.
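Examples
A minimal sketch, assuming a hypothetical fp32 checkpoint file "lenet_fp32.ckpt" and the net_qat produced by QuantizationAwareTraining above:
>>> from mindspore import load_checkpoint
>>> from mindspore.compression.quant import load_nonquant_param_into_quant_net
>>> params_dict = load_checkpoint("lenet_fp32.ckpt")  # hypothetical checkpoint path
>>> load_nonquant_param_into_quant_net(net_qat, params_dict)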
- mindspore.compression.quant.query_quant_layers(network)[source]
Query the network's quantization strategy for each quantized layer and print it to the screen. Note that in graph mode all quantization layers are queried before graph compile optimization; thus, some redundant quantized layers that do not exist in the practical execution may appear.
- Parameters
network (Cell) – The network to be queried.
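Examples
A minimal sketch, assuming net_qat is a network converted by QuantizationAwareTraining as in the example above:
>>> from mindspore.compression.quant import query_quant_layers
>>> query_quant_layers(net_qat)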
mindspore.compression.common
Common module for various compression algorithms, currently only including the datatype definition for quantization.
- class mindspore.compression.common.QuantDtype(value)[source]
An enum for quant datatypes, containing INT2 ~ INT8 and UINT2 ~ UINT8.
- static is_signed(dtype)[source]
Get whether the quant datatype is signed.
- Parameters
dtype (QuantDtype) – quant datatype.
- Returns
bool, whether the input quant datatype is signed.
Examples
>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> is_signed = QuantDtype.is_signed(quant_dtype)
- num_bits
Get the number of bits of the QuantDtype member.
- Returns
int, the number of bits of the QuantDtype member.
Examples
>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> num_bits = quant_dtype.num_bits
>>> print(num_bits)
8
- static switch_signed(dtype)[source]
Switch the signed state of the input quant datatype.
- Parameters
dtype (QuantDtype) – quant datatype.
- Returns
QuantDtype, quant datatype with opposite signed state as the input.
Examples
>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> quant_dtype = QuantDtype.switch_signed(quant_dtype)