mindspore.compression

mindspore.compression.quant

Compression quant module.
- class mindspore.compression.quant.QuantizationAwareTraining(bn_fold=True, freeze_bn=10000000, quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), optimize_option=OptimizeOption.QAT, one_conv_fold=True)[source]

  Quantizer for quantization aware training.
- Parameters
bn_fold (bool) – Whether to use BatchNorm fold ops for simulated inference. Default: True.
freeze_bn (int) – Number of steps after which the BatchNorm OP parameters use the total mean and variance. Default: 1e7.
quant_delay (int, list or tuple) – Number of steps after which weights and activations are quantized during eval. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (QuantDtype, list or tuple) – Data type used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (bool, list or tuple) – Quantization granularity. If True, quantization is applied per channel; otherwise it is applied per layer. The first element represents weights and the second element represents data flow. Default: (False, False).
symmetric (bool, list or tuple) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise asymmetric quantization is used. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (bool, list or tuple) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False)
optimize_option (OptimizeOption, list or tuple) – Specifies the quantization algorithm and options; currently only OptimizeOption.QAT is supported. Default: OptimizeOption.QAT.
one_conv_fold (bool) – Whether to use one conv BatchNorm fold ops for simulated inference. Default: True.
Examples
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False,
...                                       per_channel=[True, False],
...                                       symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)
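The symmetric and narrow_range options above have a concrete numeric meaning. The following is a minimal pure-Python sketch, not MindSpore's implementation, of how an INT8 fake-quantize step maps a float value onto the integer grid under these options; the function fake_quant and its x_min/x_max range arguments are hypothetical names for illustration.

```python
def fake_quant(x, num_bits=8, symmetric=False, narrow_range=False,
               x_min=-1.0, x_max=1.0):
    """Simulate quantize/dequantize of one float value (illustrative sketch)."""
    # Integer grid: [-128, 127] for INT8 normally, [-127, 127] with narrow_range.
    q_min = -(2 ** (num_bits - 1)) + (1 if narrow_range else 0)
    q_max = 2 ** (num_bits - 1) - 1
    if symmetric:
        # Symmetric: zero-point fixed at 0, range centered on zero,
        # so 0.0 is always exactly representable.
        bound = max(abs(x_min), abs(x_max))
        scale = bound / q_max
        zero_point = 0
    else:
        # Asymmetric: scale and zero-point derived from the full [x_min, x_max] range.
        scale = (x_max - x_min) / (q_max - q_min)
        zero_point = round(q_min - x_min / scale)
    q = round(x / scale) + zero_point
    q = max(q_min, min(q_max, q))          # clamp to the integer grid
    return (q - zero_point) * scale        # dequantize back to float

print(fake_quant(0.0, symmetric=True))     # 0.0 (symmetric keeps zero exact)
```

Values outside the calibrated range saturate: with the defaults above, `fake_quant(2.0, symmetric=True)` clamps to the top of the grid and returns approximately 1.0.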
- mindspore.compression.quant.create_quant_config(quant_observer=(nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver), quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False))[source]

  Configures the observer types of weights and data flow with quantization parameters.
- Parameters
quant_observer (Observer, list or tuple) – The observer type used for quantization. The first element represents weights and the second element represents data flow. Default: (nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver).
quant_delay (int, list or tuple) – Number of steps after which weights and activations are quantized during eval. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (QuantDtype, list or tuple) – Data type used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (bool, list or tuple) – Quantization granularity. If True, quantization is applied per channel; otherwise it is applied per layer. The first element represents weights and the second element represents data flow. Default: (False, False).
symmetric (bool, list or tuple) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise asymmetric quantization is used. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (bool, list or tuple) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False)
- Returns
QuantConfig, contains the observer types of weight and activation.
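Conceptually, the returned QuantConfig pairs a weight observer with an activation observer, each pre-bound with its own slice of the per-(weights, data-flow) tuple arguments. The following is a rough pure-Python sketch of that pairing using a namedtuple and functools.partial; the FakeObserver class and create_quant_config_sketch function are stand-ins for illustration, not MindSpore's internals.

```python
from collections import namedtuple
from functools import partial

QuantConfig = namedtuple("QuantConfig", ["weight", "activation"])

class FakeObserver:
    """Stand-in for an observer class such as FakeQuantWithMinMaxObserver."""
    def __init__(self, quant_delay=0, per_channel=False,
                 symmetric=False, narrow_range=False):
        self.quant_delay = quant_delay
        self.per_channel = per_channel
        self.symmetric = symmetric
        self.narrow_range = narrow_range

def create_quant_config_sketch(quant_delay=(0, 0), per_channel=(False, False),
                               symmetric=(False, False), narrow_range=(False, False)):
    # Index 0 configures weights, index 1 configures data flow (activations).
    weight = partial(FakeObserver, quant_delay=quant_delay[0],
                     per_channel=per_channel[0], symmetric=symmetric[0],
                     narrow_range=narrow_range[0])
    activation = partial(FakeObserver, quant_delay=quant_delay[1],
                         per_channel=per_channel[1], symmetric=symmetric[1],
                         narrow_range=narrow_range[1])
    return QuantConfig(weight=weight, activation=activation)

cfg = create_quant_config_sketch(per_channel=(True, False), symmetric=(True, False))
w = cfg.weight()                    # instantiate the pre-configured weight observer
print(w.per_channel, w.symmetric)   # True True
```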
- class mindspore.compression.quant.OptimizeOption[source]

  An enum for the model quantization optimize option; currently only QAT is supported.
- mindspore.compression.quant.load_nonquant_param_into_quant_net(quant_model, params_dict, quant_new_params=None)[source]

  Load fp32 model parameters into the quantized model.
- Parameters
quant_model – The quantized model to load parameters into.
params_dict – Parameter dict that stores the fp32 parameters.
quant_new_params – Names of parameters that exist in the quantized network but not in the non-quantized network. Default: None.
- Returns
None
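In spirit, the function copies fp32 parameters into the quantized network by matching names, while parameters listed in quant_new_params (which exist only in the quantized network, such as fake-quant min/max values) keep their initialized values. The following is a simplified sketch of that behavior over plain name-to-value dicts; the helper name and parameter names are hypothetical, not MindSpore's implementation.

```python
def load_nonquant_params_sketch(quant_params, fp32_params, quant_new_params=None):
    """Copy fp32 parameters into a quantized network's parameters by name.

    Simplified sketch: both networks are represented as name -> value dicts.
    """
    quant_new_params = quant_new_params or []
    for name in quant_params:
        if name in fp32_params:
            # Matching name: reuse the pretrained fp32 value.
            quant_params[name] = fp32_params[name]
        elif name not in quant_new_params:
            # Missing from the checkpoint and not declared quantization-only:
            # treat as an error rather than silently keeping a random init.
            raise ValueError(f"{name} not found in checkpoint and not quant-only")
    return quant_params

quant_net = {"conv1.weight": 0.0, "conv1.fq_min": -6.0}
fp32_ckpt = {"conv1.weight": 1.5}
loaded = load_nonquant_params_sketch(quant_net, fp32_ckpt,
                                     quant_new_params=["conv1.fq_min"])
print(loaded)  # {'conv1.weight': 1.5, 'conv1.fq_min': -6.0}
```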
mindspore.compression.common

Compression common module.
- class mindspore.compression.common.QuantDtype[source]

  An enum for quant datatype, contains INT2 ~ INT8 and UINT2 ~ UINT8.
- num_bits

  Get the number of bits of the QuantDtype member.
- Returns
int, the number of bits of the QuantDtype member.
Examples
>>> quant_dtype = QuantDtype.INT8
>>> num_bits = quant_dtype.num_bits
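The bit width is implied by the member name (INT2 through INT8, UINT2 through UINT8). The following self-contained sketch defines an enum with the same naming convention and derives num_bits by parsing the member name; QuantDtypeSketch mirrors the idea only and is not MindSpore's internal implementation.

```python
from enum import Enum

class QuantDtypeSketch(Enum):
    """Illustrative enum following the INTn/UINTn naming convention."""
    INT4 = "INT4"
    INT8 = "INT8"
    UINT8 = "UINT8"

    @property
    def num_bits(self):
        # Strip the leading "INT"/"UINT" letters and parse the remaining digits.
        return int(self.name.lstrip("UINT"))

print(QuantDtypeSketch.INT8.num_bits)   # 8
print(QuantDtypeSketch.INT4.num_bits)   # 4
```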