mindspore.compression

mindspore.compression.quant

Compression quant module.
- class mindspore.compression.quant.QuantizationAwareTraining(bn_fold=True, freeze_bn=10000000, quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), optimize_option=OptimizeOption.QAT, one_conv_fold=True)[source]

  Quantizer for quantization aware training.
- Parameters
bn_fold (bool) – Whether to use BatchNorm fold ops for simulated inference. Default: True.
freeze_bn (int) – Number of steps after which the BatchNorm OP parameters use the total mean and variance. Default: 1e7.
quant_delay (int, list or tuple) – Number of steps after which weights and activations are quantized during eval. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (QuantDtype, list or tuple) – Data type used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (bool, list or tuple) – Quantization granularity. If True, quantization is applied per channel; otherwise it is applied per layer. The first element represents weights and the second element represents data flow. Default: (False, False).
symmetric (bool, list or tuple) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise asymmetric quantization is used. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (bool, list or tuple) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False)
optimize_option (OptimizeOption, list or tuple) – Specifies the quantization algorithm and options; currently only OptimizeOption.QAT is supported. Default: OptimizeOption.QAT.
one_conv_fold (bool) – Whether to use one conv BatchNorm fold ops for simulated inference. Default: True.
Examples
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False,
...                                       per_channel=[True, False],
...                                       symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)
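The symmetric and narrow_range options above have a concrete numeric meaning. The following is a minimal pure-Python sketch, not MindSpore's implementation, of how an INT8 fake-quantize step maps a float value onto the integer grid under these options; the function fake_quant and its x_min/x_max range arguments are hypothetical names for illustration.

```python
def fake_quant(x, num_bits=8, symmetric=False, narrow_range=False,
               x_min=-1.0, x_max=1.0):
    """Simulate quantize/dequantize of one float value (illustrative sketch)."""
    # Integer grid: [-128, 127] for INT8 normally, [-127, 127] with narrow_range.
    q_min = -(2 ** (num_bits - 1)) + (1 if narrow_range else 0)
    q_max = 2 ** (num_bits - 1) - 1
    if symmetric:
        # Symmetric: zero-point fixed at 0, range centered on zero,
        # so 0.0 is always exactly representable.
        bound = max(abs(x_min), abs(x_max))
        scale = bound / q_max
        zero_point = 0
    else:
        # Asymmetric: scale and zero-point derived from the full [x_min, x_max] range.
        scale = (x_max - x_min) / (q_max - q_min)
        zero_point = round(q_min - x_min / scale)
    q = round(x / scale) + zero_point
    q = max(q_min, min(q_max, q))          # clamp to the integer grid
    return (q - zero_point) * scale        # dequantize back to float

print(fake_quant(0.0, symmetric=True))     # 0.0 (symmetric keeps zero exact)
```

Values outside the calibrated range saturate: with the defaults above, `fake_quant(2.0, symmetric=True)` clamps to the top of the grid and returns approximately 1.0.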
- mindspore.compression.quant.create_quant_config(quant_observer=(nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver), quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False))[source]

  Configures the observer types of weights and data flow with quantization parameters.
- Parameters
quant_observer (Observer, list or tuple) – The observer type used for quantization. The first element represents weights and the second element represents data flow. Default: (nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver).
quant_delay (int, list or tuple) – Number of steps after which weights and activations are quantized during eval. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (QuantDtype, list or tuple) – Data type used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (bool, list or tuple) – Quantization granularity. If True, quantization is applied per channel; otherwise it is applied per layer. The first element represents weights and the second element represents data flow. Default: (False, False).
symmetric (bool, list or tuple) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise asymmetric quantization is used. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (bool, list or tuple) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False)
- Returns
QuantConfig, contains the observer types of weight and activation.
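Conceptually, the returned QuantConfig pairs a weight observer with an activation observer, each pre-bound with its own slice of the per-(weights, data-flow) tuple arguments. The following is a rough pure-Python sketch of that pairing using a namedtuple and functools.partial; the FakeObserver class and create_quant_config_sketch function are stand-ins for illustration, not MindSpore's internals.

```python
from collections import namedtuple
from functools import partial

QuantConfig = namedtuple("QuantConfig", ["weight", "activation"])

class FakeObserver:
    """Stand-in for an observer class such as FakeQuantWithMinMaxObserver."""
    def __init__(self, quant_delay=0, per_channel=False,
                 symmetric=False, narrow_range=False):
        self.quant_delay = quant_delay
        self.per_channel = per_channel
        self.symmetric = symmetric
        self.narrow_range = narrow_range

def create_quant_config_sketch(quant_delay=(0, 0), per_channel=(False, False),
                               symmetric=(False, False), narrow_range=(False, False)):
    # Index 0 configures weights, index 1 configures data flow (activations).
    weight = partial(FakeObserver, quant_delay=quant_delay[0],
                     per_channel=per_channel[0], symmetric=symmetric[0],
                     narrow_range=narrow_range[0])
    activation = partial(FakeObserver, quant_delay=quant_delay[1],
                         per_channel=per_channel[1], symmetric=symmetric[1],
                         narrow_range=narrow_range[1])
    return QuantConfig(weight=weight, activation=activation)

cfg = create_quant_config_sketch(per_channel=(True, False), symmetric=(True, False))
w = cfg.weight()                    # instantiate the pre-configured weight observer
print(w.per_channel, w.symmetric)   # True True
```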
- class mindspore.compression.quant.OptimizeOption[source]

  An enum for the model quantization optimize option; currently only QAT is supported.
- mindspore.compression.quant.load_nonquant_param_into_quant_net(quant_model, params_dict, quant_new_params=None)[source]

  Load fp32 model parameters into the quantized model.
- Parameters
quant_model – The quantized model to load parameters into.
params_dict – Parameter dict that stores the fp32 parameters.
quant_new_params – Names of parameters that exist in the quantized network but not in the non-quantized network. Default: None.
- Returns
None
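In spirit, the function copies fp32 parameters into the quantized network by matching names, while parameters listed in quant_new_params (which exist only in the quantized network, such as fake-quant min/max values) keep their initialized values. The following is a simplified sketch of that behavior over plain name-to-value dicts; the helper name and parameter names are hypothetical, not MindSpore's implementation.

```python
def load_nonquant_params_sketch(quant_params, fp32_params, quant_new_params=None):
    """Copy fp32 parameters into a quantized network's parameters by name.

    Simplified sketch: both networks are represented as name -> value dicts.
    """
    quant_new_params = quant_new_params or []
    for name in quant_params:
        if name in fp32_params:
            # Matching name: reuse the pretrained fp32 value.
            quant_params[name] = fp32_params[name]
        elif name not in quant_new_params:
            # Missing from the checkpoint and not declared quantization-only:
            # treat as an error rather than silently keeping a random init.
            raise ValueError(f"{name} not found in checkpoint and not quant-only")
    return quant_params

quant_net = {"conv1.weight": 0.0, "conv1.fq_min": -6.0}
fp32_ckpt = {"conv1.weight": 1.5}
loaded = load_nonquant_params_sketch(quant_net, fp32_ckpt,
                                     quant_new_params=["conv1.fq_min"])
print(loaded)  # {'conv1.weight': 1.5, 'conv1.fq_min': -6.0}
```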
mindspore.compression.common

Compression common module.
- class mindspore.compression.common.QuantDtype[source]

  An enum for quant datatype, contains INT2 ~ INT8 and UINT2 ~ UINT8.
- num_bits

  Get the number of bits of the QuantDtype member.
- Returns
int, the number of bits of the QuantDtype member.
Examples
>>> quant_dtype = QuantDtype.INT8
>>> num_bits = quant_dtype.num_bits
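The bit width is implied by the member name (INT2 through INT8, UINT2 through UINT8). The following self-contained sketch defines an enum with the same naming convention and derives num_bits by parsing the member name; QuantDtypeSketch mirrors the idea only and is not MindSpore's internal implementation.

```python
from enum import Enum

class QuantDtypeSketch(Enum):
    """Illustrative enum following the INTn/UINTn naming convention."""
    INT4 = "INT4"
    INT8 = "INT8"
    UINT8 = "UINT8"

    @property
    def num_bits(self):
        # Strip the leading "INT"/"UINT" letters and parse the remaining digits.
        return int(self.name.lstrip("UINT"))

print(QuantDtypeSketch.INT8.num_bits)   # 8
print(QuantDtypeSketch.INT4.num_bits)   # 4
```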