mindspore.compression
mindspore.compression.quant
Compression quant module.
- class mindspore.compression.quant.OptimizeOption(value)[source]
An enum for the model quantization optimize option; currently only QAT is supported.
- class mindspore.compression.quant.QuantizationAwareTraining(bn_fold=True, freeze_bn=10000000, quant_delay=(0, 0), quant_dtype=(<QuantDtype.INT8: 'INT8'>, <QuantDtype.INT8: 'INT8'>), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), optimize_option=OptimizeOption.QAT, one_conv_fold=True)[source]
Quantizer for quantization aware training.
- Parameters
bn_fold (bool) – Whether to use BatchNorm fold ops to simulate the inference operation. Default: True.
freeze_bn (int) – Number of steps after which the BatchNorm OP parameters switch to the accumulated total mean and variance. Default: 1e7.
quant_delay (int, list or tuple) – Number of steps after which weights and activations are quantized during eval. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (QuantDtype, list or tuple) – Data type used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (bool, list or tuple) – Quantization granularity, per layer or per channel. If True, quantization is applied per channel; otherwise per layer. The first element represents weights and the second element represents data flow. Default: (False, False).
symmetric (bool, list or tuple) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise asymmetric. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (bool, list or tuple) – Whether the quantization algorithm uses narrow range. The first element represents weights and the second element represents data flow. Default: (False, False).
optimize_option (OptimizeOption, list or tuple) – Specifies the quantization algorithm and options; currently only QAT is supported. Default: OptimizeOption.QAT.
one_conv_fold (bool) – Whether to use one conv bn fold op to simulate the inference operation. Default: True.
Examples
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False, per_channel=[True, False], symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)
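The symmetric and narrow_range options above change how the int8 grid is laid out over the observed value range. The following is a minimal pure-Python sketch of a min/max-based fake-quant parameterization (an illustration of the concepts, not MindSpore's actual implementation):

```python
def quant_params(min_val, max_val, num_bits=8, symmetric=False, narrow_range=False):
    """Compute (scale, zero_point) for a fake-quant range. Illustrative only."""
    # narrow_range drops the lowest code, e.g. [-127, 127] instead of [-128, 127].
    qmin = -(2 ** (num_bits - 1)) + (1 if narrow_range else 0)
    qmax = 2 ** (num_bits - 1) - 1
    if symmetric:
        # Symmetric: the grid is centered on zero, so zero_point is always 0.
        bound = max(abs(min_val), abs(max_val))
        scale = bound / qmax
        zero_point = 0
    else:
        # Asymmetric: the grid covers [min_val, max_val] exactly.
        scale = (max_val - min_val) / (qmax - qmin)
        zero_point = round(qmin - min_val / scale)
    return scale, zero_point

# Weights are often quantized symmetrically (zero_point fixed at 0),
# while activations commonly use the asymmetric scheme:
w_scale, w_zp = quant_params(-0.5, 0.5, symmetric=True, narrow_range=True)
a_scale, a_zp = quant_params(0.0, 6.0, symmetric=False)
```

This mirrors the common convention reflected in the defaults above: per-tensor, asymmetric, full-range quantization unless the symmetric/narrow_range flags are set.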
- mindspore.compression.quant.create_quant_config(quant_observer=(<class 'mindspore.nn.layer.quant.FakeQuantWithMinMaxObserver'>, <class 'mindspore.nn.layer.quant.FakeQuantWithMinMaxObserver'>), quant_delay=(0, 0), quant_dtype=(<QuantDtype.INT8: 'INT8'>, <QuantDtype.INT8: 'INT8'>), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False))[source]
Configure the observer types of weights and data flow with quantization parameters.
- Parameters
quant_observer (Observer, list or tuple) – The observer type used for quantization. The first element represents weights and the second element represents data flow. Default: (nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver).
quant_delay (int, list or tuple) – Number of steps after which weights and activations are quantized during eval. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (QuantDtype, list or tuple) – Data type used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (bool, list or tuple) – Quantization granularity, per layer or per channel. If True, quantization is applied per channel; otherwise per layer. The first element represents weights and the second element represents data flow. Default: (False, False).
symmetric (bool, list or tuple) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise asymmetric. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (bool, list or tuple) – Whether the quantization algorithm uses narrow range. The first element represents weights and the second element represents data flow. Default: (False, False).
- Returns
QuantConfig, contains the observer types of weight and activation.
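Conceptually, the returned QuantConfig is a two-field pairing of a weight observer with an activation observer, with element 0 of every tuple argument configuring weights and element 1 configuring data flow. A hedged pure-Python sketch of that pairing (the observer class and field names here are illustrative stand-ins, not the library's actual definitions):

```python
from collections import namedtuple
from functools import partial

class FakeObserver:
    """Illustrative stand-in for an observer such as nn.FakeQuantWithMinMaxObserver."""
    def __init__(self, quant_delay=0, per_channel=False, symmetric=False, narrow_range=False):
        self.quant_delay = quant_delay
        self.per_channel = per_channel
        self.symmetric = symmetric
        self.narrow_range = narrow_range

QuantConfig = namedtuple("QuantConfig", ["weight", "activation"])

def create_quant_config_sketch(quant_delay=(0, 0), per_channel=(False, False),
                               symmetric=(False, False), narrow_range=(False, False)):
    """Element 0 of each tuple configures weights, element 1 configures data flow."""
    weight = partial(FakeObserver, quant_delay=quant_delay[0], per_channel=per_channel[0],
                     symmetric=symmetric[0], narrow_range=narrow_range[0])
    activation = partial(FakeObserver, quant_delay=quant_delay[1], per_channel=per_channel[1],
                         symmetric=symmetric[1], narrow_range=narrow_range[1])
    return QuantConfig(weight=weight, activation=activation)

# Per-channel quantization for weights only, per-layer for activations:
cfg = create_quant_config_sketch(per_channel=(True, False))
```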
- mindspore.compression.quant.load_nonquant_param_into_quant_net(quant_model, params_dict, quant_new_params=None)[source]
Load fp32 model parameters into a quantization model.
- Parameters
quant_model – the quantization model.
params_dict – a parameter dict that stores the fp32 parameters.
quant_new_params – parameters that exist in the quantized network but not in the non-quantized network.
- Returns
None
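The loading logic amounts to copying fp32 parameters into the quant model by name, while parameters that only exist in the quantized network (e.g. fake-quant min/max statistics) keep their initialized values. A rough dict-based sketch, with made-up parameter names for illustration:

```python
def load_nonquant_params_sketch(quant_params, fp32_params, quant_new_params=None):
    """Copy fp32 parameters into a quant model's parameter dict by name.

    Illustrative only: real checkpoints hold tensors, not floats, and the
    library may match names more carefully than this.
    """
    quant_new_params = quant_new_params or []
    for name in quant_params:
        if any(name.endswith(suffix) for suffix in quant_new_params):
            continue  # quantization-only parameter: keep its initialized value
        if name in fp32_params:
            quant_params[name] = fp32_params[name]
    return quant_params

# Hypothetical parameter names; "minq" stands in for a fake-quant statistic.
quant_net = {"conv1.weight": 0.0, "fc1.weight": 0.0, "conv1.fake_quant.minq": -6.0}
fp32_ckpt = {"conv1.weight": 1.5, "fc1.weight": 2.5}
loaded = load_nonquant_params_sketch(quant_net, fp32_ckpt, quant_new_params=["minq"])
```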
mindspore.compression.common
Compression common module.
- class mindspore.compression.common.QuantDtype(value)[source]
An enum for quantization data types, containing INT2 ~ INT8 and UINT2 ~ UINT8.
- static is_signed(dtype)[source]
Get whether the quant datatype is signed.
- Parameters
dtype (QuantDtype) – quant datatype.
- Returns
bool, whether the input quant datatype is signed.
Examples
>>> quant_dtype = QuantDtype.INT8
>>> is_signed = QuantDtype.is_signed(quant_dtype)
- num_bits
Get the number of bits of the QuantDtype member.
- Returns
int, the number of bits of the QuantDtype member.
Examples
>>> quant_dtype = QuantDtype.INT8
>>> num_bits = quant_dtype.num_bits
- static switch_signed(dtype)[source]
Switch the signed state of the input quant datatype.
- Parameters
dtype (QuantDtype) – quant datatype.
- Returns
QuantDtype, quant datatype with opposite signed state as the input.
Examples
>>> quant_dtype = QuantDtype.INT8
>>> quant_dtype = QuantDtype.switch_signed(quant_dtype)
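The three members above (is_signed, num_bits, switch_signed) can be summarized with a small pure-Python enum that mirrors their behavior. This is a sketch for illustration only; the real QuantDtype lives in mindspore.compression.common and covers INT2 ~ INT8 and UINT2 ~ UINT8:

```python
from enum import Enum

class QuantDtypeSketch(Enum):
    """Illustrative mirror of QuantDtype (subset of members)."""
    INT4 = "INT4"
    UINT4 = "UINT4"
    INT8 = "INT8"
    UINT8 = "UINT8"

    @property
    def num_bits(self):
        # Strip the leading "INT"/"UINT" letters from the name, e.g. "INT8" -> 8.
        return int(self.name.lstrip("UINT"))

    @staticmethod
    def is_signed(dtype):
        # Unsigned members are exactly those whose name starts with "U".
        return not dtype.name.startswith("U")

    @staticmethod
    def switch_signed(dtype):
        # INT<n> <-> UINT<n>: toggle the leading "U" and look up the sibling member.
        name = dtype.name[1:] if dtype.name.startswith("U") else "U" + dtype.name
        return QuantDtypeSketch[name]
```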