mindspore.compression
mindspore.compression.quant
Quantization module, including the base class of the quantizer, the quantization aware training algorithm, and quantization utils.
- class mindspore.compression.quant.OptimizeOption[source]
An enum for the model quantization optimization options; currently only QAT and LEARNED_SCALE are supported.
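For example, an option can be selected and later passed to the quantizer (a minimal sketch; the import path matches the class path above):
>>> from mindspore.compression.quant import OptimizeOption
>>> option = OptimizeOption.LEARNED_SCALE
>>> isinstance(option, OptimizeOption)
True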
- class mindspore.compression.quant.QuantizationAwareTraining(bn_fold=True, freeze_bn=10000000, quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), optimize_option=OptimizeOption.QAT, one_conv_fold=True)[source]
Quantizer for quantization aware training.
- Parameters
bn_fold (bool) – Whether to use BatchNorm fold ops to simulate the inference operation. Default: True.
freeze_bn (int) – Number of steps after which the BatchNorm OP parameters are fixed to the global mean and variance. Default: 1e7.
quant_delay (Union[int, list, tuple]) – Number of steps after which weights and activations are quantized during training and evaluation. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (Union[QuantDtype, list, tuple]) – Datatype used to quantize weights and activations. The first element represents weights and the second element represents data flow. The precision support of the target hardware device must be considered in practical quantization inference scenarios. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (Union[bool, list, tuple]) – Quantization granularity based on layer or on channel. If True, quantization is applied per channel; otherwise, per layer. The first element represents weights and the second element represents data flow; the second element must currently be False. Default: (False, False).
symmetric (Union[bool, list, tuple]) – Whether the quantization algorithm is symmetric or not. If True, symmetric quantization is used; otherwise, asymmetric. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (Union[bool, list, tuple]) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False).
optimize_option (Union[OptimizeOption, list, tuple]) – Specifies the quantization algorithm and options; currently only QAT and LEARNED_SCALE are supported. Note that if both QAT and LEARNED_SCALE are configured, LEARNED_SCALE takes priority. LEARNED_SCALE currently only works under certain constraints: freeze_bn=0, quant_delay=0, symmetric=True, and narrow_range=True. More specifically, for operators such as ReLU and ReLU6, which produce only positive values, a negative truncation is added to optimize this scenario, and narrow_range automatically switches to False. Default: OptimizeOption.QAT. A configuration sketch for LEARNED_SCALE follows the Examples section below.
one_conv_fold (bool) – Whether to use the one-conv bn fold op to simulate the inference operation. Default: True.
- Raises
TypeError – If the element of quant_delay or freeze_bn is not int.
TypeError – If bn_fold, one_conv_fold or the element of per_channel, symmetric, narrow_range is not bool.
TypeError – If the element of quant_dtype is not QuantDtype.
ValueError – If the length of quant_delay, quant_dtype, per_channel, symmetric or narrow_range is greater than 2.
ValueError – If the optimize_option is LEARNED_SCALE and freeze_bn is not equal to 0.
ValueError – If the optimize_option is LEARNED_SCALE and symmetric is not (True, True).
ValueError – If the optimize_option is LEARNED_SCALE and narrow_range is not (True, True).
ValueError – If the optimize_option is LEARNED_SCALE and quant_delay is not (0, 0).
Examples
>>> from mindspore import nn
>>> from mindspore.compression.quant import QuantizationAwareTraining
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False, per_channel=[True, False], symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)
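When the LEARNED_SCALE option is used, the constraints documented above (freeze_bn=0, quant_delay=0, symmetric=True, narrow_range=True) must be satisfied. The following is a minimal configuration sketch reusing the LeNet5 network defined in the example above; it is illustrative only, not an official recipe:
>>> from mindspore.compression.quant import OptimizeOption
>>> learned_quantizer = QuantizationAwareTraining(freeze_bn=0, quant_delay=(0, 0),
...                                               symmetric=(True, True), narrow_range=(True, True),
...                                               optimize_option=OptimizeOption.LEARNED_SCALE)
>>> net_qat_ls = learned_quantizer.quantize(LeNet5())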
- mindspore.compression.quant.create_quant_config(quant_observer=(nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver), quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), mode='DEFAULT')[source]
Configure the observer types for weights and data flow with quantization parameters.
- Parameters
quant_observer (Union[Observer, list, tuple]) – The types of observer for quantization. The first element applies to weights and the second applies to data flow. Currently, only FakeQuantWithMinMaxObserver is supported. Default: (nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver).
quant_delay (Union[int, list, tuple]) – Number of steps after which weights and activations are quantized during training and evaluation. The first element represents weights and the second element represents data flow. Default: (0, 0).
quant_dtype (Union[QuantDtype, list, tuple]) – Datatype used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (Union[bool, list, tuple]) – Quantization granularity based on layer or on channel. If True, quantization is applied per channel; otherwise, per layer. The first element represents weights and the second element represents data flow; the second element must currently be False. Default: (False, False).
symmetric (Union[bool, list, tuple]) – Whether the quantization algorithm is symmetric or not. If True, symmetric quantization is used; otherwise, asymmetric. The first element represents weights and the second element represents data flow. Default: (False, False).
narrow_range (Union[bool, list, tuple]) – Whether the quantization algorithm uses narrow range or not. The first element represents weights and the second element represents data flow. Default: (False, False).
mode (str) – Optional quantization mode; currently only DEFAULT (QAT) and LEARNED_SCALE are supported. Default: “DEFAULT”.
- Returns
QuantConfig, contains the observer type of weight and activation.
- Raises
ValueError – If the second element of per_channel is not False.
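A minimal usage sketch (the keyword values follow the parameter descriptions above; the returned QuantConfig holds the observer types for weight and activation, as described under Returns):
>>> from mindspore.compression.quant import create_quant_config
>>> quant_config = create_quant_config(per_channel=(True, False), symmetric=(True, False))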
- mindspore.compression.quant.load_nonquant_param_into_quant_net(quant_model, params_dict, quant_new_params=None)[source]
Load fp32 model parameters into a quantization model.
- Parameters
quant_model (Cell) – Quantization model to load the parameters into.
params_dict (dict) – Parameter dict that stores the fp32 parameters.
quant_new_params (list) – Parameters that exist in the quantization network but not in the non-quantization network. Default: None.
- Raises
TypeError – If quant_new_params is not None and is not a list.
ValueError – If there are parameters in the quant_model that are neither in params_dict nor in quant_new_params.
Examples
>>> from mindspore import nn
>>> from mindspore import load_checkpoint
>>> from mindspore.compression.quant.quant_utils import load_nonquant_param_into_quant_net
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> ckpt_file_name = "./checkpoint/LeNet5_noquant-1_32.ckpt"
>>> param_dict = load_checkpoint(ckpt_file_name)
>>> load_nonquant_param_into_quant_net(net, param_dict)
- mindspore.compression.quant.query_quant_layers(network)[source]
Query the network’s quantization strategy for each quantized layer and print it to the screen. Note that all quantization layers are queried before graph compile optimization in graph mode; as a result, some redundant quantized layers that do not exist in the practical execution may appear.
- Parameters
network (Cell) – The input network.
Examples
>>> from mindspore import nn
>>> from mindspore.compression.quant import QuantizationAwareTraining
>>> from mindspore.compression.quant.quant_utils import query_quant_layers
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False, per_channel=[True, False], symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)
>>> query_quant_layers(net_qat)
conv1.conv.fake_quant_weight INT8
conv1.activation.fake_quant_act INT8
conv2.conv.fake_quant_weight INT8
conv2.activation.fake_quant_act INT8
fc1.dense.fake_quant_weight INT8
fc1.activation.fake_quant_act INT8
fc2.dense.fake_quant_weight INT8
fc2.activation.fake_quant_act INT8
fc3.dense.fake_quant_weight INT8
fc3.activation.fake_quant_act INT8
mindspore.compression.common
Common module for various compression algorithms, now only including datatype definition for quantization.
- class mindspore.compression.common.QuantDtype[source]
An enum for quant datatypes, containing INT2 ~ INT8 and UINT2 ~ UINT8.
- static is_signed(dtype)[source]
Get whether the quant datatype is signed.
- Parameters
dtype (QuantDtype) – quant datatype.
- Returns
bool, whether the input quant datatype is signed.
Examples
>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> is_signed = QuantDtype.is_signed(quant_dtype)
- num_bits
Get the number of bits of the QuantDtype member.
- Returns
int, the number of bits of the QuantDtype member.
Examples
>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> num_bits = quant_dtype.num_bits
>>> print(num_bits)
8
- static switch_signed(dtype)[source]
Switch the signed state of the input quant datatype.
- Parameters
dtype (QuantDtype) – quant datatype.
- Returns
QuantDtype, quant datatype with opposite signed state as the input.
Examples
>>> from mindspore.compression.common import QuantDtype
>>> quant_dtype = QuantDtype.INT8
>>> quant_dtype = QuantDtype.switch_signed(quant_dtype)