mindspore.compression

mindspore.compression.quant

Compression quant module.

class mindspore.compression.quant.QuantizationAwareTraining(bn_fold=True, freeze_bn=10000000, quant_delay=(0, 0), quant_dtype=(<QuantDtype.INT8: 'INT8'>, <QuantDtype.INT8: 'INT8'>), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), optimize_option=<OptimizeOption.QAT: 'QAT'>, one_conv_fold=True)[source]

Quantizer for quantization aware training.

Parameters
  • bn_fold (bool) – Whether to use batchnorm fold ops for simulation inference. Default: True.

  • freeze_bn (int) – Number of steps after which the BatchNorm OP uses the accumulated mean and variance. Default: 1e7.

  • quant_delay (int, list or tuple) – Number of steps after which weights and activations are quantized during train and eval. The first element represents weights and the second element represents data flow. Default: (0, 0).

  • quant_dtype (QuantDtype, list or tuple) – Datatype used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).

  • per_channel (bool, list or tuple) – Quantization granularity: per channel if True, otherwise per layer. The first element represents weights and the second element represents data flow. Default: (False, False).

  • symmetric (bool, list or tuple) – Whether the quantization algorithm is symmetric: if True, symmetric quantization is used, otherwise asymmetric. The first element represents weights and the second element represents data flow. Default: (False, False).

  • narrow_range (bool, list or tuple) – Whether the quantization algorithm uses a narrow range. The first element represents weights and the second element represents data flow. Default: (False, False).

  • optimize_option (OptimizeOption, list or tuple) – Specifies the quantization algorithm and options; currently only OptimizeOption.QAT is supported. Default: OptimizeOption.QAT.

  • one_conv_fold (bool) – Whether to use one conv bn fold ops for simulation inference. Default: True.

Examples

>>> from mindspore import nn
>>> from mindspore.compression.quant import QuantizationAwareTraining
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False, per_channel=[True, False], symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)
quantize(network)[source]

Quant API to convert the input network into a quantization aware training network.

Parameters

network (Cell) – network to be quantized.

Examples

>>> net = Net()
>>> quantizer = QuantizationAwareTraining()
>>> net_qat = quantizer.quantize(net)
mindspore.compression.quant.create_quant_config(quant_observer=(<class 'mindspore.nn.layer.quant.FakeQuantWithMinMaxObserver'>, <class 'mindspore.nn.layer.quant.FakeQuantWithMinMaxObserver'>), quant_delay=(0, 0), quant_dtype=(<QuantDtype.INT8: 'INT8'>, <QuantDtype.INT8: 'INT8'>), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False))[source]

Configure the observer types of weights and data flow with quantization parameters.

Parameters
  • quant_observer (Observer, list or tuple) – The observer type used for quantization. The first element represents weights and the second element represents data flow. Default: (nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver).

  • quant_delay (int, list or tuple) – Number of steps after which weights and activations are quantized during train and eval. The first element represents weights and the second element represents data flow. Default: (0, 0).

  • quant_dtype (QuantDtype, list or tuple) – Datatype used to quantize weights and activations. The first element represents weights and the second element represents data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).

  • per_channel (bool, list or tuple) – Quantization granularity: per channel if True, otherwise per layer. The first element represents weights and the second element represents data flow. Default: (False, False).

  • symmetric (bool, list or tuple) – Whether the quantization algorithm is symmetric: if True, symmetric quantization is used, otherwise asymmetric. The first element represents weights and the second element represents data flow. Default: (False, False).

  • narrow_range (bool, list or tuple) – Whether the quantization algorithm uses a narrow range. The first element represents weights and the second element represents data flow. Default: (False, False).

Returns

QuantConfig, contains the observer types of weights and activations.
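The returned QuantConfig pairs a weight observer with an activation observer, each pre-bound with its own quantization parameters. The following is an illustrative sketch, not the real MindSpore implementation: the `QuantConfig` namedtuple layout, the stand-in `FakeQuantWithMinMaxObserver` class, and the use of `functools.partial` are assumptions made for demonstration.

```python
from collections import namedtuple
from functools import partial

# Assumed shape of the returned config: (weight observer, activation observer).
QuantConfig = namedtuple("QuantConfig", ["weight", "activation"])


class FakeQuantWithMinMaxObserver:
    """Stand-in for nn.FakeQuantWithMinMaxObserver (illustrative only)."""

    def __init__(self, quant_delay=0, per_channel=False,
                 symmetric=False, narrow_range=False):
        self.quant_delay = quant_delay
        self.per_channel = per_channel
        self.symmetric = symmetric
        self.narrow_range = narrow_range


def create_quant_config(quant_delay=(0, 0), per_channel=(False, False),
                        symmetric=(False, False), narrow_range=(False, False)):
    # First element of each pair configures weights, second configures data flow.
    weight = partial(FakeQuantWithMinMaxObserver,
                     quant_delay=quant_delay[0], per_channel=per_channel[0],
                     symmetric=symmetric[0], narrow_range=narrow_range[0])
    activation = partial(FakeQuantWithMinMaxObserver,
                         quant_delay=quant_delay[1], per_channel=per_channel[1],
                         symmetric=symmetric[1], narrow_range=narrow_range[1])
    return QuantConfig(weight=weight, activation=activation)


config = create_quant_config(per_channel=(True, False), symmetric=(True, False))
w_observer = config.weight()
print(w_observer.per_channel, w_observer.symmetric)  # True True
```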

class mindspore.compression.quant.OptimizeOption[source]

An enum for the model quantization optimize options; currently only QAT is supported.
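Conceptually this is a plain enum with a single supported member. A minimal sketch, assuming a standard `enum.Enum` layout (not the real class definition):

```python
from enum import Enum

# Illustrative sketch: an optimize-option enum with one supported member,
# QAT (quantization aware training).
class OptimizeOption(Enum):
    QAT = "QAT"

print(OptimizeOption.QAT.value)  # QAT
```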

mindspore.compression.quant.load_nonquant_param_into_quant_net(quant_model, params_dict, quant_new_params=None)[source]

Load fp32 model parameters into the quantization model.

Parameters
  • quant_model – quantization model to load parameters into.

  • params_dict – parameter dict that stores the fp32 parameters.

  • quant_new_params – parameters that exist in the quantized network but not in the non-quantized network. Default: None.

Returns

None
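The idea is to copy fp32 parameters into the quantized model by name, while leaving quantization-only parameters (e.g. fake-quant min/max) untouched. The sketch below illustrates that matching logic on plain dicts; the function body is an assumption for demonstration, not the real implementation, which operates on Cell parameters.

```python
# Illustrative sketch of load_nonquant_param_into_quant_net: parameters named
# in quant_new_params exist only in the quantized network and are skipped;
# everything else is overwritten from the fp32 parameter dict when present.
def load_nonquant_param_into_quant_net(quant_params, fp32_params,
                                       quant_new_params=None):
    quant_new_params = quant_new_params or []
    for name in quant_params:
        if name in quant_new_params:
            continue  # quantization-only parameter; keep its current value
        if name in fp32_params:
            quant_params[name] = fp32_params[name]
    return None  # mirrors the documented return value


quant_params = {"conv1.weight": 0.0, "fake_quant.minq": -6.0}
fp32_params = {"conv1.weight": 1.5}
load_nonquant_param_into_quant_net(quant_params, fp32_params,
                                   quant_new_params=["fake_quant.minq"])
print(quant_params)  # {'conv1.weight': 1.5, 'fake_quant.minq': -6.0}
```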

mindspore.compression.common

Compression common module.

class mindspore.compression.common.QuantDtype[source]

An enum for quant datatypes, containing INT2 ~ INT8 and UINT2 ~ UINT8.

num_bits

Get the number of bits of the QuantDtype member.

Returns

int, the number of bits of the QuantDtype member.

Examples

>>> quant_dtype = QuantDtype.INT8
>>> num_bits = quant_dtype.num_bits
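For intuition, num_bits simply reflects the bit width encoded in the member name (INT8 has 8 bits, UINT2 has 2). A minimal sketch of such an enum, assuming the bit width can be parsed from the name (the real class may store it differently):

```python
from enum import Enum

# Illustrative sketch of QuantDtype: members INT2~INT8 and UINT2~UINT8, with
# num_bits derived by stripping the INT/UINT prefix from the member name.
class QuantDtype(Enum):
    INT2 = "INT2"
    INT4 = "INT4"
    INT8 = "INT8"
    UINT2 = "UINT2"
    UINT4 = "UINT4"
    UINT8 = "UINT8"

    @property
    def num_bits(self):
        # lstrip removes any leading 'U', 'I', 'N', 'T', leaving the digits.
        return int(self.value.lstrip("UINT"))


print(QuantDtype.INT8.num_bits)  # 8
```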