mindspore.nn.FakeQuantWithMinMaxObserver
- class mindspore.nn.FakeQuantWithMinMaxObserver(min_init=-6, max_init=6, ema=False, ema_decay=0.999, per_channel=False, channel_axis=1, num_channels=1, quant_dtype=QuantDtype.INT8, symmetric=False, narrow_range=False, quant_delay=0, neg_trunc=False, mode='DEFAULT')
Quantization aware operation which provides the fake quantization observer function on data with min and max.
The detail of the quantization mode DEFAULT is described as below:
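The running min/max $x_{min}$ and $x_{max}$ are computed as:

$$
x_{min} =
\begin{cases}
\min(\min(X), 0) & \text{if } ema = \text{False} \\
\min((1 - c)\min(X) + c\, x_{min}, 0) & \text{otherwise}
\end{cases}
$$

$$
x_{max} =
\begin{cases}
\max(\max(X), 0) & \text{if } ema = \text{False} \\
\max((1 - c)\max(X) + c\, x_{max}, 0) & \text{otherwise}
\end{cases}
$$

where $X$ is the input tensor, and $c$ is the ema_decay.

The scale and zero point zp are computed as:

$$
scale =
\begin{cases}
\dfrac{x_{max} - x_{min}}{Q_{max} - Q_{min}} & \text{if } symmetric = \text{False} \\
\dfrac{2\max(x_{max}, |x_{min}|)}{Q_{max} - Q_{min}} & \text{otherwise}
\end{cases}
$$

$$
zp\_min = Q_{min} - \frac{x_{min}}{scale}
$$

$$
zp = \left\lfloor \min(Q_{max}, \max(Q_{min}, zp\_min)) + 0.5 \right\rfloor
$$

where $Q_{max}$ and $Q_{min}$ are decided by quant_dtype, for example, if quant_dtype=INT8, then $Q_{max} = 127$ and $Q_{min} = -128$.

The fake quant output is computed as:

$$
u_{min} = (Q_{min} - zp) \cdot scale
$$

$$
u_{max} = (Q_{max} - zp) \cdot scale
$$

$$
u_X = \left\lfloor \frac{\min(u_{max}, \max(u_{min}, X)) - u_{min}}{scale} + 0.5 \right\rfloor
$$

$$
output = u_X \cdot scale + u_{min}
$$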
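The DEFAULT mode computation can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `update_min_max` and `fake_quant_default` are hypothetical, the formulas follow the usual min/max fake-quantization scheme (asymmetric or symmetric scale, zero point clamped and rounded half up), and the code works on flat lists of floats rather than MindSpore tensors. It is not the MindSpore kernel.

```python
import math


def update_min_max(xs, x_min, x_max, ema=False, ema_decay=0.999):
    """Return updated running (min, max); both are forced to include 0."""
    lo, hi = min(xs), max(xs)
    if ema:
        # Blend the batch statistics with the previous running values.
        lo = (1 - ema_decay) * lo + ema_decay * x_min
        hi = (1 - ema_decay) * hi + ema_decay * x_max
    return min(lo, 0.0), max(hi, 0.0)


def fake_quant_default(xs, x_min, x_max, q_min=-128, q_max=127, symmetric=False):
    """Quantize and immediately dequantize xs on the grid implied by min/max."""
    if symmetric:
        scale = 2.0 * max(x_max, abs(x_min)) / (q_max - q_min)
    else:
        scale = (x_max - x_min) / (q_max - q_min)
    # Zero point: clamp into the integer range, then round half up.
    zp = math.floor(min(q_max, max(q_min, q_min - x_min / scale)) + 0.5)
    u_min = (q_min - zp) * scale  # smallest representable value
    u_max = (q_max - zp) * scale  # largest representable value
    out = []
    for x in xs:
        clipped = min(u_max, max(u_min, x))
        level = math.floor((clipped - u_min) / scale + 0.5)
        out.append(level * scale + u_min)
    return out


# INT8 range with min/max left at the constructor defaults of -6 and 6.
print(fake_quant_default([1.0, 0.0, -1.0], x_min=-6.0, x_max=6.0))
```

With min and max at -6 and 6, the grid step is 12/255, so an input of 1.0 lands on 21 steps, about 0.98824. Inputs that fall exactly halfway between two grid points can round either way depending on floating-point precision.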
The detail of the quantization mode LEARNED_SCALE is described as below:
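The fake quant output is computed as:

$$
\bar{X} =
\begin{cases}
clip\left(\dfrac{X}{maxq}, 0, 1\right) & \text{if } neg\_trunc \\
clip\left(\dfrac{X}{maxq}, -1, 1\right) & \text{otherwise}
\end{cases}
$$

$$
output = \frac{\left\lfloor \bar{X} \cdot Q_{max} + 0.5 \right\rfloor \cdot maxq}{Q_{max}}
$$

where $X$ is the input tensor, and $Q_{max}$ (quant_max) is decided by quant_dtype and neg_trunc. For example, with quant_dtype=INT8, $Q_{max} = 127$ when neg_trunc is not in effect; when neg_trunc is in effect, only non-negative levels are used and $Q_{max}$ is larger.

The maxq is updated by training, and its gradient is calculated as follows:

$$
\frac{\partial\, output}{\partial\, maxq} =
\begin{cases}
-\dfrac{X}{maxq} + \left\lfloor \dfrac{X}{maxq} \right\rceil & \text{if } bound_{lower} < \dfrac{X}{maxq} < 1 \\
-1 & \text{if } \dfrac{X}{maxq} \le bound_{lower} \\
1 & \text{if } \dfrac{X}{maxq} \ge 1
\end{cases}
$$

$$
bound_{lower} =
\begin{cases}
0 & \text{if } neg\_trunc \\
-1 & \text{otherwise}
\end{cases}
$$

Then minq is computed as:

$$
minq =
\begin{cases}
0 & \text{if } neg\_trunc \\
-maxq & \text{otherwise}
\end{cases}
$$

When exporting, the scale and zero point zp are computed as:

$$
scale = \frac{maxq}{quant\_max}, \quad zp = 0
$$

zp is always equal to 0, due to the symmetric nature of LEARNED_SCALE.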
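The LEARNED_SCALE forward pass and export step can be sketched as follows. This is an assumption-laden illustration, not the MindSpore kernel: the helper names are hypothetical, the input is a flat list of floats, `maxq` is treated as a fixed number rather than a trained parameter, and `quant_max` is supplied by the caller (127 is used here for INT8 without neg_trunc).

```python
import math


def fake_quant_learned_scale(xs, maxq, quant_max=127, neg_trunc=False):
    """Normalize by maxq, clip, round to quant_max levels, and rescale."""
    lower = 0.0 if neg_trunc else -1.0  # negative values are cut off under neg_trunc
    out = []
    for x in xs:
        x_bar = min(1.0, max(lower, x / maxq))
        level = math.floor(x_bar * quant_max + 0.5)
        out.append(level * maxq / quant_max)
    return out


def export_scale_zp(maxq, quant_max=127):
    """Scale and zero point at export time; zp stays 0 because the scheme is symmetric."""
    return maxq / quant_max, 0


print(fake_quant_learned_scale([1.0, -10.0, 0.0], maxq=6.0))
print(export_scale_zp(6.0))
```

In this sketch, an input below `-maxq` saturates at `-maxq`, while with `neg_trunc=True` any negative input maps to 0.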
- Parameters
min_init (int, float, list) – The initialized min value. Default: -6.
max_init (int, float, list) – The initialized max value. Default: 6.
ema (bool) – Whether to update min and max with the Exponential Moving Average algorithm. Default: False.
ema_decay (float) – Exponential Moving Average algorithm parameter. Default: 0.999.
per_channel (bool) – Whether to quantize per channel instead of per layer. Default: False.
channel_axis (int) – The axis along which per-channel quantization is applied. Default: 1.
num_channels (int) – Declares the channel size of min and max. Default: 1.
quant_dtype (QuantDtype) – The datatype of quantization, supporting 4 and 8 bits. Default: QuantDtype.INT8.
symmetric (bool) – Whether the quantization algorithm is symmetric or not. Default: False.
narrow_range (bool) – Whether the quantization algorithm uses narrow range or not. Default: False.
quant_delay (int) – Quantization delay parameters according to the global step. Default: 0.
neg_trunc (bool) – Whether the quantization algorithm uses negative truncation or not. Default: False.
mode (str) – Optional quantization mode. Currently only DEFAULT (QAT) and LEARNED_SCALE are supported. Default: "DEFAULT".
- Inputs:
x (Tensor) - The input of FakeQuantWithMinMaxObserver. The input dimension is preferably 2D or 4D.
- Outputs:
Tensor, with the same type and shape as x.
- Raises
TypeError – If min_init or max_init is not int, float or list.
TypeError – If quant_delay is not an int.
ValueError – If quant_delay is less than 0.
ValueError – If min_init is not less than max_init.
ValueError – If mode is neither DEFAULT nor LEARNED_SCALE.
ValueError – If mode is LEARNED_SCALE and symmetric is not True.
ValueError – If mode is LEARNED_SCALE and narrow_range is not True, unless neg_trunc is True.
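The conditions listed under Raises can be read as a set of argument checks. The sketch below restates them in plain Python; `check_fake_quant_args` is a hypothetical helper written for illustration, not part of MindSpore, and the element-wise comparison of list-valued min_init and max_init is an assumption.

```python
def check_fake_quant_args(min_init=-6, max_init=6, quant_delay=0, mode="DEFAULT",
                          symmetric=False, narrow_range=False, neg_trunc=False):
    """Raise TypeError or ValueError for the argument combinations listed under Raises."""
    for name, value in (("min_init", min_init), ("max_init", max_init)):
        if not isinstance(value, (int, float, list)):
            raise TypeError(f"{name} must be int, float or list")
    if not isinstance(quant_delay, int):
        raise TypeError("quant_delay must be an int")
    if quant_delay < 0:
        raise ValueError("quant_delay must not be negative")
    # Assumption: list-valued bounds are compared element by element.
    lows = min_init if isinstance(min_init, list) else [min_init]
    highs = max_init if isinstance(max_init, list) else [max_init]
    if any(lo >= hi for lo, hi in zip(lows, highs)):
        raise ValueError("min_init must be less than max_init")
    if mode not in ("DEFAULT", "LEARNED_SCALE"):
        raise ValueError("mode must be DEFAULT or LEARNED_SCALE")
    if mode == "LEARNED_SCALE":
        if not symmetric:
            raise ValueError("LEARNED_SCALE requires symmetric=True")
        if not narrow_range and not neg_trunc:
            raise ValueError("LEARNED_SCALE requires narrow_range=True unless neg_trunc=True")


# The defaults pass; LEARNED_SCALE needs symmetric=True plus narrow_range or neg_trunc.
check_fake_quant_args()
check_fake_quant_args(mode="LEARNED_SCALE", symmetric=True, narrow_range=True)
```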
- Supported Platforms:
Ascend
GPU
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, nn
>>> fake_quant = nn.FakeQuantWithMinMaxObserver()
>>> x = Tensor(np.array([[1, 2, 1], [-2, 0, -1]]), mindspore.float32)
>>> result = fake_quant(x)
>>> print(result)
[[ 0.9882355  1.9764705  0.9882355]
 [-1.9764705  0.        -0.9882355]]