mindspore.nn.FakeQuantWithMinMaxObserver

class mindspore.nn.FakeQuantWithMinMaxObserver(min_init=-6, max_init=6, ema=False, ema_decay=0.999, per_channel=False, channel_axis=1, num_channels=1, quant_dtype=QuantDtype.INT8, symmetric=False, narrow_range=False, quant_delay=0, neg_trunc=False, mode='DEFAULT')

Quantization-aware operation that observes the min and max of the data and applies fake quantization to it.

The DEFAULT quantization mode works as follows:

The running min/max $x_{min}$ and $x_{max}$ are computed as:

\begin{array}{ll} x_{min} = \begin{cases} \min(\min(X), 0) & \text{if } ema = \text{False} \\ \min((1 - c) \min(X) + c \, x_{min}, 0) & \text{otherwise} \end{cases} \\ x_{max} = \begin{cases} \max(\max(X), 0) & \text{if } ema = \text{False} \\ \max((1 - c) \max(X) + c \, x_{max}, 0) & \text{otherwise} \end{cases} \end{array}

where X is the input tensor, and $c$ is the ema_decay.
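
As a rough illustration of the update rule above, the following NumPy sketch recomputes the running range for one batch. The helper name update_min_max is hypothetical and not part of MindSpore; it mirrors the formula only, not the library's actual kernel.

```python
import numpy as np

def update_min_max(x, x_min, x_max, ema=False, ema_decay=0.999):
    """Hypothetical helper: return updated (x_min, x_max) for one batch."""
    batch_min = float(np.min(x))
    batch_max = float(np.max(x))
    if ema:
        # Blend the batch statistics with the previous running values.
        batch_min = (1.0 - ema_decay) * batch_min + ema_decay * x_min
        batch_max = (1.0 - ema_decay) * batch_max + ema_decay * x_max
    # The range is always forced to contain zero.
    return min(batch_min, 0.0), max(batch_max, 0.0)

x = np.array([[1, 2, 1], [-2, 0, -1]], dtype=np.float32)
no_ema = update_min_max(x, x_min=-6.0, x_max=6.0)
with_ema = update_min_max(x, x_min=-6.0, x_max=6.0, ema=True, ema_decay=0.9)
```

Without EMA the previous range is ignored and the batch extremes are used directly; with EMA the range moves only a fraction (1 - c) of the way toward the batch extremes.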

The scale and zero point zp are computed as:

\begin{array}{ll} scale = \begin{cases} \frac{x_{max} - x_{min}}{Q_{max} - Q_{min}} & \text{if } symmetric = \text{False} \\ \frac{2 \max(x_{max}, |x_{min}|)}{Q_{max} - Q_{min}} & \text{otherwise} \end{cases} \\ zp_{min} = Q_{min} - \frac{x_{min}}{scale} \\ zp = \left\lfloor \min(Q_{max}, \max(Q_{min}, zp_{min})) + 0.5 \right\rfloor \end{array}

where $Q_{max}$ and $Q_{min}$ are determined by quant_dtype; for example, if quant_dtype=INT8, then $Q_{max} = 127$ and $Q_{min} = -128$.
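
The scale and zero-point formulas can be sketched in plain Python as below. The function scale_and_zero_point is a hypothetical name used only for illustration, and the INT8 defaults for q_min and q_max are taken from the example in the text. The sketch assumes a non-degenerate range (x_max > x_min), since a zero scale would cause a division by zero.

```python
import math

def scale_and_zero_point(x_min, x_max, q_min=-128, q_max=127, symmetric=False):
    """Hypothetical helper mirroring the DEFAULT-mode scale/zp formulas."""
    if symmetric:
        scale = 2.0 * max(x_max, abs(x_min)) / (q_max - q_min)
    else:
        scale = (x_max - x_min) / (q_max - q_min)
    # Unclamped zero point, then clamp to [q_min, q_max] and round half up.
    zp_min = q_min - x_min / scale
    zp = math.floor(min(q_max, max(q_min, zp_min)) + 0.5)
    return scale, zp

scale, zp = scale_and_zero_point(-1.0, 3.0)
sym_scale, _ = scale_and_zero_point(-1.0, 3.0, symmetric=True)
```

For the range [-1, 3] the asymmetric scale is 4/255, while the symmetric variant widens the range to [-3, 3] and yields 6/255.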

The fake quant output is computed as:

\begin{array}{ll} u_{min} = (Q_{min} - zp) * scale \\ u_{max} = (Q_{max} - zp) * scale \\ u_{X} = \left\lfloor \frac{\min(u_{max}, \max(u_{min}, X)) - u_{min}}{scale} + 0.5 \right\rfloor \\ output = u_{X} * scale + u_{min} \end{array}
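
A minimal NumPy sketch of the fake quant output formula follows. The function name fake_quant and the chosen scale and zp values are illustrative assumptions. The sketch computes in float64, so its results may differ in the last digits from the operator's float32 output.

```python
import numpy as np

def fake_quant(x, scale, zp, q_min=-128, q_max=127):
    """Hypothetical helper: clamp to the representable range, snap to the grid."""
    u_min = (q_min - zp) * scale
    u_max = (q_max - zp) * scale
    # Index of the nearest quantization level, counted from u_min.
    u_x = np.floor((np.clip(x, u_min, u_max) - u_min) / scale + 0.5)
    return u_x * scale + u_min

x = np.array([0.3, 1.2, 100.0, -100.0])
out = fake_quant(x, scale=0.5, zp=0)
```

With scale 0.5 and zp 0 the representable range is [-64, 63.5], so the two out-of-range inputs saturate at the ends and the others snap to the nearest multiple of 0.5.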

The LEARNED_SCALE quantization mode works as follows:

The fake quant output is computed as:

\begin{array}{ll} \bar{X} = \begin{cases} clip\left(\frac{X}{maxq}, 0, 1\right) & \text{if } neg\_trunc \\ clip\left(\frac{X}{maxq}, -1, 1\right) & \text{otherwise} \end{cases} \\ output = \frac{floor(\bar{X} * Q_{max} + 0.5) * scale}{Q_{max}} \end{array}

where X is the input tensor, and $Q_{max}$ (quant_max) is determined by quant_dtype and neg_trunc; for example, if quant_dtype=INT8 and neg_trunc is enabled, $Q_{max} = 256$; otherwise $Q_{max} = 127$.
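
The LEARNED_SCALE forward formula can be sketched with NumPy as follows. The function name learned_scale_fake_quant is hypothetical. The formula names a multiplier scale without defining it for this mode, so the example below simply passes maxq for it; that choice is an assumption made for illustration.

```python
import numpy as np

def learned_scale_fake_quant(x, maxq, scale, q_max=127, neg_trunc=False):
    """Hypothetical helper mirroring the LEARNED_SCALE output formula."""
    lower = 0.0 if neg_trunc else -1.0
    # Normalize by maxq and clip to [lower, 1].
    x_bar = np.clip(x / maxq, lower, 1.0)
    return np.floor(x_bar * q_max + 0.5) * scale / q_max

x = np.array([1.0, 8.0, -8.0])
out = learned_scale_fake_quant(x, maxq=4.0, scale=4.0)  # scale=maxq is an assumption
out_trunc = learned_scale_fake_quant(x, maxq=4.0, scale=4.0, neg_trunc=True)
```

With neg_trunc enabled, negative inputs are clipped to zero instead of to -maxq.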

The maxq is updated by training, and its gradient is calculated as follows:

\begin{array}{ll} \frac{\partial output}{\partial maxq} = \begin{cases} -\frac{X}{maxq} + \left\lfloor \frac{X}{maxq} \right\rceil & \text{if } bound_{lower} < \frac{X}{maxq} < 1 \\ -1 & \text{if } \frac{X}{maxq} \leq bound_{lower} \\ 1 & \text{if } \frac{X}{maxq} \geq 1 \end{cases} \\ bound_{lower} = \begin{cases} 0 & \text{if } neg\_trunc \\ -1 & \text{otherwise} \end{cases} \end{array}
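
The piecewise gradient above can be written element-wise as in the sketch below. The name maxq_grad is hypothetical, and the rounding operator is implemented with np.rint, which rounds ties to the nearest even integer; whether the library breaks ties the same way is an assumption.

```python
import numpy as np

def maxq_grad(x, maxq, neg_trunc=False):
    """Hypothetical helper: element-wise d(output)/d(maxq) per the formula."""
    lower = 0.0 if neg_trunc else -1.0
    ratio = x / maxq
    # Inside the clipping range: rounding error of the normalized input.
    inner = -ratio + np.rint(ratio)
    return np.where(ratio <= lower, -1.0,
                    np.where(ratio >= 1.0, 1.0, inner))

x = np.array([1.0, 3.0, 8.0, -8.0])
grad = maxq_grad(x, maxq=4.0)
grad_trunc = maxq_grad(np.array([-1.0]), maxq=4.0, neg_trunc=True)
```

Saturated elements contribute a constant gradient of 1 or -1, while in-range elements contribute only their rounding error.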

Then minq is computed as:

\begin{array}{r} minq = \begin{cases} 0 & \text{if } neg\_trunc \\ -maxq & \text{otherwise} \end{cases} \end{array}

When exporting, the scale and zero point zp are computed as:

\begin{array}{r} scale = \frac{maxq}{quant\_max}, \quad zp = 0 \end{array}

zp is always 0 because LEARNED_SCALE mode is symmetric.
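
For completeness, the minq and export formulas reduce to a few lines of Python. The function export_params is a hypothetical name, and the values passed in are arbitrary.

```python
def export_params(maxq, quant_max=127, neg_trunc=False):
    """Hypothetical helper: return (minq, scale, zp) for LEARNED_SCALE export."""
    minq = 0.0 if neg_trunc else -maxq
    scale = maxq / quant_max
    zp = 0  # always zero in LEARNED_SCALE mode
    return minq, scale, zp

minq, scale, zp = export_params(254.0)
minq_trunc, _, _ = export_params(254.0, neg_trunc=True)
```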

Parameters

min_init (int, float, list) – The initialized min value. Default: -6.
max_init (int, float, list) – The initialized max value. Default: 6.
ema (bool) – Whether to update min and max with the exponential moving average (EMA) algorithm. Default: False.
ema_decay (float) – The decay factor $c$ of the exponential moving average. Default: 0.999.
per_channel (bool) – Whether to quantize per channel (True) or per layer (False). Default: False.
channel_axis (int) – Quantization by channel axis. Default: 1.
num_channels (int) – Declares the channel size of min and max. Default: 1.
quant_dtype (QuantDtype) – The datatype of quantization, supporting 4 and 8 bits. Default: QuantDtype.INT8.
symmetric (bool) – Whether the quantization algorithm is symmetric or not. Default: False.
narrow_range (bool) – Whether the quantization algorithm uses narrow range or not. Default: False.
quant_delay (int) – Quantization delay parameters according to the global step. Default: 0.
neg_trunc (bool) – Whether the quantization algorithm uses negative truncation or not. Default: False.
mode (str) – Quantization mode. Currently only DEFAULT (QAT) and LEARNED_SCALE are supported. Default: 'DEFAULT'.

Inputs:

x (Tensor) - The input of FakeQuantWithMinMaxObserver. The input dimension is preferably 2D or 4D.

Outputs:

Tensor, with the same type and shape as x.

Raises

TypeError – If min_init or max_init is not int, float or list.
TypeError – If quant_delay is not an int.
ValueError – If quant_delay is less than 0.
ValueError – If min_init is not less than max_init.
ValueError – If mode is neither DEFAULT nor LEARNED_SCALE.
ValueError – If mode is LEARNED_SCALE and symmetric is not True.
ValueError – If mode is LEARNED_SCALE and narrow_range is not True, unless neg_trunc is True.

Supported Platforms: Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, nn
>>> fake_quant = nn.FakeQuantWithMinMaxObserver()
>>> x = Tensor(np.array([[1, 2, 1], [-2, 0, -1]]), mindspore.float32)
>>> result = fake_quant(x)
>>> print(result)
[[ 0.9882355  1.9764705  0.9882355]
 [-1.9764705  0.        -0.9882355]]

extend_repr(): Display the instance object as a string.

reset(quant_dtype=QuantDtype.INT8, min_init=-6, max_init=6): Reset the quant max parameter (e.g. 256) and the initial values of the minq and maxq parameters. This function is currently only valid in LEARNED_SCALE mode.