mindspore.nn.FakeQuantWithMinMaxObserver

class mindspore.nn.FakeQuantWithMinMaxObserver(min_init=-6, max_init=6, ema=False, ema_decay=0.999, per_channel=False, channel_axis=1, num_channels=1, quant_dtype=QuantDtype.INT8, symmetric=False, narrow_range=False, quant_delay=0)

Quantization-aware operation that provides a fake quantization observer over data, using tracked min and max values.

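The running min/max $x_{min}$ and $x_{max}$ are computed as:

$$
x_{min} =
\begin{cases}
\min(\min(X), 0) & \text{if } ema = \text{False} \\
\min((1 - c)\min(X) + c\,x_{min}, 0) & \text{otherwise}
\end{cases}
$$

$$
x_{max} =
\begin{cases}
\max(\max(X), 0) & \text{if } ema = \text{False} \\
\max((1 - c)\max(X) + c\,x_{max}, 0) & \text{otherwise}
\end{cases}
$$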

where X is the input tensor, and c is the ema_decay.
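
The update rule above can be sketched in NumPy as follows. This is an illustrative reading of the formula, not MindSpore's kernel: the helper name `update_min_max` is hypothetical, and the EMA branch assumes the previous running value is weighted by `ema_decay`.

```python
import numpy as np

def update_min_max(x, x_min, x_max, ema=False, ema_decay=0.999):
    """Return the updated (x_min, x_max) after observing one batch x.

    Hypothetical helper; assumes the previous running value is weighted
    by ema_decay and the batch statistic by (1 - ema_decay).
    """
    batch_min = float(np.min(x))
    batch_max = float(np.max(x))
    if ema:
        # Blend the batch statistic with the previous running value.
        batch_min = (1.0 - ema_decay) * batch_min + ema_decay * x_min
        batch_max = (1.0 - ema_decay) * batch_max + ema_decay * x_max
    # The tracked range is always forced to contain zero.
    return min(batch_min, 0.0), max(batch_max, 0.0)

x = np.array([[1, 2, 1], [-2, 0, -1]], dtype=np.float64)
no_ema = update_min_max(x, -6.0, 6.0)              # (-2.0, 2.0)
with_ema = update_min_max(x, -6.0, 6.0, ema=True)  # roughly (-5.996, 5.996)
```

With `ema=False` the previous range is discarded on every batch; with `ema=True` and a decay close to 1 the range moves only slightly per batch.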

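The scale and the zero point zp are computed as:

$$
scale =
\begin{cases}
\dfrac{x_{max} - x_{min}}{Q_{max} - Q_{min}} & \text{if } symmetric = \text{False} \\
\dfrac{2\max(x_{max}, |x_{min}|)}{Q_{max} - Q_{min}} & \text{otherwise}
\end{cases}
$$

$$
zp\_min = Q_{min} - \frac{x_{min}}{scale}
$$

$$
zp = \left\lfloor \min(Q_{max}, \max(Q_{min}, zp\_min)) + 0.5 \right\rfloor
$$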

where $Q_{max}$ and $Q_{min}$ are determined by quant_dtype; for example, if quant_dtype=INT8, then $Q_{max} = 127$ and $Q_{min} = -128$.
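
As a rough illustration of the scale and zero point computation, here is a plain Python sketch. The function name `scale_and_zero_point` and its argument names are hypothetical, the INT8 limits are passed in as defaults, and rounding is assumed to be floor of the value plus 0.5. It does not guard against a zero-width range.

```python
import math

def scale_and_zero_point(x_min, x_max, q_min=-128, q_max=127, symmetric=False):
    """Hypothetical helper mirroring the scale / zp formulas (INT8 defaults)."""
    if symmetric:
        # Symmetric mode spans twice the largest absolute bound.
        scale = 2.0 * max(x_max, abs(x_min)) / (q_max - q_min)
    else:
        scale = (x_max - x_min) / (q_max - q_min)
    # Real-valued zero point before clamping and rounding.
    zp_min = q_min - x_min / scale
    # Clamp into [q_min, q_max], then round half up (assumed floor(v + 0.5)).
    zp = math.floor(min(q_max, max(q_min, zp_min)) + 0.5)
    return scale, zp

scale, zp = scale_and_zero_point(-1.0, 3.0)
print(scale, zp)  # scale is 4/255 (about 0.0157), zp is -64
```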

The fake quant output is computed as:

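$$
u_{min} = (Q_{min} - zp) \cdot scale
$$

$$
u_{max} = (Q_{max} - zp) \cdot scale
$$

$$
u_X = \left\lfloor \frac{\min(u_{max}, \max(u_{min}, X)) - u_{min}}{scale} + 0.5 \right\rfloor
$$

$$
output = u_X \cdot scale + u_{min}
$$
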
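
The output computation can be sketched in NumPy as below. This is a hedged illustration of the formula only: the function name `fake_quant_output` is hypothetical, `scale` and `zp` are passed in directly, and the real operator may differ in floating-point details such as float32 rounding at ties.

```python
import numpy as np

def fake_quant_output(x, scale, zp, q_min=-128, q_max=127):
    """Hypothetical helper: clamp x to the representable range, snap to the grid."""
    u_min = (q_min - zp) * scale
    u_max = (q_max - zp) * scale
    # Clamp the input into [u_min, u_max].
    clipped = np.minimum(u_max, np.maximum(u_min, x))
    # Index of the nearest quantization level, counted from u_min.
    u_x = np.floor((clipped - u_min) / scale + 0.5)
    # Map the level index back to a real value.
    return u_x * scale + u_min

x = np.array([-100.0, -1.0, 0.34, 0.36, 100.0])
y = fake_quant_output(x, scale=0.1, zp=0)
# y is approximately [-12.8, -1.0, 0.3, 0.4, 12.7]
```

Values outside the range saturate at `u_min` or `u_max`, and values inside it move to the nearest multiple of `scale`, so the in-range error is at most half a step.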
Parameters
  • min_init (int, float) – The initialized min value. Default: -6.

  • max_init (int, float) – The initialized max value. Default: 6.

  • ema (bool) – Whether to use the exponential moving average (EMA) algorithm to update min and max. Default: False.

  • ema_decay (float) – Decay parameter of the exponential moving average algorithm. Default: 0.999.

  • per_channel (bool) – Quantization granularity: per channel if True, per layer if False. Default: False.

  • channel_axis (int) – The channel axis used for per-channel quantization. Default: 1.

  • num_channels (int) – Declares the channel size of min and max. Default: 1.

  • quant_dtype (QuantDtype) – The data type of quantization, supporting 4 and 8 bits. Default: QuantDtype.INT8.

  • symmetric (bool) – Whether the quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – Whether the quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay parameter, counted in global steps. Default: 0.
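
To illustrate the difference between per-layer and per-channel granularity, the following NumPy sketch computes the observed min and max both ways. It rests on an assumption not stated in this page: that per-channel statistics are reduced over every axis except `channel_axis`, giving one value per channel. The helper name `observe_min_max` is hypothetical.

```python
import numpy as np

def observe_min_max(x, per_channel=False, channel_axis=1):
    """Hypothetical helper: min/max over the whole tensor or per channel."""
    if per_channel:
        # Assumed behaviour: reduce over all axes except the channel axis.
        axes = tuple(a for a in range(x.ndim) if a != channel_axis)
        x_min, x_max = x.min(axis=axes), x.max(axis=axes)
    else:
        x_min, x_max = x.min(), x.max()
    # As in the non-EMA formula, the range always contains zero.
    return np.minimum(x_min, 0.0), np.maximum(x_max, 0.0)

# Shape (2, 2, 2) with channels on axis 1.
x = np.array([[[1.0, 2.0], [-4.0, 0.5]],
              [[0.5, 3.0], [-1.0, 8.0]]])
layer_min, layer_max = observe_min_max(x)                # -4.0 and 8.0
ch_min, ch_max = observe_min_max(x, per_channel=True)    # [0, -4] and [3, 8]
```

Under this assumption, num_channels would need to match the size of the input along channel_axis (2 in the sketch).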

Inputs:
  • input (Tensor) - The input of FakeQuantWithMinMaxObserver.

Outputs:

Tensor, with the same type and shape as the input.

Raises
  • TypeError – If min_init or max_init is neither int nor float.

  • TypeError – If quant_delay is not an int.

  • ValueError – If min_init is not less than max_init.

  • ValueError – If quant_delay is less than 0.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, nn
>>> fake_quant = nn.FakeQuantWithMinMaxObserver()
>>> input = Tensor(np.array([[1, 2, 1], [-2, 0, -1]]), mindspore.float32)
>>> output = fake_quant(input)
>>> print(output)
[[ 0.9882355  1.9764705  0.9882355]
 [-1.9764705  0.        -0.9882355]]