mindspore_gs.ptq.PTQConfig

class mindspore_gs.ptq.PTQConfig(mode = PTQMode.QUANTIZE, backend = BackendTarget.ASCEND, opname_blacklist = <class 'list'>, algo_args = <class 'dict'>, weight_quant_dtype = Int8, kvcache_quant_dtype = None, act_quant_dtype = None, precision_recovery = PrecisionRecovery.NONE, outliers_suppression = OutliersSuppressionType.NONE, weight_quant_granularity = QuantGranularity.PER_CHANNEL, kvcache_quant_granularity = QuantGranularity.PER_CHANNEL, act_quant_granularity = QuantGranularity.PER_TENSOR, group_size = 0)

Config for post-training quantization.

Parameters
  • mode (mindspore_gs.ptq.PTQMode) – Flag for PTQ mode: PTQMode.QUANTIZE for quantization mode, PTQMode.DEPLOY for deploy mode.

  • backend (mindspore_gs.common.BackendTarget) – Flag for backend target: NONE for no specific backend, ASCEND for the Ascend backend.

  • opname_blacklist (List[str]) – Blacklist of op names. Layers in the network whose names fuzzy-match an entry in this blacklist will not be quantized.

  • algo_args (Union[dict, dataclass]) – Used to configure hyperparameters of algorithms such as RTN, SmoothQuant, and OmniQuant.

  • act_quant_dtype (mindspore.dtype) – Used to configure the quantization type of activations. mindspore.dtype.int8 indicates that activations are quantized to 8 bits, and None indicates that they are not quantized.

  • weight_quant_dtype (mindspore.dtype) – Used to configure the quantization type of weights. mindspore.dtype.int8 indicates that weights are quantized to 8 bits, and None indicates that they are not quantized.

  • kvcache_quant_dtype (mindspore.dtype) – Used to configure the quantization type of the KVCache. mindspore.dtype.int8 indicates that the KVCache is quantized to 8 bits, and None indicates that it is not quantized.

  • outliers_suppression (mindspore_gs.ptq.OutliersSuppressionType) – Used to configure the outlier suppression method applied before quantization. OutliersSuppressionType.SMOOTH indicates using the smoothing method from SmoothQuant to suppress outliers (see the sketch after this parameter list), and OutliersSuppressionType.NONE, the default, indicates doing nothing about outliers.

  • precision_recovery (mindspore_gs.ptq.PrecisionRecovery) – Used to configure precision compensation of weights during quantization. PrecisionRecovery.GPTQ indicates using the GPTQ method to compensate precision, and PrecisionRecovery.NONE, the default, indicates doing nothing for precision recovery.

  • act_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of activation. Currently only QuantGranularity.PER_TENSOR and QuantGranularity.PER_TOKEN are supported.

  • kvcache_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of kvcache. Currently only QuantGranularity.PER_CHANNEL and QuantGranularity.PER_TOKEN are supported.

  • weight_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of weight. Currently only QuantGranularity.PER_CHANNEL and QuantGranularity.PER_GROUP are supported.

  • group_size (int, optional) – Group size for per-group quantization; 64 or 128 is recommended. Default value: 0.
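
For instance, a quantization-mode configuration that quantizes both weights and activations to 8 bits and suppresses activation outliers with SmoothQuant could be built as in the following sketch. It is illustrative only, assembled from the parameters described above rather than taken from the official examples.

>>> from mindspore import dtype as msdtype
>>> from mindspore_gs.ptq import PTQConfig, PTQMode, OutliersSuppressionType
>>> from mindspore_gs.common import BackendTarget
>>> a8w8_cfg = PTQConfig(mode=PTQMode.QUANTIZE, backend=BackendTarget.ASCEND,
...                      weight_quant_dtype=msdtype.int8, act_quant_dtype=msdtype.int8,
...                      outliers_suppression=OutliersSuppressionType.SMOOTH)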

Raises
  • ValueError – If mode is not PTQMode.QUANTIZE or PTQMode.DEPLOY.

  • ValueError – If backend is not BackendTarget.NONE or BackendTarget.ASCEND.

  • TypeError – If opname_blacklist is not a list of str.

  • ValueError – If weight_quant_dtype is not mindspore.dtype.int8 or None.

  • ValueError – If kvcache_quant_dtype is not mindspore.dtype.int8 or None.

  • ValueError – If act_quant_dtype is not mindspore.dtype.int8 or None.

  • TypeError – If outliers_suppression is not a OutliersSuppressionType.

  • TypeError – If precision_recovery is not a PrecisionRecovery.

  • ValueError – If act_quant_granularity is not QuantGranularity.PER_TENSOR or QuantGranularity.PER_TOKEN.

  • ValueError – If kvcache_quant_granularity is not QuantGranularity.PER_CHANNEL or QuantGranularity.PER_TOKEN.

  • ValueError – If act_quant_granularity is QuantGranularity.PER_TOKEN but weight_quant_dtype != mindspore.dtype.int8 or act_quant_dtype != mindspore.dtype.int8.

  • ValueError – If kvcache_quant_granularity is QuantGranularity.PER_TOKEN but kvcache_quant_dtype != mindspore.dtype.int8.

  • ValueError – If weight_quant_granularity is not QuantGranularity.PER_CHANNEL or QuantGranularity.PER_GROUP.

  • ValueError – If weight_quant_granularity is QuantGranularity.PER_GROUP but group_size is not in [64, 128].

  • ValueError – If weight_quant_granularity is not QuantGranularity.PER_GROUP but group_size != 0.

  • TypeError – If group_size is not an int.
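
The granularity-related constraints above can be satisfied, for example, by a per-group weight quantization configuration such as the following. This is an illustrative sketch, not taken from the official examples; note that group_size must be 64 or 128 when weight_quant_granularity is QuantGranularity.PER_GROUP.

>>> from mindspore import dtype as msdtype
>>> from mindspore_gs.ptq import PTQConfig, PTQMode, QuantGranularity
>>> from mindspore_gs.common import BackendTarget
>>> per_group_cfg = PTQConfig(mode=PTQMode.QUANTIZE, backend=BackendTarget.ASCEND,
...                           weight_quant_dtype=msdtype.int8,
...                           weight_quant_granularity=QuantGranularity.PER_GROUP,
...                           group_size=128)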

Examples

>>> from mindspore_gs.ptq import PTQConfig, PTQMode
>>> from mindspore_gs.common import BackendTarget
>>> PTQConfig(mode=PTQMode.DEPLOY, backend=BackendTarget.ASCEND, opname_blacklist=['layer0'])
PTQConfig(mode=<PTQMode.DEPLOY: 'deploy'>, backend=<BackendTarget.ASCEND: 'ascend'>, opname_blacklist=['layer0'], algo_args={})
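
Continuing the imports above, the following additional sketch (assembled from the documented parameters rather than the official example set) shows a KVCache int8 configuration with per-token granularity, which satisfies the constraint that per-token KVCache quantization requires kvcache_quant_dtype to be mindspore.dtype.int8.

>>> from mindspore import dtype as msdtype
>>> from mindspore_gs.ptq import QuantGranularity
>>> kvcache_cfg = PTQConfig(mode=PTQMode.QUANTIZE, backend=BackendTarget.ASCEND,
...                         kvcache_quant_dtype=msdtype.int8,
...                         kvcache_quant_granularity=QuantGranularity.PER_TOKEN)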