mindspore_gs.ptq.PTQConfig
- class mindspore_gs.ptq.PTQConfig(mode=PTQMode.QUANTIZE, backend=BackendTarget.ASCEND, opname_blacklist=<factory>, algo_args=<factory>, weight_quant_dtype=Int8, kvcache_quant_dtype=None, act_quant_dtype=None, precision_recovery=PrecisionRecovery.NONE, outliers_suppression=OutliersSuppressionType.NONE, weight_quant_granularity=QuantGranularity.PER_CHANNEL, kvcache_quant_granularity=QuantGranularity.PER_CHANNEL, act_quant_granularity=QuantGranularity.PER_TENSOR, group_size=0)[source]
Config for post-training quantization.
- Parameters
mode (mindspore_gs.ptq.PTQMode) – Flag for PTQ mode. QUANTIZE for quantization mode, DEPLOY for deploy mode.
backend (mindspore_gs.common.BackendTarget) – Flag for backend target. NONE for no specific backend, ASCEND for the Ascend backend.
opname_blacklist (List[str]) – Blacklist of operator names. Layers in the network whose names fuzzy-match an entry in this blacklist will not be quantized.
algo_args (Union[dict, dataclass]) – Used to configure hyperparameters of algorithms such as RTN, SmoothQuant, and OmniQuant.
act_quant_dtype (mindspore.dtype) – Used to configure the quantization type of activation. mindspore.dtype.int8 indicates that the activation is quantized by 8 bits, and None indicates that it is not quantized.
weight_quant_dtype (mindspore.dtype) – Used to configure the quantization type of weight. mindspore.dtype.int8 indicates that the weight is quantized by 8 bits, and None indicates that it is not quantized.
kvcache_quant_dtype (mindspore.dtype) – Used to configure the quantization type of kvcache. mindspore.dtype.int8 indicates that the kvcache is quantized by 8 bits, and None indicates that it is not quantized.
outliers_suppression (mindspore_gs.ptq.OutliersSuppressionType) – Used to configure the outlier suppression method applied before quantization. OutliersSuppressionType.SMOOTH indicates using the smooth method from SmoothQuant to suppress outliers, and OutliersSuppressionType.NONE, the default, indicates doing nothing about outliers.
precision_recovery (mindspore_gs.ptq.PrecisionRecovery) – Used to configure precision compensation of weights during quantization. PrecisionRecovery.GPTQ indicates using the GPTQ method to compensate precision, and PrecisionRecovery.NONE, the default, indicates doing nothing for precision recovery.
act_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of activation. Currently only QuantGranularity.PER_TENSOR and QuantGranularity.PER_TOKEN are supported.
kvcache_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of kvcache. Currently only QuantGranularity.PER_CHANNEL and QuantGranularity.PER_TOKEN are supported.
weight_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of weight. Currently only QuantGranularity.PER_CHANNEL and QuantGranularity.PER_GROUP are supported.
group_size (int, optional) – Group size of per-group quantization; 64 or 128 is suggested. Default value: 0. A configuration sketch combining several of these parameters is shown below.
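As a sketch of how these parameters combine, the following builds an 8-bit weight-and-activation configuration with SmoothQuant outlier suppression. The blacklisted layer name 'lm_head' is a placeholder assumption, not part of this API, and the combination is illustrative rather than a recommended recipe.
>>> import mindspore as ms
>>> from mindspore_gs.ptq import PTQConfig, PTQMode, OutliersSuppressionType
>>> from mindspore_gs.common import BackendTarget
>>> # 'lm_head' below is a placeholder layer name used only for illustration
>>> a8w8_cfg = PTQConfig(mode=PTQMode.QUANTIZE,
...                      backend=BackendTarget.ASCEND,
...                      opname_blacklist=['lm_head'],
...                      weight_quant_dtype=ms.dtype.int8,
...                      act_quant_dtype=ms.dtype.int8,
...                      outliers_suppression=OutliersSuppressionType.SMOOTH)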
- Raises
ValueError – If mode is not PTQMode.QUANTIZE or PTQMode.DEPLOY.
ValueError – If backend is not BackendTarget.NONE or BackendTarget.ASCEND.
TypeError – If opname_blacklist is not a list of str.
ValueError – If weight_quant_dtype is not mindspore.dtype.int8 or None.
ValueError – If kvcache_quant_dtype is not mindspore.dtype.int8 or None.
ValueError – If act_quant_dtype is not mindspore.dtype.int8 or None.
TypeError – If outliers_suppression is not a OutliersSuppressionType.
TypeError – If precision_recovery is not a PrecisionRecovery.
ValueError – If act_quant_granularity is not QuantGranularity.PER_TENSOR or QuantGranularity.PER_TOKEN.
ValueError – If kvcache_quant_granularity is not QuantGranularity.PER_CHANNEL or QuantGranularity.PER_TOKEN.
ValueError – If act_quant_granularity is QuantGranularity.PER_TOKEN but weight_quant_dtype != mindspore.dtype.int8 or act_quant_dtype != mindspore.dtype.int8.
ValueError – If kvcache_quant_granularity is QuantGranularity.PER_TOKEN but kvcache_quant_dtype != mindspore.dtype.int8.
ValueError – If weight_quant_granularity is not QuantGranularity.PER_CHANNEL or QuantGranularity.PER_GROUP.
ValueError – If weight_quant_granularity is QuantGranularity.PER_GROUP but group_size is not in [64, 128].
ValueError – If weight_quant_granularity is not QuantGranularity.PER_GROUP but group_size != 0.
TypeError – If group_size is not an int.
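For example, the group_size constraints above mean that per-group weight quantization must pair QuantGranularity.PER_GROUP with a group size of 64 or 128. A minimal sketch, assuming 8-bit weights:
>>> import mindspore as ms
>>> from mindspore_gs.ptq import PTQConfig, QuantGranularity
>>> # group_size must be 64 or 128 when weight_quant_granularity is PER_GROUP
>>> per_group_cfg = PTQConfig(weight_quant_dtype=ms.dtype.int8,
...                           weight_quant_granularity=QuantGranularity.PER_GROUP,
...                           group_size=128)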
Examples
>>> from mindspore_gs.ptq import PTQConfig, PTQMode
>>> from mindspore_gs.common import BackendTarget
>>> PTQConfig(mode=PTQMode.DEPLOY, backend=BackendTarget.ASCEND, opname_blacklist=['layer0'])
PTQConfig(mode=<PTQMode.DEPLOY: 'deploy'>, backend=<BackendTarget.ASCEND: 'ascend'>, opname_blacklist=['layer0'], algo_args={})
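A further hedged sketch, built only from the parameters documented above and not an officially recommended recipe, enables 8-bit per-token KV cache quantization in quantize mode:
>>> import mindspore as ms
>>> from mindspore_gs.ptq import PTQConfig, PTQMode, QuantGranularity
>>> from mindspore_gs.common import BackendTarget
>>> # per-token kvcache granularity requires kvcache_quant_dtype to be int8
>>> kv_cfg = PTQConfig(mode=PTQMode.QUANTIZE,
...                    backend=BackendTarget.ASCEND,
...                    kvcache_quant_dtype=ms.dtype.int8,
...                    kvcache_quant_granularity=QuantGranularity.PER_TOKEN)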