mindspore_gs.ptq.PTQConfig
- class mindspore_gs.ptq.PTQConfig(mode=PTQMode.QUANTIZE, backend=BackendTarget.ASCEND, opname_blacklist=<factory>, algo_args=<factory>, weight_quant_dtype=Int8, kvcache_quant_dtype=None, act_quant_dtype=None, precision_recovery=PrecisionRecovery.NONE, outliers_suppression=OutliersSuppressionType.NONE, weight_quant_granularity=QuantGranularity.PER_CHANNEL, kvcache_quant_granularity=QuantGranularity.PER_CHANNEL, act_quant_granularity=QuantGranularity.PER_TENSOR, group_size=0)[source]
Config for post-training quantization.
- Parameters
mode (mindspore_gs.ptq.PTQMode) – Flag for PTQ mode. QUANTIZE for quantization mode, DEPLOY for deploy mode.
backend (mindspore_gs.common.BackendTarget) – Flag for backend target. NONE for no specific backend, ASCEND for the Ascend backend.
opname_blacklist (List[str]) – Blacklist of operator names. Layers in the network whose names fuzzy-match an entry in this blacklist will not be quantized.
algo_args (Union[dict, dataclass]) – Used to configure hyperparameters of algorithms such as RTN, SmoothQuant, and OmniQuant.
act_quant_dtype (mindspore.dtype) – Used to configure the quantization type of activation. mindspore.dtype.int8 indicates that the activation is quantized by 8 bits, and None indicates that it is not quantized.
weight_quant_dtype (mindspore.dtype) – Used to configure the quantization type of weight. mindspore.dtype.int8 indicates that the weight is quantized by 8 bits, and None indicates that it is not quantized.
kvcache_quant_dtype (mindspore.dtype) – Used to configure the quantization type of kvcache. mindspore.dtype.int8 indicates that the kvcache is quantized by 8 bits, and None indicates that it is not quantized.
outliers_suppression (mindspore_gs.ptq.OutliersSuppressionType) – Used to configure the outlier suppression method applied before quantization. OutliersSuppressionType.SMOOTH indicates using the smooth method from SmoothQuant to suppress outliers, and OutliersSuppressionType.NONE, the default, indicates doing nothing about outliers.
precision_recovery (mindspore_gs.ptq.PrecisionRecovery) – Used to configure precision compensation of weights during quantization. PrecisionRecovery.GPTQ indicates using the GPTQ method to compensate precision, and PrecisionRecovery.NONE, the default, indicates doing nothing for precision recovery.
act_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of activation. Currently only QuantGranularity.PER_TENSOR and QuantGranularity.PER_TOKEN are supported.
kvcache_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of kvcache. Currently only QuantGranularity.PER_CHANNEL and QuantGranularity.PER_TOKEN are supported.
weight_quant_granularity (mindspore_gs.ptq.QuantGranularity) – Used to configure the quantization granularity of weight. Currently only QuantGranularity.PER_CHANNEL and QuantGranularity.PER_GROUP are supported.
group_size (int, optional) – Group size of per-group quantization; 64 or 128 is suggested. Default value: 0. A configuration sketch combining several of these parameters is shown below.
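As a sketch of how these parameters combine, the following builds an 8-bit weight-and-activation configuration with SmoothQuant outlier suppression. The blacklisted layer name 'lm_head' is a placeholder assumption, not part of this API, and the combination is illustrative rather than a recommended recipe.
>>> import mindspore as ms
>>> from mindspore_gs.ptq import PTQConfig, PTQMode, OutliersSuppressionType
>>> from mindspore_gs.common import BackendTarget
>>> # 'lm_head' below is a placeholder layer name used only for illustration
>>> a8w8_cfg = PTQConfig(mode=PTQMode.QUANTIZE,
...                      backend=BackendTarget.ASCEND,
...                      opname_blacklist=['lm_head'],
...                      weight_quant_dtype=ms.dtype.int8,
...                      act_quant_dtype=ms.dtype.int8,
...                      outliers_suppression=OutliersSuppressionType.SMOOTH)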
- Raises
ValueError – If mode is not PTQMode.QUANTIZE or PTQMode.DEPLOY.
ValueError – If backend is not BackendTarget.NONE or BackendTarget.ASCEND.
TypeError – If opname_blacklist is not a list of str.
ValueError – If weight_quant_dtype is not mindspore.dtype.int8 or None.
ValueError – If kvcache_quant_dtype is not mindspore.dtype.int8 or None.
ValueError – If act_quant_dtype is not mindspore.dtype.int8 or None.
TypeError – If outliers_suppression is not a OutliersSuppressionType.
TypeError – If precision_recovery is not a PrecisionRecovery.
ValueError – If act_quant_granularity is not QuantGranularity.PER_TENSOR or QuantGranularity.PER_TOKEN.
ValueError – If kvcache_quant_granularity is not QuantGranularity.PER_CHANNEL or QuantGranularity.PER_TOKEN.
ValueError – If act_quant_granularity is QuantGranularity.PER_TOKEN but weight_quant_dtype != mindspore.dtype.int8 or act_quant_dtype != mindspore.dtype.int8.
ValueError – If kvcache_quant_granularity is QuantGranularity.PER_TOKEN but kvcache_quant_dtype != mindspore.dtype.int8.
ValueError – If weight_quant_granularity is not QuantGranularity.PER_CHANNEL or QuantGranularity.PER_GROUP.
ValueError – If weight_quant_granularity is QuantGranularity.PER_GROUP but group_size is not in [64, 128].
ValueError – If weight_quant_granularity is not QuantGranularity.PER_GROUP but group_size != 0.
TypeError – If group_size is not an int.
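For example, the group_size constraints above mean that per-group weight quantization must pair QuantGranularity.PER_GROUP with a group size of 64 or 128. A minimal sketch, assuming 8-bit weights:
>>> import mindspore as ms
>>> from mindspore_gs.ptq import PTQConfig, QuantGranularity
>>> # group_size must be 64 or 128 when weight_quant_granularity is PER_GROUP
>>> per_group_cfg = PTQConfig(weight_quant_dtype=ms.dtype.int8,
...                           weight_quant_granularity=QuantGranularity.PER_GROUP,
...                           group_size=128)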
Examples
>>> from mindspore_gs.ptq import PTQConfig, PTQMode
>>> from mindspore_gs.common import BackendTarget
>>> PTQConfig(mode=PTQMode.DEPLOY, backend=BackendTarget.ASCEND, opname_blacklist=['layer0'])
PTQConfig(mode=<PTQMode.DEPLOY: 'deploy'>, backend=<BackendTarget.ASCEND: 'ascend'>, opname_blacklist=['layer0'], algo_args={})
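A further hedged sketch, built only from the parameters documented above and not an officially recommended recipe, enables 8-bit per-token KV cache quantization in quantize mode:
>>> import mindspore as ms
>>> from mindspore_gs.ptq import PTQConfig, PTQMode, QuantGranularity
>>> from mindspore_gs.common import BackendTarget
>>> # per-token kvcache granularity requires kvcache_quant_dtype to be int8
>>> kv_cfg = PTQConfig(mode=PTQMode.QUANTIZE,
...                    backend=BackendTarget.ASCEND,
...                    kvcache_quant_dtype=ms.dtype.int8,
...                    kvcache_quant_granularity=QuantGranularity.PER_TOKEN)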