mindspore_gs.ptq.PTQ
- class mindspore_gs.ptq.PTQ(config: Union[dict, PTQConfig] = None)
Implementation of the PTQ algorithm, which supports combined quantization of activations, weights, and the KV cache.
- Parameters
config (mindspore_gs.ptq.PTQConfig, optional) – Config for PTQ. Default is None.
- Raises
TypeError – If config is not None and its type is not PTQConfig.
ValueError – If not PYNATIVE mode when mode in config is PTQMode.QUANTIZE.
ValueError – If act_quant_dtype is int8 and weight_quant_dtype is None.
Examples
>>> import mindspore_gs
>>> from mindspore import dtype as msdtype
>>> from mindspore_gs.ptq import PTQ
>>> from mindspore_gs.ptq import PTQConfig, PTQMode, OutliersSuppressionType
>>> from mindspore_gs.ptq.network_helpers.mf_net_helpers import MFLlama2Helper
>>> from mindformers.tools.register.config import MindFormerConfig
>>> from mindformers import LlamaForCausalLM, LlamaConfig
>>> from mindspore_gs.common.gs_enum import BackendTarget
>>> mf_yaml_config_file = "/path/to/mf_yaml_config_file"
>>> mfconfig = MindFormerConfig(mf_yaml_config_file)
>>> helper = MFLlama2Helper(mfconfig)
>>> backend = BackendTarget.ASCEND
>>> ptq_config = PTQConfig(mode=PTQMode.QUANTIZE, backend=backend, opname_blacklist=["w2", "lm_head"],
...                        weight_quant_dtype=msdtype.int8, act_quant_dtype=msdtype.int8,
...                        outliers_suppression=OutliersSuppressionType.SMOOTH)
>>> ptq = PTQ(ptq_config)
>>> network = LlamaForCausalLM(LlamaConfig(**mfconfig.model.model_config))
>>> fake_quant_net = ptq.apply(network, helper)
>>> quant_net = ptq.convert(fake_quant_net)
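The two ValueError conditions listed under Raises for the constructor can be read as simple precondition checks. The sketch below is only an illustration of those documented rules, not the library's implementation: `Mode`, `check_ptq_config`, the `pynative` flag, and the string dtype names are hypothetical stand-ins introduced for this example.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Hypothetical stand-in for PTQMode."""
    QUANTIZE = "quantize"
    DEPLOY = "deploy"


def check_ptq_config(mode: Mode, pynative: bool,
                     weight_quant_dtype: Optional[str],
                     act_quant_dtype: Optional[str]) -> None:
    """Mirror the two documented ValueError conditions (illustrative only)."""
    # Documented rule: QUANTIZE mode requires PYNATIVE execution mode.
    if mode is Mode.QUANTIZE and not pynative:
        raise ValueError("PTQMode.QUANTIZE requires PYNATIVE mode.")
    # Documented rule: int8 activation quantization needs a weight dtype too.
    if act_quant_dtype == "int8" and weight_quant_dtype is None:
        raise ValueError("act_quant_dtype=int8 requires weight_quant_dtype to be set.")


# A weight-and-activation int8 setup in PYNATIVE mode passes both checks.
check_ptq_config(Mode.QUANTIZE, True, "int8", "int8")
```

Under these assumptions, activation-only int8 quantization (no weight dtype) is rejected, while weight-only quantization is accepted.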
- apply(network: Cell, network_helper: NetworkHelper = None, datasets=None, **kwargs)
Define how to add fake quantizers to the network.
- Parameters
network (Cell) – Network to be fake quantized.
network_helper (NetworkHelper) – Utilities for decoupling the algorithm from the network framework.
datasets (Dataset) – Datasets used for calibration.
- Returns
Fake-quantized network.
- Raises
RuntimeError – If PTQ is not properly initialized.
TypeError – If input network is not a Cell.
ValueError – If input network_helper is None when mode is PTQMode.DEPLOY.
ValueError – If input datasets is None.
- convert(net_opt: Cell, ckpt_path='')
Define how to convert a compressed network to a standard network before exporting.
- Parameters
net_opt (Cell) – Network to be converted, which has been transformed by PTQ.apply.
ckpt_path (str) – Path to the checkpoint file for net_opt. Default is "", which means no checkpoint file is loaded into net_opt.
- Returns
An instance of Cell representing the quantized network.
- Raises
TypeError – If net_opt is not a Cell.
TypeError – If ckpt_path is not a string.
ValueError – If ckpt_path is not empty and is invalid.
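The argument checks documented for convert can be sketched as follows. This is a hedged illustration rather than the library's code: `Cell` here is a hypothetical stand-in for `mindspore.nn.Cell`, `check_convert_args` is an invented helper, and treating "invalid" as "does not point to an existing file" is an assumption, since the documentation does not define it.

```python
import os


class Cell:
    """Hypothetical stand-in for mindspore.nn.Cell."""


def check_convert_args(net_opt, ckpt_path=""):
    """Mirror the documented TypeError/ValueError conditions (illustrative only)."""
    if not isinstance(net_opt, Cell):
        raise TypeError("net_opt must be a Cell.")
    if not isinstance(ckpt_path, str):
        raise TypeError("ckpt_path must be a string.")
    # Assumption: an "invalid" path is one that is not an existing file.
    # An empty string is allowed and means no checkpoint is loaded.
    if ckpt_path and not os.path.isfile(ckpt_path):
        raise ValueError("ckpt_path is not empty and does not point to a file.")


# The default empty ckpt_path passes: no checkpoint is loaded.
check_convert_args(Cell())
```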