mindspore_gs.ptq.PTQ

class mindspore_gs.ptq.PTQ(config: Union[dict, PTQConfig] = None)

Implementation of the PTQ (post-training quantization) algorithm, which supports combined quantization of activations, weights, and the KV cache.

Parameters

config (mindspore_gs.ptq.PTQConfig, optional) – Configuration for PTQ. Default: None.

Raises
  • TypeError – If config is not None and is not a PTQConfig.

  • ValueError – If mode in config is PTQMode.QUANTIZE and the execution mode is not PYNATIVE.

  • ValueError – If act_quant_dtype is int8 and weight_quant_dtype is None.
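
The three error conditions above can be read as a small set of checks. The sketch below restates them in plain Python so each one can be triggered in isolation. It is not mindspore_gs code: `SketchConfig`, `check_config`, `raised`, and the string values standing in for `PTQMode`, the dtypes, and the execution mode are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for PTQConfig; only the fields the checks need."""
    mode: str = "quantize"                      # stands in for PTQMode.QUANTIZE / DEPLOY
    weight_quant_dtype: Optional[str] = None    # stands in for msdtype.int8 or None
    act_quant_dtype: Optional[str] = None


def check_config(config, execution_mode="pynative"):
    """Restate the documented error conditions (illustrative only)."""
    if config is None:
        return  # None is allowed by the documented signature
    if not isinstance(config, SketchConfig):
        raise TypeError("config must be a SketchConfig or None")
    if config.mode == "quantize" and execution_mode != "pynative":
        raise ValueError("quantize mode requires PYNATIVE execution")
    if config.act_quant_dtype == "int8" and config.weight_quant_dtype is None:
        raise ValueError("int8 activations require a weight_quant_dtype")


def raised(config, **kwargs):
    """Return the name of the exception check_config raises, or None."""
    try:
        check_config(config, **kwargs)
    except (TypeError, ValueError) as err:
        return type(err).__name__
    return None


no_config = raised(None)
wrong_type = raised("not a config")
wrong_mode = raised(SketchConfig(), execution_mode="graph")
act_only = raised(SketchConfig(act_quant_dtype="int8"))
valid = raised(SketchConfig(weight_quant_dtype="int8", act_quant_dtype="int8"))
```

Here `wrong_type` holds the name of the first documented error and `wrong_mode` and `act_only` hold the second and third, while `no_config` and `valid` pass every check.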

Examples

>>> import mindspore_gs
>>> from mindspore import dtype as msdtype
>>> from mindspore_gs.ptq import PTQ
>>> from mindspore_gs.ptq import PTQConfig, PTQMode, OutliersSuppressionType
>>> from mindspore_gs.ptq.network_helpers.mf_net_helpers import MFLlama2Helper
>>> from mindformers.tools.register.config import MindFormerConfig
>>> from mindformers import LlamaForCausalLM, LlamaConfig
>>> from mindspore_gs.common.gs_enum import BackendTarget
>>> mf_yaml_config_file = "/path/to/mf_yaml_config_file"
>>> mfconfig = MindFormerConfig(mf_yaml_config_file)
>>> helper = MFLlama2Helper(mfconfig)
>>> backend = BackendTarget.ASCEND
>>> ptq_config = PTQConfig(mode=PTQMode.QUANTIZE, backend=backend, opname_blacklist=["w2", "lm_head"],
...                        weight_quant_dtype=msdtype.int8, act_quant_dtype=msdtype.int8,
...                        outliers_suppression=OutliersSuppressionType.SMOOTH)
>>> ptq = PTQ(ptq_config)
>>> network = LlamaForCausalLM(LlamaConfig(**mfconfig.model.model_config))
>>> fake_quant_net = ptq.apply(network, helper)
>>> quant_net = ptq.convert(fake_quant_net)

apply(network: Cell, network_helper: NetworkHelper = None, datasets=None, **kwargs)

Define how to add fake quantizers to the network.

Parameters
  • network (Cell) – Network to be fake quantized.

  • network_helper (NetworkHelper) – Utilities for decoupling the algorithm from the network framework.

  • datasets (Dataset) – Datasets used for calibration.

Returns

Fake-quantized network.

Raises
  • RuntimeError – If PTQ is not properly initialized.

  • TypeError – If input network is not a Cell.

  • ValueError – If input network_helper is None when mode is PTQMode.DEPLOY.

  • ValueError – If input datasets is None.
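
To make the term "fake quantizer" concrete, here is a minimal pure-Python sketch of the general idea: values are rounded onto an integer grid and immediately mapped back to floats, so the network still computes in floating point but sees quantization error. This assumes symmetric per-tensor int8 quantization with a max-abs scale. The function name `fake_quantize` and the sample weights are hypothetical, and the sketch says nothing about how mindspore_gs implements the step internally.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to symmetric signed integer levels, then dequantize to floats."""
    qmax = 2 ** (num_bits - 1) - 1                  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # float step between levels
    out = []
    for v in values:
        q = round(v / scale)                        # nearest integer level
        q = max(-qmax, min(qmax, q))                # clamp into the signed range
        out.append(q * scale)                       # back to float, reduced precision
    return out, scale


weights = [0.02, -0.5, 0.31, 1.27]                  # made-up sample values
fq_weights, scale = fake_quantize(weights)
```

Each entry of `fq_weights` differs from the original by at most half a quantization step (`scale / 2`), which is the error a calibration pass tries to keep small.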

convert(net_opt: Cell, ckpt_path='')

Define how to convert a compressed network to a standard network before exporting.

Parameters
  • net_opt (Cell) – Network to be converted, as produced by PTQ.apply.

  • ckpt_path (str) – Path to a checkpoint file for net_opt. Default: "", which means no checkpoint file is loaded into net_opt.

Returns

An instance of Cell representing the quantized network.
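
In many quantization toolchains, a conversion step of this kind replaces simulated quantization with real low-precision storage: integer weights plus a floating-point scale. The sketch below illustrates that general idea in pure Python under the assumption of symmetric per-tensor int8 quantization. `quantize_weights` and `dequantize` are hypothetical helpers, not mindspore_gs functions, and the actual behavior of `convert` may differ.

```python
def quantize_weights(values, num_bits=8):
    """Return integer levels plus the scale needed to recover approximate floats."""
    qmax = 2 ** (num_bits - 1) - 1                  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clamp into the signed integer range.
    q_values = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q_values, scale


def dequantize(q_values, scale):
    """Map stored integer levels back to approximate float values."""
    return [q * scale for q in q_values]


weights = [0.02, -0.5, 0.31, 1.27]                  # made-up sample values
q_weights, scale = quantize_weights(weights)
restored = dequantize(q_weights, scale)
```

Only `q_weights` and `scale` would need to be stored in such a scheme; `restored` shows what a kernel would effectively compute with.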

Raises