MindSpore Golden Stick 0.6.0 Release Notes
Major Features and Improvements
- The RoundToNearest algorithm now supports MindFormers' KVCache int8 quantization, i.e. the PagedAttentionMgr class, mainly for Llama2 networks.
- Added a post-training quantization algorithm named PTQ, which supports SmoothQuant, A16W8, KVCacheInt8, and their combinations, such as A16W8 combined with KVCacheInt8 or SmoothQuant combined with KVCacheInt8. The corresponding algorithm capabilities can be enabled by configuring PTQConfig. The algorithm mainly supports the ParallelLlama2 network from the MindFormers community. A usage sketch follows this list.
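Below is a minimal sketch of applying the new PTQ algorithm. The import paths, the apply/convert workflow, and the network-loading helper are assumptions for illustration only; the PTQConfig fields follow the descriptions in the API Change section below.

```python
import mindspore as ms
# Assumed import path for the 0.6.0 PTQ algorithm and its configuration.
from mindspore_gs.ptq import PTQ, PTQConfig, OutliersSuppressionType

# Hypothetical helper: build or load a MindFormers ParallelLlama2 network.
network = create_parallel_llama2_network()  # placeholder, not a real API

# SmoothQuant combined with KVCacheInt8: int8 activations, weights, and KV cache,
# with smooth-based outlier suppression (see the API Change section for details).
cfg = PTQConfig(
    act_quant_dtype=ms.dtype.int8,
    weight_quant_dtype=ms.dtype.int8,
    kvcache_quant_dtype=ms.dtype.int8,
    outliers_suppression=OutliersSuppressionType.SMOOTH,
)

ptq = PTQ(cfg)
network = ptq.apply(network)    # insert quantization/statistics cells into the network
network = ptq.convert(network)  # convert to the deployable quantized network
```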
API Change
PTQConfig adds the following four parameters (a configuration sketch follows this list):

- act_quant_dtype: The data type is mindspore.dtype. The default value is None. The options and meanings are as follows:
  - mindspore.dtype.int8: quantize input to int8
  - None (default): does not quantize input
- weight_quant_dtype: The data type is mindspore.dtype. The default value is mindspore.dtype.int8. The options and meanings are as follows:
  - mindspore.dtype.int8 (default): quantize weights to int8
  - None: does not quantize weights
- kvcache_quant_dtype: The data type is mindspore.dtype. The default value is None. The options and meanings are as follows:
  - mindspore.dtype.int8: quantize kvcache to int8
  - None (default): does not quantize kvcache
- outliers_suppression: The data type is OutliersSuppressionType. The default value is OutliersSuppressionType.NONE. The options and meanings are as follows:
  - OutliersSuppressionType.SMOOTH: employ the smooth approach to suppress outliers in activations and weights
  - OutliersSuppressionType.NONE (default): does not suppress outliers
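As a minimal sketch (assuming the parameters are passed directly to the PTQConfig constructor and all other constructor arguments keep their defaults; the import path is also an assumption), the following shows how the new fields map to the quantization modes listed in the features above:

```python
import mindspore as ms
from mindspore_gs.ptq import PTQConfig, OutliersSuppressionType  # assumed import path

# A16W8: weight-only int8 quantization; activations and KV cache stay in float.
a16w8 = PTQConfig(
    act_quant_dtype=None,                                 # default: does not quantize input
    weight_quant_dtype=ms.dtype.int8,                     # default: quantize weights to int8
    kvcache_quant_dtype=None,                             # default: does not quantize kvcache
    outliers_suppression=OutliersSuppressionType.NONE,    # default: does not suppress outliers
)

# A16W8 combined with KVCacheInt8: additionally quantize the KV cache to int8.
a16w8_kvint8 = PTQConfig(
    weight_quant_dtype=ms.dtype.int8,
    kvcache_quant_dtype=ms.dtype.int8,
)

# SmoothQuant combined with KVCacheInt8: int8 activations, weights, and KV cache,
# with the smooth approach suppressing activation/weight outliers.
smooth_kvint8 = PTQConfig(
    act_quant_dtype=ms.dtype.int8,
    weight_quant_dtype=ms.dtype.int8,
    kvcache_quant_dtype=ms.dtype.int8,
    outliers_suppression=OutliersSuppressionType.SMOOTH,
)
```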
Contributors
Thanks goes to these wonderful people:
ccsszz, yyyyrf, hangangqiang
Contributions of any kind are welcome!