mindformers.AutoTokenizer
- class mindformers.AutoTokenizer[source]
This is a generic tokenizer class that will be instantiated as one of the tokenizer classes of the library when created with the from_pretrained class method. This class cannot be instantiated directly using __init__() (throws an error).
Examples
>>> from mindformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
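The restriction on direct instantiation follows a common pattern: `__init__()` raises, and the `from_pretrained` class method constructs the object another way. A minimal sketch of that pattern (illustrative only, not the mindformers implementation; the class `AutoLike` is hypothetical):

```python
class AutoLike:
    """Sketch of a class that forbids direct construction, as AutoTokenizer does."""

    def __init__(self):
        # Direct construction is disallowed, mirroring AutoTokenizer's behavior.
        raise EnvironmentError(
            "AutoLike is designed to be instantiated using the "
            "AutoLike.from_pretrained(name) method."
        )

    @classmethod
    def from_pretrained(cls, name):
        # Bypass __init__ and configure the instance directly.
        obj = object.__new__(cls)
        obj.name = name
        return obj
```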
- classmethod from_pretrained(yaml_name_or_path, *args, **kwargs)[source]
Class method that instantiates a tokenizer from a local directory or from a model_id on modelers.cn.
Warning
This API is experimental and may change incompatibly in future releases.
- Parameters
yaml_name_or_path (str) – a directory containing a YAML file, a directory containing a JSON file, or a model_id from modelers.cn. The latter two are experimental features.
args (Any, optional) – Positional arguments passed along to the underlying tokenizer's __init__() method. Only effective in experimental mode.
kwargs (Dict[str, Any], optional) – Values in kwargs whose keys are configuration attributes override the loaded values.
- Returns
A tokenizer.
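The kwargs-override rule can be illustrated with a small sketch (the helper `apply_overrides` is hypothetical, not part of the mindformers API): any kwarg whose key matches a loaded configuration attribute replaces the loaded value.

```python
def apply_overrides(loaded_config, **kwargs):
    """Hypothetical helper illustrating the from_pretrained kwargs rule.

    Any kwarg whose key is a configuration attribute overrides the
    loaded value; other keys are assumed to be handled elsewhere
    (e.g. passed to the tokenizer's __init__).
    """
    merged = dict(loaded_config)
    for key, value in kwargs.items():
        if key in merged:  # only keys that are configuration attributes
            merged[key] = value
    return merged
```

For example, `apply_overrides({"pad_token": None, "vocab_size": 50257}, pad_token="<pad>")` keeps `vocab_size` and replaces only `pad_token`.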
- static register(config_class, slow_tokenizer_class=None, fast_tokenizer_class=None, exist_ok=False)[source]
Register new tokenizers for this class.
Warning
This API is experimental and may change incompatibly in future releases.
- Parameters
config_class (PretrainedConfig) – The model config class.
slow_tokenizer_class (PreTrainedTokenizer, optional) – The slow tokenizer class. Default: None.
fast_tokenizer_class (PreTrainedTokenizerFast, optional) – The fast tokenizer class. Default: None.
exist_ok (bool, optional) – If set to True, no error is raised even if config_class is already registered. Default: False.
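The register semantics, including the exist_ok flag, can be sketched with a minimal registry (illustrative only; the class `TokenizerRegistry` is hypothetical and mindformers' internal mapping may differ):

```python
class TokenizerRegistry:
    """Minimal sketch of the AutoTokenizer.register() semantics."""

    def __init__(self):
        self._mapping = {}

    def register(self, config_class, tokenizer_class, exist_ok=False):
        # Re-registering an existing config class raises unless exist_ok=True.
        if config_class in self._mapping and not exist_ok:
            raise ValueError(
                f"'{config_class}' is already registered; "
                "pass exist_ok=True to replace it."
            )
        self._mapping[config_class] = tokenizer_class
```

With exist_ok=False (the default), a second registration for the same config class fails; with exist_ok=True, it silently replaces the earlier entry.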