mindspore.dataset.text.transforms.Lookup
- class mindspore.dataset.text.transforms.Lookup(vocab, unknown_token=None, data_type=mstype.int32)
Look up each word in the input vocabulary table and map it to its id.
- Parameters
vocab (Vocab) – A vocabulary object.
unknown_token (str, optional) – Word to fall back on during lookup. If the input word is out of vocabulary (OOV), the lookup result is replaced with the id of unknown_token. If unknown_token is not specified, or is itself OOV, a runtime error is thrown (default=None, meaning no unknown_token is specified).
data_type (mindspore.dtype, optional) – The data type that the lookup operation maps strings to (default=mindspore.int32).
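The fallback rule described for unknown_token can be sketched in plain Python. This is only an illustrative model of the documented behavior, not MindSpore's implementation: `build_vocab` and `lookup_ids` are hypothetical helpers, the vocabulary is modeled as a dict, ids are assumed to follow list order starting at 0, and `'<unk>'` is an arbitrary choice of unknown token.

```python
def build_vocab(words):
    """Hypothetical stand-in for a vocabulary: word -> id in list order (assumed)."""
    return {word: idx for idx, word in enumerate(words)}


def lookup_ids(tokens, vocab, unknown_token=None):
    """Map each token to its id, falling back to unknown_token's id for OOV words.

    Raises RuntimeError when a token is OOV and no usable fallback exists,
    i.e. unknown_token is not given or is itself missing from the vocabulary.
    """
    # Id of the fallback word, or None if there is no usable fallback.
    fallback_id = vocab.get(unknown_token) if unknown_token is not None else None

    ids = []
    for token in tokens:
        token_id = vocab.get(token, fallback_id)
        if token_id is None:
            raise RuntimeError(f"token {token!r} is out of vocabulary "
                               "and no valid unknown_token is available")
        ids.append(token_id)
    return ids


vocab = build_vocab(['深', '圳', '欢', '迎', '您', '<unk>'])

# In-vocabulary words map directly to their ids.
print(lookup_ids(['欢', '迎'], vocab))                         # [2, 3]

# '你' is OOV, so it is replaced by the id of the unknown token.
print(lookup_ids(['欢', '你'], vocab, unknown_token='<unk>'))  # [2, 5]
```

Calling `lookup_ids(['你'], vocab)` without an unknown_token raises `RuntimeError` in this sketch, which mirrors the error case described above.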
Examples
>>> import mindspore.dataset.text as text
>>> # Load vocabulary from list
>>> vocab = text.Vocab.from_list(['深', '圳', '欢', '迎', '您'])
>>> # Use Lookup operator to map tokens to ids
>>> lookup = text.Lookup(vocab)
>>> text_file_dataset = text_file_dataset.map(operations=[lookup])