mindspore.dataset.text.Lookup

class mindspore.dataset.text.Lookup(vocab, unknown_token=None, data_type=mstype.int32)

Look up the id of a word in the input vocabulary table.

Parameters
  • vocab (Vocab) – A vocabulary object.

  • unknown_token (str, optional) – Word used as the fallback for out-of-vocabulary (OOV) words. If a word is OOV, the lookup returns the id of unknown_token instead. If unknown_token is not specified, or is itself OOV, a runtime error is thrown. Default: None, meaning no unknown_token is specified.

  • data_type (mindspore.dtype, optional) – The data type that the lookup operation maps strings to. Default: mstype.int32.

Raises
  • TypeError – If vocab is not of type text.Vocab.

  • TypeError – If unknown_token is not of type string.

  • TypeError – If data_type is not of type mindspore.dtype.

Supported Platforms:

CPU

Examples

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>> # Load vocabulary from list
>>> vocab = text.Vocab.from_list(['深', '圳', '欢', '迎', '您'])
>>> # Use Lookup operation to map tokens to ids
>>> lookup = text.Lookup(vocab)
>>>
>>> text_file_list = ["/path/to/text_file_dataset_file"]
>>> text_file_dataset = ds.TextFileDataset(dataset_files=text_file_list)
>>> text_file_dataset = text_file_dataset.map(operations=[lookup])
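
The out-of-vocabulary behaviour described for unknown_token can be sketched in plain Python. This is only an illustration of the documented semantics, not MindSpore's implementation: lookup_ids is a hypothetical helper, the vocabulary is modelled as an ordinary dict, and assigning ids by list position is an assumption of the sketch.

```python
# Illustrative sketch only: NOT MindSpore's implementation.
# `lookup_ids` is a hypothetical helper mirroring the documented behaviour.

def lookup_ids(tokens, vocab, unknown_token=None):
    """Map each token to its id, falling back to unknown_token for OOV words."""
    ids = []
    for token in tokens:
        if token in vocab:
            # In-vocabulary word: return its id directly.
            ids.append(vocab[token])
        elif unknown_token is not None and unknown_token in vocab:
            # OOV word: substitute the id of unknown_token.
            ids.append(vocab[unknown_token])
        else:
            # No usable fallback: unknown_token is missing or itself OOV.
            raise RuntimeError(f"token {token!r} is out of vocabulary")
    return ids


# Assumption of this sketch: ids follow list position.
words = ["<unk>", "深", "圳", "欢", "迎", "您"]
vocab = {word: idx for idx, word in enumerate(words)}

known = lookup_ids(["深", "圳"], vocab, unknown_token="<unk>")
fallback = lookup_ids(["hello"], vocab, unknown_token="<unk>")
```

With the real operation, the fallback word would likewise need to be present in the Vocab object passed to text.Lookup, since an unknown_token that is itself OOV leads to a runtime error.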