mindspore.dataset.text.Lookup
- class mindspore.dataset.text.Lookup(vocab, unknown_token=None, data_type=mstype.int32)
Map a word to its id according to the input vocabulary table.
- Parameters
vocab (Vocab) – A vocabulary object.
unknown_token (str, optional) – Fallback word used for lookup. If a word is out of vocabulary (OOV), the lookup result is replaced with the id of unknown_token. If unknown_token is not specified, or is itself OOV, a runtime error is raised. Default: None, meaning no unknown_token is specified.
data_type (mindspore.dtype, optional) – The data type that the lookup operation maps strings to. Default: mstype.int32.
- Raises
- Supported Platforms:
CPU
Examples
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>>
>>> # Use the transform in dataset pipeline mode
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data=["with"], column_names=["text"])
>>> # Load vocabulary from list
>>> vocab = text.Vocab.from_list(["?", "##", "with", "the", "test", "符号"])
>>> # Use Lookup operation to map tokens to ids
>>> lookup = text.Lookup(vocab)
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=[lookup])
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["text"])
2
>>>
>>> # Use the transform in eager mode
>>> vocab = text.Vocab.from_list(["?", "##", "with", "the", "test", "符号"])
>>> data = "with"
>>> output = text.Lookup(vocab=vocab, unknown_token="test")(data)
>>> print(output)
2
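The out-of-vocabulary behavior described under Parameters can be illustrated with a small pure-Python sketch. This is not MindSpore's implementation: `build_vocab` and `lookup_id` are hypothetical helpers, ids are assumed to follow list position (consistent with "with" mapping to 2 in the examples above), and the sketch only checks unknown_token when an OOV word is actually seen, since the text does not state when the real operation reports that error.

```python
from typing import Dict, List, Optional


def build_vocab(words: List[str]) -> Dict[str, int]:
    """Hypothetical stand-in for a vocabulary: ids follow list position (assumption)."""
    return {word: index for index, word in enumerate(words)}


def lookup_id(vocab: Dict[str, int], word: str,
              unknown_token: Optional[str] = None) -> int:
    """Return the id of `word`, falling back to the id of `unknown_token` for OOV words."""
    if word in vocab:
        return vocab[word]
    # The word is OOV: a usable fallback must be specified and must be in the vocabulary.
    if unknown_token is None or unknown_token not in vocab:
        raise RuntimeError(
            f"'{word}' is out of vocabulary and no valid unknown_token is available"
        )
    return vocab[unknown_token]


vocab = build_vocab(["?", "##", "with", "the", "test", "符号"])

in_vocab_id = lookup_id(vocab, "with")                       # word is in the vocabulary
oov_id = lookup_id(vocab, "missing", unknown_token="test")   # falls back to id of "test"

try:
    lookup_id(vocab, "missing")  # OOV word and no unknown_token specified
    oov_error_raised = False
except RuntimeError:
    oov_error_raised = True
```

In the sketch, an in-vocabulary word never consults unknown_token, which matches the eager-mode example above, where "with" still maps to 2 even though unknown_token="test" is set.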