Class WordpieceTokenizer
Defined in File text.h
Inheritance Relationships
Base Type
public mindspore::dataset::TensorTransform
(Class TensorTransform)
Class Documentation
class WordpieceTokenizer : public mindspore::dataset::TensorTransform
Tokenizes a scalar token or a 1-D tensor of tokens into 1-D sub-word tokens.
Public Functions
Constructor.
- Parameters
vocab – [in] A Vocab object.
suffix_indicator – [in] Prefix that marks a sub-word as a continuation of a word rather than its first piece (default='##').
max_bytes_per_token – [in] Maximum token length in bytes; tokens exceeding this length will not be further split (default=100).
unknown_token – [in] Output used when a token cannot be found in the vocab: if 'unknown_token' is an empty string, the token itself is returned; otherwise the specified string is returned (default='[UNK]').
with_offsets – [in] Whether to output offsets of tokens (default=false).
~WordpieceTokenizer() = default
Destructor.
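Example
The interaction of the constructor parameters can be illustrated with a minimal, self-contained sketch of greedy longest-match-first WordPiece splitting. This is not the MindSpore implementation: WordpieceSplit is a hypothetical helper, the vocab is modeled as a plain std::unordered_set rather than a Vocab object, and with_offsets is not modeled. It also assumes that a token longer than max_bytes_per_token is handled the same way as a token that cannot be found.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of MindSpore): splits one token into
// sub-words by greedy longest-match-first lookup in a set of vocab entries.
std::vector<std::string> WordpieceSplit(
    const std::string &token,
    const std::unordered_set<std::string> &vocab,
    const std::string &suffix_indicator = "##",
    std::size_t max_bytes_per_token = 100,
    const std::string &unknown_token = "[UNK]") {
  // Result used when the token cannot be split: the token itself if
  // unknown_token is empty, otherwise unknown_token.
  const std::vector<std::string> fallback = {unknown_token.empty() ? token : unknown_token};

  // Assumption: over-long tokens are not split and take the fallback path.
  if (token.size() > max_bytes_per_token) {
    return fallback;
  }

  std::vector<std::string> pieces;
  std::size_t start = 0;
  while (start < token.size()) {
    std::size_t end = token.size();
    std::string match;
    // Shrink the candidate from the right until it is found in the vocab.
    while (end > start) {
      std::string candidate = token.substr(start, end - start);
      if (start > 0) {
        // Pieces after the first one carry the suffix indicator.
        candidate = suffix_indicator + candidate;
      }
      if (vocab.count(candidate) > 0) {
        match = candidate;
        break;
      }
      --end;
    }
    if (match.empty()) {
      // No piece starting at this position is in the vocab.
      return fallback;
    }
    pieces.push_back(match);
    start = end;
  }
  return pieces;
}
```

With a vocab containing "un", "##aff" and "##able", calling WordpieceSplit("unaffable", vocab) yields the pieces "un", "##aff", "##able", while a token with no matching pieces, such as "xyz", yields "[UNK]" (or "xyz" itself when unknown_token is passed as an empty string).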