Class SentencePieceTokenizer
Defined in File text.h
Inheritance Relationships
Base Type
public mindspore::dataset::TensorTransform
(Class TensorTransform)
Class Documentation
-
class SentencePieceTokenizer : public mindspore::dataset::TensorTransform
Tokenize a scalar token or a 1-D tensor of tokens into subword tokens using SentencePiece.
Public Functions
-
SentencePieceTokenizer(const std::shared_ptr<SentencePieceVocab> &vocab, mindspore::dataset::SPieceTokenizerOutType out_type)
Constructor.
- Parameters
vocab – [in] a SentencePieceVocab object.
out_type – [in] The type of the output.
-
inline SentencePieceTokenizer(const std::string &vocab_path, mindspore::dataset::SPieceTokenizerOutType out_type)
Constructor.
- Parameters
vocab_path – [in] Path to the vocab model file.
out_type – [in] The type of the output.
-
SentencePieceTokenizer(const std::vector<char> &vocab_path, mindspore::dataset::SPieceTokenizerOutType out_type)
Constructor.
- Parameters
vocab_path – [in] Path to the vocab model file, passed as a vector of char (std::vector<char>).
out_type – [in] The type of the output.
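The std::vector<char> overload takes the same path as the std::string overload, only as raw characters. The sketch below shows one way to convert between the two forms. PathToChars and CharsToPath are hypothetical helpers written for illustration, not MindSpore functions, and the sketch assumes the vector holds exactly the path characters with no trailing null terminator, which this page does not state.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of MindSpore): copy the characters of a
// path string into a std::vector<char>. Assumption: no trailing '\0' is
// appended, so the vector size equals the string length.
std::vector<char> PathToChars(const std::string &path) {
  return std::vector<char>(path.begin(), path.end());
}

// Hypothetical inverse helper: rebuild the std::string path from the
// character vector.
std::string CharsToPath(const std::vector<char> &chars) {
  return std::string(chars.begin(), chars.end());
}
```

Under these assumptions, a caller holding a std::string path could pass PathToChars(path) to the std::vector<char> constructor, though in most code the std::string overload is the more convenient choice.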
-
~SentencePieceTokenizer() = default
Destructor.
-