Class UnicodeScriptTokenizer
Defined in File text.h
Inheritance Relationships
Base Type
public mindspore::dataset::TensorTransform
(Class TensorTransform)
Class Documentation
-
class UnicodeScriptTokenizer : public mindspore::dataset::TensorTransform
Tokenizes a scalar tensor containing a UTF-8 string at Unicode script boundaries.
Public Functions
-
explicit UnicodeScriptTokenizer(bool keep_whitespace = false, bool with_offsets = false)
Constructor.
- Parameters
keep_whitespace – [in] whether to emit whitespace tokens (default=false).
with_offsets – [in] whether to output offsets of tokens (default=false).
Example
    /* Define operations */
    auto tokenizer_op = text::UnicodeScriptTokenizer(false, true);

    /* dataset is an instance of Dataset object */
    dataset = dataset->Map({tokenizer_op},  // operations
                           {"text"});       // input columns
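To make the effect of keep_whitespace and with_offsets more concrete, the following standalone sketch splits a UTF-8 string wherever a coarse "script" class changes. It is not the MindSpore implementation: the Script enum, the Token struct, and the functions DecodeUtf8, Classify, and TokenizeByScript are hypothetical names introduced here, only ASCII Latin letters and one Han block are recognised, and the offsets are assumed to be byte positions in the form [start, limit). A real tokenizer would consult the full Unicode script property instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Coarse stand-in for the Unicode script property (illustration only).
enum class Script { kLatin, kHan, kWhitespace, kCommon };

// Hypothetical token record: text plus assumed byte offsets [start, limit).
struct Token {
  std::string text;
  std::size_t start;
  std::size_t limit;
};

// Decode the code point starting at s[i]; *len receives the bytes consumed.
// Assumes well-formed UTF-8; a truncated sequence is consumed as one byte.
static std::uint32_t DecodeUtf8(const std::string &s, std::size_t i, std::size_t *len) {
  const unsigned char c = static_cast<unsigned char>(s[i]);
  if (c < 0x80) { *len = 1; return c; }
  const int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
  if (i + extra >= s.size()) { *len = 1; return c; }
  std::uint32_t cp = c & (0x3F >> extra);  // payload bits of the lead byte
  for (int k = 1; k <= extra; ++k) {
    cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
  }
  *len = static_cast<std::size_t>(extra) + 1;
  return cp;
}

// Simplified classification: ASCII letters, CJK Unified Ideographs, whitespace.
static Script Classify(std::uint32_t cp) {
  if (cp == ' ' || cp == '\t' || cp == '\n' || cp == '\r') return Script::kWhitespace;
  if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')) return Script::kLatin;
  if (cp >= 0x4E00 && cp <= 0x9FFF) return Script::kHan;
  return Script::kCommon;
}

// Group consecutive code points of the same class into one token.
// Whitespace runs are emitted only when keep_whitespace is true.
std::vector<Token> TokenizeByScript(const std::string &s, bool keep_whitespace) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < s.size()) {
    std::size_t len = 0;
    const Script run = Classify(DecodeUtf8(s, i, &len));
    const std::size_t start = i;
    i += len;
    while (i < s.size()) {
      std::size_t next_len = 0;
      if (Classify(DecodeUtf8(s, i, &next_len)) != run) break;
      i += next_len;
    }
    if (run != Script::kWhitespace || keep_whitespace) {
      out.push_back({s.substr(start, i - start), start, i});
    }
  }
  return out;
}
```

In this sketch, calling TokenizeByScript on a string that mixes Latin letters, a space, Han characters, and punctuation produces one token per run. Passing keep_whitespace = true additionally emits the space as its own token, and the start and limit fields play the role that the extra offset outputs would play when with_offsets is enabled.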
-
~UnicodeScriptTokenizer() = default
Destructor.