Class RegexTokenizer
Defined in File text.h
Inheritance Relationships
Base Type
public mindspore::dataset::TensorTransform
(Class TensorTransform)
Class Documentation
-
class RegexTokenizer : public mindspore::dataset::TensorTransform
Tokenize a scalar tensor holding a UTF-8 string by splitting it at the delimiters matched by a regular expression pattern.
Public Functions
-
inline explicit RegexTokenizer(std::string delim_pattern, std::string keep_delim_pattern = "", bool with_offsets = false)
Constructor.
- Parameters
delim_pattern – [in] The regular expression pattern that matches delimiters.
keep_delim_pattern – [in] A delimiter matched by ‘delim_pattern’ is kept as an output token if it is also matched by ‘keep_delim_pattern’. If this is an empty string, delimiters are not kept as output tokens (default=“”).
with_offsets – [in] Whether to output offsets of tokens (default=false).
-
~RegexTokenizer() = default
Destructor.