Class UnicodeScriptTokenizer

Inheritance Relationships

Base Type

  • public mindspore::dataset::TensorTransform

Class Documentation

class UnicodeScriptTokenizer : public mindspore::dataset::TensorTransform

Tokenize a scalar tensor holding a UTF-8 string on Unicode script boundaries.

Public Functions

explicit UnicodeScriptTokenizer(bool keep_whitespace = false, bool with_offsets = false)

Constructor.

Parameters
  • keep_whitespace – [in] Whether to emit whitespace tokens (default=false).

  • with_offsets – [in] Whether to output the offsets of tokens (default=false).

~UnicodeScriptTokenizer() = default

Destructor.