Class WhitespaceTokenizer

Inheritance Relationships

Base Type

public mindspore::dataset::TensorTransform

Class Documentation

class WhitespaceTokenizer : public mindspore::dataset::TensorTransform

Tokenizes a scalar tensor containing a UTF-8 string on whitespace characters as defined by ICU4C.

Public Functions

explicit WhitespaceTokenizer(bool with_offsets = false)

Constructor.

Parameters

with_offsets [in]: Whether to output the offsets of tokens (default=false).
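To make the effect of with_offsets more concrete, the following is a minimal standalone sketch of whitespace tokenization that also records where each token begins and ends. It does not use MindSpore or ICU4C: the type TokenSpan and the function SplitOnAsciiWhitespace are hypothetical names introduced only for illustration, the split recognizes ASCII whitespace via std::isspace rather than the full ICU4C whitespace set, and the choice of byte-based, end-exclusive offsets is an assumption about how such offsets are typically reported.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type (not part of MindSpore): one token and its byte range.
struct TokenSpan {
  std::string text;
  std::size_t start;  // byte offset of the first byte of the token
  std::size_t limit;  // byte offset one past the last byte of the token
};

// Simplified stand-in for the operation: splits on ASCII whitespace only.
// The real operation relies on ICU4C's whitespace definition instead.
std::vector<TokenSpan> SplitOnAsciiWhitespace(const std::string &input) {
  std::vector<TokenSpan> tokens;
  const std::size_t n = input.size();
  std::size_t i = 0;
  while (i < n) {
    // Skip a run of whitespace between tokens.
    while (i < n && std::isspace(static_cast<unsigned char>(input[i]))) ++i;
    if (i >= n) break;
    // Consume a run of non-whitespace bytes as a single token.
    const std::size_t start = i;
    while (i < n && !std::isspace(static_cast<unsigned char>(input[i]))) ++i;
    tokens.push_back({input.substr(start, i - start), start, i});
  }
  return tokens;
}
```

In this sketch, a caller that does not need offsets would simply read the text field of each TokenSpan and ignore start and limit, which loosely mirrors leaving with_offsets at its default of false.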

Example
/* Define operations */
auto tokenizer_op = text::WhitespaceTokenizer(false);

/* dataset is an instance of Dataset object */
dataset = dataset->Map({tokenizer_op},   // operations
                       {"text"});        // input columns

~WhitespaceTokenizer() = default

Destructor.