Class NormalizeUTF8

Inheritance Relationships

Base Type

  • public mindspore::dataset::TensorTransform

Class Documentation

class NormalizeUTF8 : public mindspore::dataset::TensorTransform

Applies Unicode normalization to UTF-8 string tensors.

Public Functions

explicit NormalizeUTF8(NormalizeForm normalize_form = NormalizeForm::kNfkc)

Constructor.

Parameters

normalize_form [in] Normalization form to apply. Valid values are NormalizeForm::kNone, NormalizeForm::kNfc, NormalizeForm::kNfkc, NormalizeForm::kNfd, and NormalizeForm::kNfkd (default=NormalizeForm::kNfkc). See http://unicode.org/reports/tr15/ for details.

  • NormalizeForm::kNone, leaves the input string tensor unchanged.

  • NormalizeForm::kNfc, normalizes with Normalization Form C.

  • NormalizeForm::kNfkc, normalizes with Normalization Form KC.

  • NormalizeForm::kNfd, normalizes with Normalization Form D.

  • NormalizeForm::kNfkd, normalizes with Normalization Form KD.
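
To illustrate why a normalization form matters, the following standalone sketch (it does not use the MindSpore API) compares the raw UTF-8 bytes of strings that render identically or near-identically. The constant names and the SameBytes helper are hypothetical and exist only for this illustration. The byte sequences are the standard UTF-8 encodings of the code points named in the comments.

```cpp
#include <string>

// "é" as the single precomposed code point U+00E9.
// This is the form that NFC and NFKC produce.
const std::string kComposed = "\xC3\xA9";

// "é" as "e" (U+0065) followed by the combining acute accent U+0301.
// This is the form that NFD and NFKD produce.
const std::string kDecomposed = "e\xCC\x81";

// The "ﬁ" ligature U+FB01. The compatibility forms (NFKC, NFKD) map it
// to the two letters "fi"; the canonical forms (NFC, NFD) leave it as is.
const std::string kLigature = "\xEF\xAC\x81";

// Hypothetical helper: byte-wise equality, which is what a plain string
// comparison or a vocabulary lookup sees before any normalization.
bool SameBytes(const std::string &a, const std::string &b) {
  return a == b;
}
```

Without normalization, SameBytes(kComposed, kDecomposed) is false even though both strings display as "é", so a downstream tokenizer or vocabulary lookup would treat them as different tokens. Normalizing every input with the same form removes that ambiguity. The default kNfkc additionally folds compatibility characters such as the ligature above.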

Example

/* Define operations */
auto normalizeutf8_op = text::NormalizeUTF8();

/* dataset is an instance of Dataset object */
dataset = dataset->Map({normalizeutf8_op},   // operations
                       {"text"});            // input columns

~NormalizeUTF8() = default

Destructor.