mindspore.dataset.text.transforms.NormalizeUTF8
- class mindspore.dataset.text.transforms.NormalizeUTF8(normalize_form=NormalizeForm.NFKC)[source]
Apply Unicode normalization to a UTF-8 string tensor.
Note
NormalizeUTF8 is not yet supported on the Windows platform.
- Parameters
normalize_form (NormalizeForm, optional) –
The normalization to apply (default=NormalizeForm.NFKC). Valid values are NormalizeForm.NONE, NormalizeForm.NFC, NormalizeForm.NFKC, NormalizeForm.NFD and NormalizeForm.NFKD, that is, either no normalization or one of the four Unicode normalization forms. See http://unicode.org/reports/tr15/ for details.
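NormalizeForm.NONE, leave the input string tensor unchanged.
NormalizeForm.NFC, normalize with Normalization Form C (canonical decomposition, followed by canonical composition).
NormalizeForm.NFKC, normalize with Normalization Form KC (compatibility decomposition, followed by canonical composition).
NormalizeForm.NFD, normalize with Normalization Form D (canonical decomposition).
NormalizeForm.NFKD, normalize with Normalization Form KD (compatibility decomposition).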
Examples
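>>> import mindspore.dataset.text as text
>>> from mindspore.dataset.text import NormalizeForm
>>> # text_file_dataset is assumed to be an existing dataset of UTF-8 strings
>>> normalize_op = text.NormalizeUTF8(normalize_form=NormalizeForm.NFC)
>>> text_file_dataset = text_file_dataset.map(operations=normalize_op)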
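To get a feel for how the normalization forms listed under Parameters differ, the following sketch uses Python's standard-library unicodedata module rather than MindSpore itself. It is only an illustration of the Unicode forms defined in UAX #15; it assumes, but does not verify, that NormalizeUTF8 produces the same results for the corresponding NormalizeForm values. The variable names are chosen for this example only.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9)
composed = "\u00e9"
# "e" followed by a combining acute accent (U+0065 U+0301)
decomposed = "e\u0301"
# The "fi" ligature (U+FB01), a compatibility character
ligature = "\ufb01"

# NFC composes canonically equivalent sequences into one code point.
nfc = unicodedata.normalize("NFC", decomposed)
# NFD splits a precomposed character into base letter + combining mark.
nfd = unicodedata.normalize("NFD", composed)
# NFKC and NFKD additionally replace compatibility characters,
# so the ligature becomes the two plain letters "f" and "i".
nfkc = unicodedata.normalize("NFKC", ligature)
nfkd = unicodedata.normalize("NFKD", ligature)
# NFC (like NFD) leaves compatibility characters untouched.
nfc_ligature = unicodedata.normalize("NFC", ligature)

print(len(nfc), len(nfd), nfkc, nfkd, len(nfc_ligature))
```

The canonical forms (NFC, NFD) only change how a character is encoded, while the compatibility forms (NFKC, NFKD) can also change which characters are present, which is often what text preprocessing wants and may explain why NFKC is the default here.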