mindspore.dataset.text.NormalizeUTF8

class mindspore.dataset.text.NormalizeUTF8(normalize_form=NormalizeForm.NFKC)

Apply Unicode normalization to the input UTF-8 encoded strings.

Note

NormalizeUTF8 is not yet supported on the Windows platform.

Parameters

normalize_form (NormalizeForm, optional) – The desired normalization form. See NormalizeForm for details on the available values. Default: NormalizeForm.NFKC.

Raises

TypeError – If normalize_form is not of type NormalizeForm.

Supported Platforms:

CPU

Examples

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>> from mindspore.dataset.text import NormalizeForm
>>>
>>> normalize_op = text.NormalizeUTF8(normalize_form=NormalizeForm.NFC)
>>> text_file_list = ["/path/to/text_file_dataset_file"]
>>> text_file_dataset = ds.TextFileDataset(dataset_files=text_file_list)
>>> text_file_dataset = text_file_dataset.map(operations=normalize_op)
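
To make the effect of the normalization form more concrete, the following sketch compares the four standard Unicode forms on a short string. It uses Python's standard-library unicodedata module rather than MindSpore itself, on the assumption that the NormalizeForm values NFC, NFKC, NFD and NFKD follow the Unicode standard forms of the same names. The sample string and the helper normalize are illustrative only and are not part of the MindSpore API.

```python
import unicodedata

# Illustrative input: the "fi" ligature (U+FB01) followed by
# "e" plus a combining acute accent (U+0301).
sample = "\ufb01e\u0301"


def normalize(text: str, form: str) -> str:
    """Hypothetical helper: apply one Unicode normalization form to text."""
    return unicodedata.normalize(form, text)


# Canonical forms (NFC, NFD) leave the ligature intact.
nfc = normalize(sample, "NFC")    # composes e + accent into U+00E9
nfd = normalize(sample, "NFD")    # keeps e and the accent separate

# Compatibility forms (NFKC, NFKD) also split the ligature into "f" + "i".
nfkc = normalize(sample, "NFKC")  # compatibility decomposition, then composition
nfkd = normalize(sample, "NFKD")  # compatibility decomposition only

for name, value in [("NFC", nfc), ("NFD", nfd), ("NFKC", nfkc), ("NFKD", nfkd)]:
    # Print the code points so the differences are visible.
    print(name, [f"U+{ord(ch):04X}" for ch in value])
```

In this sketch the compatibility forms replace the ligature with plain letters, which is why NFKC is a common choice when text should be matched or tokenized consistently, while NFC preserves such characters.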
Tutorial Examples: