mindspore.dataset.text.FilterWikipediaXML

class mindspore.dataset.text.FilterWikipediaXML[source]

Filter Wikipedia XML dumps down to “clean” text consisting only of lowercase letters (a-z, with A-Z converted to lowercase) and spaces (never consecutive).

Note

FilterWikipediaXML is not yet supported on the Windows platform.

Supported Platforms:

CPU

Examples

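>>> import mindspore.dataset.text as text
>>>
>>> # text_file_dataset is assumed to be an existing text dataset created beforehand
>>> replace_op = text.FilterWikipediaXML()
>>> # Apply the filter to every row of the dataset
>>> text_file_dataset = text_file_dataset.map(operations=replace_op)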
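
To make the output format in the description concrete, here is a rough pure-Python sketch of what "lowercase letters and non-consecutive spaces" looks like. It is not the operator's implementation: the helper name `approximate_clean` is hypothetical, the leading and trailing trim is an assumption of this sketch, and the real operator does more work on the XML dump than this stand-in attempts to reproduce.

```python
import re


def approximate_clean(line: str) -> str:
    """Hypothetical stand-in that mimics only the output character set."""
    # Convert A-Z to a-z
    lowered = line.lower()
    # Replace each run of characters outside a-z with one space,
    # so spaces are never consecutive
    collapsed = re.sub(r"[^a-z]+", " ", lowered)
    # Trimming the ends is an assumption of this sketch
    return collapsed.strip()


print(approximate_clean("Hello,   World! MindSpore"))  # hello world mindspore
```

Comparing the result of such a helper with the text produced by FilterWikipediaXML on a small sample can be a quick way to see where the real operator's behavior differs.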