mindspore.dataset.text.FilterWikipediaXML
- class mindspore.dataset.text.FilterWikipediaXML
Filter Wikipedia XML dumps into “clean” text consisting only of lowercase letters (a-z, with A-Z converted to lowercase) and single spaces (never consecutive).
Note
FilterWikipediaXML is not yet supported on the Windows platform.
- Supported Platforms:
CPU
Examples
>>> import mindspore.dataset.text.transforms as text
>>>
>>> replace_op = text.FilterWikipediaXML()
>>> text_file_dataset = text_file_dataset.map(operations=replace_op)
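To make the output format described above concrete, the following is a minimal pure-Python sketch that only approximates the stated result (lowercase a-z and single spaces). It is not the operator's implementation: the helper name `approximate_clean_text` is hypothetical, the tag-stripping rule and the trimming of leading and trailing spaces are assumptions, and the real operator may treat markup, digits, and other characters differently.

```python
import re


def approximate_clean_text(raw: str) -> str:
    """Hypothetical helper: mimic only the documented output format.

    This is an illustration, not the algorithm used by FilterWikipediaXML.
    """
    # Assumption: anything that looks like an XML tag is replaced by a space.
    no_tags = re.sub(r"<[^>]*>", " ", raw)
    # Convert A-Z to a-z.
    lowered = no_tags.lower()
    # Replace every character outside a-z with a space.
    letters_only = re.sub(r"[^a-z]", " ", lowered)
    # Collapse runs of spaces so spaces are never consecutive,
    # then trim the ends (trimming is an assumption of this sketch).
    return re.sub(r" +", " ", letters_only).strip()


print(approximate_clean_text("<text>Hello,   World! Wiki-XML</text>"))
# prints: hello world wiki xml
```

A helper like this can be handy for sanity-checking that text produced by the real operator matches the documented character set, for example by asserting that a cleaned line contains only a-z and single spaces.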