mindspore.dataset.text.PythonTokenizer
- class mindspore.dataset.text.PythonTokenizer(tokenizer)
Apply a user-defined string tokenizer to the input string.
- Parameters
tokenizer (Callable) – Python function that takes a str and returns a list of str as tokens.
- Raises
TypeError – If tokenizer is not a callable Python function.
- Supported Platforms:
CPU
Examples
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>>
>>> def my_tokenizer(line):
...     return line.split()
>>>
>>> text_file_list = ["/path/to/text_file_dataset_file"]
>>> text_file_dataset = ds.TextFileDataset(dataset_files=text_file_list)
>>> text_file_dataset = text_file_dataset.map(operations=text.PythonTokenizer(my_tokenizer))
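The `tokenizer` argument only has to be a callable that maps one `str` to a list of `str`. As a sketch, the hypothetical `regex_tokenizer` below (not part of MindSpore) uses Python's standard `re` module to split text into runs of word characters and individual punctuation marks. It can be checked on its own without MindSpore and would then be passed to `text.PythonTokenizer` the same way `my_tokenizer` is in the example above.

```python
import re
from typing import List

# Hypothetical tokenizer for illustration only (not a MindSpore API).
# Matches either a run of word characters or a single non-space,
# non-word character (punctuation), so punctuation becomes its own token.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def regex_tokenizer(line: str) -> List[str]:
    """Return the word and punctuation tokens of `line`, in order."""
    return _TOKEN_PATTERN.findall(line)


if __name__ == "__main__":
    print(regex_tokenizer("Hello, world! It's 2024."))
    # prints ['Hello', ',', 'world', '!', 'It', "'", 's', '2024', '.']

    # Assumption: with MindSpore installed, the function is used like
    # my_tokenizer above, e.g.
    #   dataset = dataset.map(operations=text.PythonTokenizer(regex_tokenizer))
```

By contrast, `line.split()` in the original example splits on whitespace only, so punctuation stays attached to neighboring words; which behavior is preferable depends on the downstream vocabulary.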