Differences with torchtext.data.functional.simple_space_split
torchtext.data.functional.simple_space_split
torchtext.data.functional.simple_space_split(iterator)
For more information, see torchtext.data.functional.simple_space_split.
mindspore.dataset.text.WhitespaceTokenizer
class mindspore.dataset.text.WhitespaceTokenizer(with_offsets=False)
For more information, see mindspore.dataset.text.WhitespaceTokenizer.
Differences
PyTorch: Splits each string yielded by the input iterator on whitespace.
MindSpore: Tokenizes a single string on whitespace, and can optionally output the offsets of the tokens.
| Categories | Subcategories | PyTorch | MindSpore | Difference |
| --- | --- | --- | --- | --- |
| Parameter | Parameter1 | - | with_offsets | Whether to output offsets of tokens |
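To make the `with_offsets` row concrete, the following plain-Python sketch shows what "offsets of tokens" can look like for a whitespace split. It does not call MindSpore: `tokenize_with_offsets` is a hypothetical helper, and reporting the start and limit positions as UTF-8 byte offsets is an assumption made for illustration.

```python
import re


def tokenize_with_offsets(text):
    """Hypothetical helper: split on whitespace and report each token's span.

    Offsets are computed as UTF-8 byte positions (an assumption for this sketch).
    """
    tokens, starts, limits = [], [], []
    for match in re.finditer(r"\S+", text):
        # Convert the character index of the match into a byte index.
        start = len(text[:match.start()].encode("utf-8"))
        limit = start + len(match.group().encode("utf-8"))
        tokens.append(match.group())
        starts.append(start)
        limits.append(limit)
    return tokens, starts, limits


tokens, starts, limits = tokenize_with_offsets("sentencepiece encode as pieces")
print(tokens)
print(starts)
print(limits)
```

Each token is paired with a start position and a limit position (one past its last byte), so the original substring can be recovered by slicing the encoded input.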
Code Example
# PyTorch
from torchtext.data.functional import simple_space_split
sentence = "sentencepiece encode as pieces"
# simple_space_split takes an iterator of strings, so the string is wrapped in a list
result = simple_space_split([sentence])
print(list(result))
# Out: [['sentencepiece', 'encode', 'as', 'pieces']]
# MindSpore
import mindspore.dataset.text as text
# WhitespaceTokenizer is called on the string itself
result = text.WhitespaceTokenizer()(sentence)
print(list(result))
# Out: ['sentencepiece', 'encode', 'as', 'pieces']
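The two outputs above differ in nesting: the PyTorch call returns one token list per input string, while the MindSpore call returns the tokens of a single string. The sketch below reproduces that difference in shape with plain Python only. `split_each` and `split_one` are hypothetical stand-ins written for illustration, not the libraries' implementations.

```python
def split_each(lines):
    """Hypothetical stand-in for an iterator-based splitter: one token list per line."""
    for line in lines:
        yield line.split()


def split_one(line):
    """Hypothetical stand-in for a per-string tokenizer: tokens of a single string."""
    return line.split()


sentence = "sentencepiece encode as pieces"

nested = list(split_each([sentence]))  # list of token lists
flat = split_one(sentence)             # single token list

print(nested)
print(flat)
```

When migrating code that consumes the iterator-style output, applying the per-string tokenizer to each element of the input is one way to recover the nested shape.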