Differences with torchtext.data.functional.simple_space_split

torchtext.data.functional.simple_space_split

torchtext.data.functional.simple_space_split(iterator)

For more information, see torchtext.data.functional.simple_space_split.

mindspore.dataset.text.WhitespaceTokenizer

class mindspore.dataset.text.WhitespaceTokenizer(with_offsets=False)

For more information, see mindspore.dataset.text.WhitespaceTokenizer.

Differences

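PyTorch: Splits each string from the input iterator on whitespace and yields one list of tokens per string.

MindSpore: Tokenizes a single string on whitespace. It can optionally also output the offsets of each token through the with_offsets parameter.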

| Categories | Subcategories | PyTorch | MindSpore | Difference |
| --- | --- | --- | --- | --- |
| Parameter | Parameter1 | - | with_offsets | Whether to output offsets of tokens |
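To make the idea of token offsets concrete, the following is a minimal pure-Python sketch of whitespace tokenization that also records where each token starts and ends. The helper name `whitespace_tokenize_with_offsets` is hypothetical and is not part of MindSpore or torchtext. The sketch counts offsets in characters, which is an assumption; the library may use a different unit (for example bytes), although the two coincide for the ASCII input used here.

```python
import re
from typing import List, Tuple


def whitespace_tokenize_with_offsets(text: str) -> Tuple[List[str], List[int], List[int]]:
    """Hypothetical helper: split on whitespace and record token offsets.

    Returns the tokens, the start offset of each token, and the offset
    one past the end of each token (all measured in characters).
    """
    tokens, starts, limits = [], [], []
    # \S+ matches each maximal run of non-whitespace characters.
    for match in re.finditer(r"\S+", text):
        tokens.append(match.group())
        starts.append(match.start())
        limits.append(match.end())
    return tokens, starts, limits


tokens, starts, limits = whitespace_tokenize_with_offsets("sentencepiece encode as pieces")
print(tokens)  # ['sentencepiece', 'encode', 'as', 'pieces']
print(starts)  # [0, 14, 21, 24]
print(limits)  # [13, 20, 23, 30]
```

Slicing the input with a start and limit pair recovers the corresponding token, which is the typical reason for requesting offsets alongside the tokens.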

Code Example

# PyTorch
from torchtext.data.functional import simple_space_split

sentence = "sentencepiece encode as pieces"
# simple_space_split takes an iterator of strings and yields one token list per string
result = simple_space_split([sentence])
print(list(result))
# Out: [['sentencepiece', 'encode', 'as', 'pieces']]

# MindSpore
import mindspore.dataset.text as text

# WhitespaceTokenizer is called here directly on a single string
result = text.WhitespaceTokenizer()(sentence)
print(list(result))
# Out: ['sentencepiece', 'encode', 'as', 'pieces']