Differences with torchtext.data.utils.ngrams_iterator

torchtext.data.utils.ngrams_iterator

torchtext.data.utils.ngrams_iterator(
    token_list,
    ngrams
)

For more information, see torchtext.data.utils.ngrams_iterator.

mindspore.dataset.text.Ngram

class mindspore.dataset.text.Ngram(
    n,
    left_pad=("", 0),
    right_pad=("", 0),
    separator=" "
)

For more information, see mindspore.dataset.text.Ngram.

Differences

PyTorch: Take a list of tokens and return an iterator that yields the given tokens and their n-grams.

MindSpore: Generate n-grams from a 1-D string Tensor; string padding and a connecting separator are supported.

| Categories | Subcategories | PyTorch | MindSpore | Differences |
| ---------- | ------------- | ------- | --------- | ----------- |
| Parameters | Parameter 1 | token_list | - | A list of tokens; for usage, see the code example below |
| | Parameter 2 | ngrams | n | n-gram number |
| | Parameter 3 | - | left_pad | Strings to be padded on the left side of the sequence |
| | Parameter 4 | - | right_pad | Strings to be padded on the right side of the sequence |
| | Parameter 5 | - | separator | Symbol used to join strings together |
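
Since left_pad, right_pad and separator have no torchtext counterpart, the sketch below illustrates their effect in eager mode before the side-by-side comparison in the Code Example section. The "_" pad token, the pad width of 1 and the "-" separator are illustrative choices, not values taken from the APIs above.

# A minimal sketch (assumed values): pad one "_" token on each side,
# then build 2-grams joined with "-".
from mindspore.dataset import text

token_list = ['here', 'we', 'are']
ngram_op = text.Ngram([2], left_pad=("_", 1), right_pad=("_", 1), separator="-")
print(ngram_op(token_list))
# Expected output (assumption):
# ['_-here' 'here-we' 'we-are' 'are-_']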

Code Example

# In torch, ngrams_iterator returns an iterator that yields the given tokens and their n-grams.
from torchtext.data.utils import ngrams_iterator

token_list = ['here', 'we', 'are']
print(list(ngrams_iterator(token_list, 2)))
# Out:
# ['here', 'we', 'are', 'here we', 'we are']

# In MindSpore, the n-grams are returned as a numpy.ndarray.
from mindspore.dataset import text

ngram_op = text.Ngram([2, 1], separator=" ")
token_list = ['here', 'we', 'are']
output = ngram_op(token_list)
print(output)
# Out:
# ['here we' 'we are' 'here' 'we' 'are']
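
When Ngram is used inside a data pipeline rather than in eager mode, it is typically applied with map. The following is a minimal sketch under that assumption; the NumpySlicesDataset construction and the "text" column name are illustrative choices, not part of the APIs compared above.

# A minimal sketch (assumed setup): applying Ngram through a dataset pipeline.
import mindspore.dataset as ds
from mindspore.dataset import text

# One sample whose "text" column holds a 1-D array of tokens (illustrative data).
dataset = ds.NumpySlicesDataset([["here", "we", "are"]], column_names=["text"], shuffle=False)
dataset = dataset.map(operations=text.Ngram([2], separator=" "), input_columns=["text"])
for item in dataset.create_dict_iterator(output_numpy=True):
    print(item["text"])
# Expected output (assumption): ['here we' 'we are']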