mindspore.dataset.text.Ngram
- class mindspore.dataset.text.Ngram(n, left_pad=('', 0), right_pad=('', 0), separator=' ')
- Generate n-grams from a 1-D string Tensor.
- Refer to N-gram for an overview of what an n-gram is and how it works.
- Parameters
- n (list[int]) – The n in n-gram, given as a list of positive integers. For example, if n=[4, 3], the result is the 4-grams followed by the 3-grams in the same tensor. If there are not enough words to form an n-gram, an empty string is returned. For example, a 3-gram on ["mindspore", "best"] produces an empty string.
- left_pad (tuple, optional) – Padding applied to the left side of the sequence, given as ("pad_token", pad_width). pad_width is capped at n-1. For example, left_pad=("_", 2) pads the left side of the sequence with "__". Default: ('', 0).
- right_pad (tuple, optional) – Padding applied to the right side of the sequence, given as ("pad_token", pad_width). pad_width is capped at n-1. For example, right_pad=("_", 2) pads the right side of the sequence with "__". Default: ('', 0).
- separator (str, optional) – Symbol used to join the strings together. For example, if the 2-gram is ["mindspore", "amazing"] and the separator is "-", the result is ["mindspore-amazing"]. Default: ' ', which uses a single space as the separator.
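The parameter semantics described above can be illustrated with a minimal pure-Python sketch. This is not the MindSpore implementation: the function name `ngram_sketch` is hypothetical, and the sketch assumes that each pad token is inserted as a separate element of the sequence and that an empty string is emitted for each n that cannot be formed.

```python
from typing import List, Sequence, Tuple


def ngram_sketch(
    words: Sequence[str],
    n_values: Sequence[int],
    left_pad: Tuple[str, int] = ("", 0),
    right_pad: Tuple[str, int] = ("", 0),
    separator: str = " ",
) -> List[str]:
    """Hypothetical illustration of the documented behavior, not MindSpore code."""
    result: List[str] = []
    for n in n_values:
        if n <= 0:
            raise ValueError("each n must be a positive integer")
        # Cap the pad width at n - 1, as the parameter description states.
        left = [left_pad[0]] * min(left_pad[1], n - 1)
        right = [right_pad[0]] * min(right_pad[1], n - 1)
        # Assumption: each pad token is a separate element of the sequence.
        padded = left + list(words) + right
        if len(padded) < n:
            # Not enough words to form an n-gram: emit an empty string.
            result.append("")
            continue
        # Slide a window of size n over the padded sequence and join each window.
        for start in range(len(padded) - n + 1):
            result.append(separator.join(padded[start:start + n]))
    return result


print(ngram_sketch(["mindspore", "amazing"], [2], separator="-"))  # ['mindspore-amazing']
print(ngram_sketch(["mindspore", "best"], [3]))                    # ['']
print(ngram_sketch(["a", "b"], [2], left_pad=("_", 2)))            # ['_ a', 'a b']
```

In the last call the requested pad width of 2 is capped at n-1 = 1, so only one "_" is prepended before the 2-grams are formed.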
 
- Raises
- TypeError – If the values of n are not of type int.
- ValueError – If the values of n are not positive.
- ValueError – If left_pad is not a tuple of length 2. 
- ValueError – If right_pad is not a tuple of length 2. 
- TypeError – If separator is not of type string. 
 
 - Supported Platforms:
- CPU
 - Examples
>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>>
>>> # Use the transform in dataset pipeline mode
>>> def gen(texts):
...     for line in texts:
...         yield (np.array(line.split(" "), dtype=str),)
>>> data = ["WildRose Country", "Canada's Ocean Playground", "Land of Living Skies"]
>>> generator_dataset = ds.GeneratorDataset(gen(data), ["text"])
>>> ngram_op = text.Ngram(3, separator="-")
>>> generator_dataset = generator_dataset.map(operations=ngram_op)
>>> for item in generator_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["text"])
...     break
['']
>>>
>>> # Use the transform in eager mode
>>> output = ngram_op(data)
>>> print(output)
["WildRose Country-Canada's Ocean Playground-Land of Living Skies"]
 - Tutorial Examples: