Class Ngram
Defined in File text.h
Inheritance Relationships
Base Type
public mindspore::dataset::TensorTransform (Class TensorTransform)
Class Documentation
class Ngram : public mindspore::dataset::TensorTransform
Generates n-grams from a 1-D string tensor.
Public Functions
inline explicit Ngram(const std::vector<int32_t> &ngrams, const std::pair<std::string, int32_t> &left_pad = {"", 0}, const std::pair<std::string, int32_t> &right_pad = {"", 0}, const std::string &separator = " ")
- Parameters
ngrams – [in] A vector of positive integers giving the n-gram sizes to generate. For example, if ngrams={4, 3}, the result contains the 4-grams followed by the 3-grams in the same tensor. If there are not enough words to form an n-gram, an empty string is returned in its place.
left_pad – [in] Padding applied to the left side of the sequence, given as {"pad_token", pad_width}. pad_width is capped at n-1. For example, left_pad={"_", 2} pads the left side of the sequence with two "_" tokens (default={"", 0}).
right_pad – [in] Padding applied to the right side of the sequence, given as {"pad_token", pad_width}. pad_width is capped at n-1. For example, right_pad={"-", 2} pads the right side of the sequence with two "-" tokens (default={"", 0}).
separator – [in] Symbol used to join strings together (default=" ").
Example:

```cpp
/* Define operations */
auto ngram_op = text::Ngram({2, 3}, {"&", 2}, {"&", 2}, "-");

/* dataset is an instance of Dataset object */
dataset = dataset->Map({ngram_op},  // operations
                       {"text"});   // input columns
```
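To make the padding and windowing rules described above concrete, here is a minimal standalone sketch that applies them to a plain `std::vector<std::string>` instead of a string tensor. `MakeNgrams` is a hypothetical helper written for illustration; it is not MindSpore's implementation and does not call any MindSpore API. It assumes that, for each requested n, the pad widths are capped at n-1, the padded sequence is scanned with a window of width n, pad tokens are joined like ordinary words, and a single empty string is emitted when the padded sequence is shorter than n.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the n-gram semantics described for text::Ngram.
// Works on std::vector<std::string> rather than a 1-D string Tensor.
std::vector<std::string> MakeNgrams(const std::vector<std::string> &words,
                                    const std::vector<int32_t> &ngrams,
                                    const std::pair<std::string, int32_t> &left_pad = {"", 0},
                                    const std::pair<std::string, int32_t> &right_pad = {"", 0},
                                    const std::string &separator = " ") {
  std::vector<std::string> result;
  for (int32_t n : ngrams) {
    const std::size_t width = static_cast<std::size_t>(n);

    // Pad widths are capped at n - 1 (and never negative) for this n.
    const int32_t l_width = std::max<int32_t>(0, std::min<int32_t>(left_pad.second, n - 1));
    const int32_t r_width = std::max<int32_t>(0, std::min<int32_t>(right_pad.second, n - 1));

    // Build the padded sequence: left pads, the words, then right pads.
    std::vector<std::string> seq(static_cast<std::size_t>(l_width), left_pad.first);
    seq.insert(seq.end(), words.begin(), words.end());
    seq.insert(seq.end(), static_cast<std::size_t>(r_width), right_pad.first);

    // Too few tokens to form even one n-gram: emit a single empty string.
    if (seq.size() < width) {
      result.emplace_back();
      continue;
    }

    // Slide a window of width n and join each window with the separator.
    for (std::size_t i = 0; i + width <= seq.size(); ++i) {
      std::string gram = seq[i];
      for (std::size_t j = 1; j < width; ++j) {
        gram += separator;
        gram += seq[i + j];
      }
      result.push_back(gram);
    }
  }
  return result;
}
```

With the parameters from the example above, `MakeNgrams({"a", "b", "c"}, {2, 3}, {"&", 2}, {"&", 2}, "-")` returns the 2-grams `"&-a"`, `"a-b"`, `"b-c"`, `"c-&"` followed by the 3-grams `"&-&-a"`, `"&-a-b"`, `"a-b-c"`, `"b-c-&"`, `"c-&-&"`. For n=2 only one pad token is added on each side because the width of 2 is capped at n-1.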
Ngram(const std::vector<int32_t> &ngrams, const std::pair<std::vector<char>, int32_t> &left_pad, const std::pair<std::vector<char>, int32_t> &right_pad, const std::vector<char> &separator)
- Parameters
ngrams – [in] A vector of positive integers giving the n-gram sizes to generate. For example, if ngrams={4, 3}, the result contains the 4-grams followed by the 3-grams in the same tensor. If there are not enough words to form an n-gram, an empty string is returned in its place.
left_pad – [in] Padding applied to the left side of the sequence, given as {pad_token, pad_width} with pad_token as a std::vector<char>. pad_width is capped at n-1. For example, a left pad of "_" with width 2 pads the left side of the sequence with two "_" tokens.
right_pad – [in] Padding applied to the right side of the sequence, given as {pad_token, pad_width} with pad_token as a std::vector<char>. pad_width is capped at n-1. For example, a right pad of "-" with width 2 pads the right side of the sequence with two "-" tokens.
separator – [in] Symbol used to join strings together, given as a std::vector<char>.
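This overload takes its string arguments as `std::vector<char>` rather than `std::string`. A plausible reading is that the inline `std::string` constructor converts its arguments and forwards to this one, keeping `std::string` out of the library boundary; that is an assumption, not something this page states. MindSpore's headers may provide similar helpers of their own (for example a `StringToChar` function), but verify that before relying on it. The helpers below (`ToCharVector`, `ToCharPair`, `FromCharVector`) are hypothetical and only show how such a conversion can be written with the standard library.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: copy the bytes of a std::string into a std::vector<char>.
std::vector<char> ToCharVector(const std::string &s) {
  return std::vector<char>(s.begin(), s.end());
}

// Hypothetical helper: convert a {"pad_token", pad_width} pair into the
// {std::vector<char>, int32_t} form taken by this overload.
std::pair<std::vector<char>, int32_t> ToCharPair(const std::pair<std::string, int32_t> &p) {
  return {ToCharVector(p.first), p.second};
}

// Hypothetical helper: convert a std::vector<char> back into a std::string.
std::string FromCharVector(const std::vector<char> &v) {
  return std::string(v.begin(), v.end());
}
```

Under these assumptions, a call such as `text::Ngram({2, 3}, ToCharPair({"&", 2}), ToCharPair({"&", 2}), ToCharVector("-"))` would pass the same values as the `std::string` example shown for the first constructor.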
~Ngram() = default