Class Lookup

Inheritance Relationships

Base Type

  • public mindspore::dataset::TensorTransform

Class Documentation

class Lookup : public mindspore::dataset::TensorTransform

Look up the id of each word in the input vocabulary table.

Public Functions

inline explicit Lookup(const std::shared_ptr<Vocab> &vocab, const std::optional<std::string> &unknown_token = {}, mindspore::DataType data_type = mindspore::DataType::kNumberTypeInt32)

Constructor.

Parameters
  • vocab[in] a Vocab object.

  • unknown_token[in] Word used for lookup when an input word is out of vocabulary (OOV): the lookup result for an OOV word is replaced with the id of unknown_token. A runtime error is thrown if unknown_token is not specified or is itself OOV (default={}, meaning no unknown_token is specified).

  • data_type[in] mindspore::DataType of the tensor after lookup; must be numeric, including bool (default=mindspore::DataType::kNumberTypeInt32).
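
The handling of unknown_token described above can be pictured with a small standalone sketch. It does not use MindSpore: ToyVocab and LookupIds are hypothetical names introduced only for illustration, and the code merely mirrors the documented behaviour (fallback to the id of unknown_token, runtime error otherwise), not the library's actual implementation.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a vocabulary: maps each word to its id.
using ToyVocab = std::unordered_map<std::string, int32_t>;

// Hypothetical helper that mirrors the documented lookup rules.
std::vector<int32_t> LookupIds(const ToyVocab &vocab, const std::vector<std::string> &words,
                               const std::optional<std::string> &unknown_token = {}) {
  // Resolve the fallback id once; an unknown_token that is itself OOV is an error.
  std::optional<int32_t> unknown_id;
  if (unknown_token.has_value()) {
    auto it = vocab.find(*unknown_token);
    if (it == vocab.end()) {
      throw std::runtime_error("unknown_token is not in the vocabulary");
    }
    unknown_id = it->second;
  }

  std::vector<int32_t> ids;
  ids.reserve(words.size());
  for (const auto &word : words) {
    auto it = vocab.find(word);
    if (it != vocab.end()) {
      ids.push_back(it->second);   // in-vocabulary word: use its own id
    } else if (unknown_id.has_value()) {
      ids.push_back(*unknown_id);  // OOV word: fall back to the id of unknown_token
    } else {
      throw std::runtime_error("word is OOV and no unknown_token was specified");
    }
  }
  return ids;
}
```

With a toy vocabulary that contains "[unk]", calling LookupIds(vocab, words, "[unk]") maps every OOV word to the id of "[unk]"; calling it without the third argument throws as soon as an OOV word is met.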

Example
 /* Define operations */
 /* "[unk]" is added as a special token, because unknown_token must itself be in the vocabulary */
 std::vector<std::string> list = {"a", "b", "c", "d"};
 std::shared_ptr<Vocab> vocab = std::make_shared<Vocab>();
 Status s = Vocab::BuildFromVector(list, {"[unk]"}, true, &vocab);
 auto lookup_op = text::Lookup(vocab, "[unk]");

 /* dataset is an instance of Dataset object */
 dataset = dataset->Map({lookup_op},   // operations
                        {"text"});     // input columns

Lookup(const std::shared_ptr<Vocab> &vocab, const std::optional<std::vector<char>> &unknown_token, mindspore::DataType data_type = mindspore::DataType::kNumberTypeInt32)

Constructor.

Parameters
  • vocab[in] a Vocab object.

  • unknown_token[in] Word used for lookup when an input word is out of vocabulary (OOV): the lookup result for an OOV word is replaced with the id of unknown_token. A runtime error is thrown if unknown_token is not specified or is itself OOV. This overload declares no default value for unknown_token; an empty optional means no unknown_token is specified.

  • data_type[in] mindspore::DataType of the tensor after lookup; must be numeric, including bool (default=mindspore::DataType::kNumberTypeInt32).
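
This overload receives unknown_token as a std::vector<char> rather than a std::string. A caller who holds a std::string could convert it along the lines of the sketch below. ToCharVector is a hypothetical helper, not part of the MindSpore API, and it is only an assumption that the std::string overload performs an equivalent conversion internally.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: copy the characters of an optional string into an
// optional char vector, keeping "no token specified" as an empty optional.
std::optional<std::vector<char>> ToCharVector(const std::optional<std::string> &token) {
  if (!token.has_value()) {
    return std::nullopt;
  }
  return std::vector<char>(token->begin(), token->end());
}
```

The result of ToCharVector(std::string("[unk]")) holds the five characters of the token, while ToCharVector(std::nullopt) stays empty, matching the "no unknown_token specified" case.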

~Lookup() = default

Destructor.