mindspore.experimental.es.EmbeddingService

class mindspore.experimental.es.EmbeddingService

Currently, the ES (EmbeddingService) feature can create only one object, which supports model training and inference for PS embedding and data_parallel embedding and provides unified embedding management, storage, and computing capabilities for training and inference. PS embedding refers to tables whose vocab_size is more than 100,000; storing them on the Parameter Server (PS) is recommended. Data_parallel embedding refers to tables whose vocab_size is less than 100,000; storing them on the device is recommended.
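As a rough illustration of the sizing rule above, the following sketch picks an embedding_type string from a table's vocabulary size. The helper name recommend_embedding_type is hypothetical and not part of MindSpore, and the handling of a vocabulary of exactly 100,000 entries is an assumption, since the description only covers sizes above and below that value.

```python
# Hypothetical helper, not part of MindSpore: maps a vocabulary size to the
# embedding_type string recommended by the class description.
PS_VOCAB_THRESHOLD = 100_000


def recommend_embedding_type(vocab_size: int) -> str:
    """Return "PS" for large tables and "data_parallel" for small ones."""
    if not isinstance(vocab_size, int) or isinstance(vocab_size, bool):
        raise TypeError("vocab_size must be an int")
    if vocab_size <= 0:
        raise ValueError("vocab_size must be positive")
    # Assumption: a table with exactly 100,000 entries stays on device,
    # because the description does not specify the boundary case.
    return "PS" if vocab_size > PS_VOCAB_THRESHOLD else "data_parallel"


print(recommend_embedding_type(5_000_000))
print(recommend_embedding_type(20_000))
```

The returned string can be passed as the embedding_type argument of embedding_init.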

Warning

This is an experimental EmbeddingService API that is subject to change.

Note

Call 'mindspore.communication.init()' before using this API. The API takes effect only after the dynamic networking is completed.

Raises
  • ValueError – If the ESCLUSTER_CONFIG_PATH environment variable is not set during object instantiation.

  • ValueError – If the number of ParameterServers configured for each server in the ESCLUSTER_CONFIG_PATH configuration file exceeds four.

  • ValueError – If the number of ParameterServers configured in the ESCLUSTER_CONFIG_PATH configuration file exceeds four.
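A launcher script may want to check the environment before instantiating the class. The sketch below mirrors only the first condition listed above. The helper name resolve_cluster_config_path is hypothetical, treating an empty value as unset is an assumption, and the configuration file itself is not parsed because its format is not described here.

```python
import os


def resolve_cluster_config_path(environ=None) -> str:
    """Return the ESCLUSTER_CONFIG_PATH value, or raise ValueError if unset.

    Hypothetical pre-flight check, not the library's own implementation.
    """
    env = os.environ if environ is None else environ
    path = env.get("ESCLUSTER_CONFIG_PATH")
    # Assumption: an empty string is treated the same as a missing variable.
    if not path:
        raise ValueError("ESCLUSTER_CONFIG_PATH environment variable is not set")
    return path
```

Passing a plain dict instead of os.environ keeps the check easy to exercise in unit tests.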

Supported Platforms:

Atlas A2 training series products

completion_key(completion_key, mask=True)

Initialize the completion key option for each PS embedding.

Parameters
  • completion_key (int) – The value for completion key.

  • mask (bool) – Whether to update the completion key. If set to False, it is not updated. Default is True.

Returns

CompletionKeyOption object.

Raises
  • TypeError – If the type of "completion_key" is not int.

  • TypeError – If the type of "mask" is not bool.

counter_filter(filter_freq, default_key=None, default_value=None)

Initialize the counter filter option for each PS embedding.

Note

This feature is supported only in training mode. If the counter filter option is set in train mode and evaluation is run afterwards, the default value can be used for keys that cannot be looked up during evaluation.

Parameters
  • filter_freq (int) – The frequency threshold value for feature admission.

  • default_key (int) – For a key whose number of occurrences has not reached the threshold, the value of the default key is returned as the corresponding value during embedding lookup. Default is None.

  • default_value (int/float) – For a key whose number of occurrences has not reached the threshold, the default value is returned, with a length equal to the embedding dim. Default is None.

Returns

CounterFilter object.

Raises
  • TypeError – If the type of "filter_freq" is not int.

  • ValueError – If the value of "filter_freq" is less than 0.

  • ValueError – If both "default_key" and "default_value" are None.

  • ValueError – If neither "default_key" nor "default_value" is None.

  • TypeError – If the value of "default_key" is None and the type of "default_value" is neither int nor float.

  • TypeError – If the value of "default_value" is None and the type of "default_key" is not int.
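The conditions above amount to "exactly one of default_key and default_value must be given". The sketch below restates them in plain Python so that arguments can be checked before the real call. The function name check_counter_filter_args and the order of the checks are assumptions, and this is not the library's implementation.

```python
def check_counter_filter_args(filter_freq, default_key=None, default_value=None):
    """Re-state the documented counter_filter checks (hypothetical helper).

    Returns a (field_name, value) pair naming the fallback that was supplied.
    """
    # Note: bool is a subclass of int in Python; whether the library rejects
    # bool values is not documented, so they are accepted here.
    if not isinstance(filter_freq, int):
        raise TypeError("filter_freq must be int")
    if filter_freq < 0:
        raise ValueError("filter_freq must not be less than 0")
    if default_key is None and default_value is None:
        raise ValueError("one of default_key and default_value must be set")
    if default_key is not None and default_value is not None:
        raise ValueError("only one of default_key and default_value may be set")
    if default_key is None:
        if not isinstance(default_value, (int, float)):
            raise TypeError("default_value must be int or float")
        return ("default_value", default_value)
    if not isinstance(default_key, int):
        raise TypeError("default_key must be int")
    return ("default_key", default_key)


print(check_counter_filter_args(3, default_value=0.5))
```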

embedding_ckpt_export(file_path, trainable_var)

Export the embedding table and optimizer parameters of each PS embedding, and export the embedding of each data_parallel embedding.

Note

This function can only be executed by rank 0. Before exporting, call embedding_variable_option to set evict_option for each PS embedding.

Parameters
  • file_path (str) – The path to which the embedding ckpt is exported. The last character cannot be "/".

  • trainable_var (list[parameter]) – The list of data_parallel embedding parameters.

Returns

The output of EmbeddingComputeVarExport operator and data_parallel embedding export result.
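Because file_path must not end with "/", a caller may prefer to normalize the path first. The following is a minimal caller-side sketch; the helper name normalize_embedding_path and the example path are hypothetical, and how the library itself reacts to a trailing "/" is not stated here.

```python
def normalize_embedding_path(file_path: str) -> str:
    """Strip trailing "/" characters so the path meets the documented rule."""
    if not isinstance(file_path, str):
        raise TypeError("file_path must be a str")
    trimmed = file_path.rstrip("/")
    # A path made only of "/" characters would become empty, so reject it.
    if not trimmed:
        raise ValueError("file_path must name a location other than the root")
    return trimmed


# "/data/es_ckpt/" is a placeholder path used only for illustration.
print(normalize_embedding_path("/data/es_ckpt/"))
```

The same helper applies to the file_path argument of the other export and import methods, which carry the same restriction.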

embedding_ckpt_import(file_path)

Import the embedding and ckpt file from the file path.

Parameters

file_path (str) – The path from which the embedding and ckpt are imported. The last character cannot be "/".

Returns

The output of EmbeddingComputeVarImport operator and data_parallel embedding import result.

embedding_evict(steps_to_live)

Perform embedding eviction for all PS embeddings.

Parameters

steps_to_live (int) – The number of steps set for key eviction.

Returns

The output of ESEmbeddingTableEvict op.

Raises
  • TypeError – If the type of "steps_to_live" is not int.

  • ValueError – If the value of "steps_to_live" is not greater than 0.

embedding_init(name, init_vocabulary_size, embedding_dim, max_feature_count=None, initializer=Uniform(scale=0.01), embedding_type='PS', ev_option=None, multihot_lens=None, optimizer=None, allow_merge=False, optimizer_param=None, mode='train')

Initialize PS embedding and data_parallel embedding.

Parameters
  • name (str) – The embedding table name.

  • init_vocabulary_size (int) – The size of embedding table.

  • embedding_dim (int) – The embedding dim of data in embedding table.

  • max_feature_count (int) – The number of keys in a lookup for PS embedding.

  • initializer (Initializer) – The initialization strategy for the PS embedding. Default is Uniform.

  • embedding_type (str) – The embedding type. Supported values are ["PS", "data_parallel"]: "PS" initializes a PS embedding and "data_parallel" initializes a data_parallel embedding. Default is "PS".

  • ev_option (EmbeddingVariableOption) – Properties of the PS embedding, given as an EmbeddingVariableOption object returned by the embedding_variable_option function. Default is None.

  • multihot_lens (int) – Used only when allow_merge is enabled; not supported currently. Default is None.

  • optimizer (str) – The type of optimizer used in train mode for the PS embedding. It cannot be shared among PS embeddings. Currently only "Adam", "Ftrl", "SGD" and "RMSProp" are supported. Default is None.

  • allow_merge (bool) – Whether to merge data_parallel embeddings. Currently it can only be False. Default is False.

  • optimizer_param (float) – The "initialize accumulator value" parameter of the optimizer, configured by the user, representing the initial value of the moment accumulator. Default is None.

  • mode (str) – Run mode. Supported values are ["train", "predict", "export"]: "train" means train mode, "predict" means predict mode, and "export" means export mode. Default is "train".

Returns

  • data_parallel embedding - a dict that contains data_parallel embedding information.

  • PS embedding - EmbeddingServiceOut, the embedding init object that contains PS embedding information in five parameters: table_id_dict, es_initializer, es_counter_filter, es_padding_keys, es_completion_keys.

    • table_id_dict (dict): the key is the PS embedding and the value is the table_id.

    • es_initializer (dict): the key is the table_id and the value is an EsInitializer object that holds the PS embedding parameters.

    • es_counter_filter (dict): the key is the table_id and the value is the filter option.

    • es_padding_keys (dict): the key is the table_id and the value is the padding key.

    • es_completion_keys (dict): the key is the table_id and the value is the completion key.

Raises
  • ValueError – If "name", "init_vocabulary_size", "embedding_dim", or "max_feature_count" is not set.

  • ValueError – If the types of "name", "init_vocabulary_size", "embedding_dim", and "max_feature_count" do not match the expected types.

  • ValueError – If the value of "init_vocabulary_size", "embedding_dim", or "max_feature_count" is less than or equal to 0, or the value of "init_vocabulary_size" is greater than 2147483647.

  • ValueError – If the number of PS embeddings exceeds 1024.

  • ValueError – If the value of "optimizer" is not in ["adam", "adagrad", "adamw", "ftrl", "sgd", "rmsprop"].

  • TypeError – If "initializer" is neither an EsInitializer object nor one of ["TruncatedNormal", "Uniform", "Constant"].
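The dictionaries returned for a PS embedding are linked through table_id, so reading the settings of one table takes two lookups. The sketch below shows that chaining on a stand-in object. It assumes that the five fields are reachable as attributes with the names listed under Returns and that the keys of table_id_dict are table names; the helper summarize_ps_table and all sample values are hypothetical.

```python
from types import SimpleNamespace


def summarize_ps_table(es_out, table_name):
    """Collect the per-table entries of an EmbeddingServiceOut-like object."""
    # First lookup: table name -> table_id.
    table_id = es_out.table_id_dict[table_name]
    # Second lookup: table_id -> per-table settings. dict.get is used because
    # it is not documented whether every table has an entry in every dict.
    return {
        "table_id": table_id,
        "initializer": es_out.es_initializer.get(table_id),
        "counter_filter": es_out.es_counter_filter.get(table_id),
        "padding_key": es_out.es_padding_keys.get(table_id),
        "completion_key": es_out.es_completion_keys.get(table_id),
    }


# Stand-in for the object returned by embedding_init; values are made up.
fake_out = SimpleNamespace(
    table_id_dict={"user_id": 0},
    es_initializer={0: "Uniform"},
    es_counter_filter={0: None},
    es_padding_keys={0: 0},
    es_completion_keys={0: 1},
)
print(summarize_ps_table(fake_out, "user_id"))
```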

embedding_table_export(file_path, trainable_var)

Export the embedding table for each PS embedding and data_parallel embedding.

Note

This function can only be executed by rank 0.

Parameters
  • file_path (str) – The path to which the embedding table is exported. The last character cannot be "/".

  • trainable_var (list[parameter]) – The list of data_parallel embedding parameters.

Returns

The output of EmbeddingTableExport operator and data_parallel embedding export result.

embedding_table_import(file_path)

Import the embedding file from the file path.

Parameters

file_path (str) – The path from which the embedding table is imported. The last character cannot be "/".

Returns

The output of EmbeddingTableImport operator and data_parallel embedding import result.

embedding_variable_option(filter_option=None, padding_option=None, evict_option=None, completion_option=None, storage_option=None, feature_freezing_option=None, communication_option=None)

Set the variable option for PS embedding.

Parameters
  • filter_option (CounterFilter) – The option of counter filter. Default is None.

  • padding_option (PaddingParamsOption) – The option of padding key. Default is None.

  • evict_option (EvictOption) – The option of evict. Default is None.

  • completion_option (CompletionKeyOption) – The option of completion key. Default is None.

  • storage_option (None) – Reserved option, currently not supported. Default is None.

  • feature_freezing_option (None) – Reserved option, currently not supported. Default is None.

  • communication_option (None) – Reserved option, currently not supported. Default is None.

Returns

EmbeddingVariableOption object, used as the ev_option parameter for embedding_init.

Raises
  • TypeError – If "filter_option" is not None and its type is not CounterFilter.

  • TypeError – If "padding_option" is not None and its type is not PaddingParamsOption.

  • TypeError – If "completion_option" is not None and its type is not CompletionKeyOption.

  • TypeError – If "evict_option" is not None and its type is not EvictOption.
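The option methods are meant to be chained: each one returns an object that embedding_variable_option accepts, and the resulting EmbeddingVariableOption is passed to embedding_init as ev_option. The sketch below shows that wiring using only the signatures documented on this page. The function build_ps_table, the RecordingStub class, the table name, and all numeric values are hypothetical. The stub exists only so that the sketch runs without an Ascend device; in a real job, es would be an EmbeddingService instance created after mindspore.communication.init(). The optimizer string is left to the caller.

```python
def build_ps_table(es, name, vocab_size, dim, max_feature_count, optimizer):
    """Build the option objects and initialize one PS embedding."""
    # Illustrative values only; choose them to suit the actual workload.
    filter_option = es.counter_filter(filter_freq=2, default_value=0.0)
    padding_option = es.padding_param(padding_key=0, mask=True, mask_zero=False)
    completion_option = es.completion_key(completion_key=1, mask=True)
    evict_option = es.evict_option(steps_to_live=1000)
    ev_option = es.embedding_variable_option(
        filter_option=filter_option,
        padding_option=padding_option,
        evict_option=evict_option,
        completion_option=completion_option,
    )
    return es.embedding_init(
        name=name,
        init_vocabulary_size=vocab_size,
        embedding_dim=dim,
        max_feature_count=max_feature_count,
        embedding_type="PS",
        ev_option=ev_option,
        optimizer=optimizer,
        mode="train",
    )


class RecordingStub:
    """Stand-in for EmbeddingService: records method names, does no work."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, method_name):
        # Only reached for names that are not real attributes, i.e. the
        # EmbeddingService methods invoked by build_ps_table.
        def record(**kwargs):
            self.calls.append(method_name)
            return (method_name, kwargs)
        return record


stub = RecordingStub()
result = build_ps_table(stub, "user_id", 1_000_000, 16, 4096, optimizer="Adam")
print(stub.calls)
```

Keeping the service object as a parameter makes the wiring testable on a development machine before it is run on the target hardware.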

evict_option(steps_to_live)

Set the evict option for each PS embedding.

Parameters

steps_to_live (int) – The number of steps set for key eviction.

Returns

EvictOption object.

Raises
  • TypeError – If the type of "steps_to_live" is not int.

  • ValueError – If the value of "steps_to_live" is not greater than 0.

incremental_embedding_table_export(file_path)

Incrementally export the embedding table for each PS embedding.

Note

This function can only be executed by rank 0.

Parameters

file_path (str) – The path to which the embedding table is incrementally exported. The last character cannot be "/".

Returns

The output of EmbeddingTableExport op.

init_table()

Initialize the table for data_parallel embedding.

Returns

A dict of data_parallel embedding parameters, in which each key is a data_parallel embedding name and each value is the corresponding data_parallel embedding parameter.

padding_param(padding_key, mask=True, mask_zero=False)

Initialize the padding key option for each PS embedding.

Parameters
  • padding_key (int) – The value for the padding key. It must be a real and valid hash key.

  • mask (bool) – Whether to update the padding key. If set to False, it is not updated. Default is True.

  • mask_zero (bool) – Whether to update padding key when key is 0. Default is False.

Returns

PaddingParamsOption object.

Raises
  • TypeError – If the type of "padding_key" is not int.

  • TypeError – If the type of "mask" is not bool.