mindspore.experimental.es.EmbeddingService
- class mindspore.experimental.es.EmbeddingService
Currently, only one EmbeddingService (ES) object can be created. It supports model training and inference for PS embeddings and data_parallel embeddings, and provides unified embedding management, storage, and computing capabilities for both training and inference. PS embeddings are tables whose vocab_size is greater than 100,000; it is recommended to store them on the Parameter Server (PS). Data_parallel embeddings are tables whose vocab_size is less than 100,000; it is recommended to store them on the device.
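As a rough illustration of the placement guideline above, the sketch below encodes it as a hypothetical helper, recommend_embedding_type, which is not part of MindSpore. The text does not say where a table of exactly 100,000 rows belongs; the sketch assumes "data_parallel". The returned strings match the values accepted by the embedding_type parameter of embedding_init.

```python
# Hypothetical helper, not a MindSpore API: encodes the vocab_size guideline
# from the EmbeddingService description.
PS_VOCAB_THRESHOLD = 100_000


def recommend_embedding_type(vocab_size: int) -> str:
    """Return the recommended embedding_type for a table with vocab_size rows."""
    if vocab_size <= 0:
        raise ValueError("vocab_size must be positive")
    # Assumption: a table of exactly 100,000 rows is treated as data_parallel.
    return "PS" if vocab_size > PS_VOCAB_THRESHOLD else "data_parallel"


print(recommend_embedding_type(5_000_000))  # PS
print(recommend_embedding_type(20_000))     # data_parallel
```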
Warning
This is an experimental EmbeddingService API that is subject to change.
Note
'mindspore.communication.init()' must be called before using this API. The API takes effect only after dynamic networking is complete.
- Raises
ValueError – If the ESCLUSTER_CONFIG_PATH environment variable is not set during object instantiation.
ValueError – If the number of ParameterServers configured for each server in the ESCLUSTER_CONFIG_PATH configuration file exceeds four.
ValueError – If the number of ParameterServers configured in the ESCLUSTER_CONFIG_PATH configuration file exceeds four.
- Supported Platforms:
Atlas A2 training series products
- completion_key(completion_key, mask=True)
Initialize the completion key option for each PS embedding.
- Parameters
- Returns
CompletionKeyOption object.
- Raises
- counter_filter(filter_freq, default_key=None, default_value=None)
Initialize the counter filter option for each PS embedding.
Note
This feature supports only training mode. If the counter filter option is set in training mode and the model is then evaluated, the default value is used for keys that cannot be looked up during evaluation.
- Parameters
filter_freq (int) – The frequency threshold value for feature admission.
default_key (int) – For a key whose number of occurrences does not reach the threshold, the value of the default key is returned as its value during embedding lookup. Default is None.
default_value (int/float) – For a key whose number of occurrences does not reach the threshold, default_value is returned as a vector whose length equals the embedding dim. Default is None.
- Returns
CounterFilter object.
- Raises
TypeError – If the type of "filter_freq" is not int.
ValueError – If the value of "filter_freq" is less than 0.
ValueError – If both "default_key" and "default_value" are None.
ValueError – If neither "default_key" nor "default_value" is None.
TypeError – If the value of "default_key" is None and the type of "default_value" is neither int nor float.
TypeError – If the value of "default_value" is None and the type of "default_key" is not int.
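To make the argument rules and the admission behavior concrete, here is a simplified, hypothetical model, CounterFilterSketch, which is not MindSpore code. Its constructor mirrors the Raises conditions listed above. Its lookup method assumes that a key is admitted once its occurrence count, including the current lookup, reaches filter_freq, that a newly admitted key gets a zero row (the real service uses the configured initializer), and that a missing default-key row also falls back to zeros.

```python
from collections import Counter


class CounterFilterSketch:
    """Hypothetical model of counter_filter; not part of MindSpore."""

    def __init__(self, filter_freq, default_key=None, default_value=None):
        # Argument checks, in the order listed under Raises.
        if not isinstance(filter_freq, int):
            raise TypeError("filter_freq must be an int")
        if filter_freq < 0:
            raise ValueError("filter_freq must not be less than 0")
        if default_key is None and default_value is None:
            raise ValueError("set exactly one of default_key and default_value")
        if default_key is not None and default_value is not None:
            raise ValueError("set exactly one of default_key and default_value")
        if default_key is None and not isinstance(default_value, (int, float)):
            raise TypeError("default_value must be an int or a float")
        if default_value is None and not isinstance(default_key, int):
            raise TypeError("default_key must be an int")
        self.filter_freq = filter_freq
        self.default_key = default_key
        self.default_value = default_value
        self.counts = Counter()

    def lookup(self, key, table, embedding_dim):
        """Record one occurrence of key and return the row used for it."""
        self.counts[key] += 1
        zeros = [0.0] * embedding_dim
        if self.counts[key] >= self.filter_freq:
            # Admitted key: assume a zero row is created on first admission.
            return table.setdefault(key, zeros)
        if self.default_key is not None:
            # Not yet admitted: reuse the row of the default key.
            return table.get(self.default_key, zeros)
        # Not yet admitted: default_value repeated embedding_dim times.
        return [float(self.default_value)] * embedding_dim


cf = CounterFilterSketch(filter_freq=2, default_value=0.5)
table = {}
print(cf.lookup(7, table, embedding_dim=3))  # [0.5, 0.5, 0.5] (first sighting)
print(cf.lookup(7, table, embedding_dim=3))  # [0.0, 0.0, 0.0] (admitted)
```

The same split between admitted keys and default values is what makes the Note above work in evaluation: a key that was never admitted during training still resolves to a well-defined default.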
- embedding_ckpt_export(file_path, trainable_var)
Export the embedding table and optimizer parameters of each PS embedding, and export the embedding of each data_parallel embedding.
Note
This function can be executed only by rank 0. Before exporting, call embedding_variable_option to set evict_option for each PS embedding.
- embedding_ckpt_import(file_path)
Import the embedding and checkpoint files from the given file path.
- Parameters
file_path (str) – The path from which to import the embedding and checkpoint files. The last character cannot be "/".
- Returns
The output of the EmbeddingComputeVarImport operator and the data_parallel embedding import result.
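Because embedding_ckpt_import (like embedding_table_import and incremental_embedding_table_export) rejects a file_path ending in "/", a caller may want to normalize paths first. The helper below, normalize_es_path, is hypothetical and not part of MindSpore; it simply strips trailing slashes and refuses an empty path or the bare root directory.

```python
def normalize_es_path(file_path: str) -> str:
    """Hypothetical helper: strip trailing "/" so file_path meets the constraint."""
    if not isinstance(file_path, str) or not file_path:
        raise ValueError("file_path must be a non-empty str")
    stripped = file_path.rstrip("/")
    if not stripped:
        # A path made only of "/" (the filesystem root) cannot be fixed this way.
        raise ValueError("file_path cannot be the root directory")
    return stripped


print(normalize_es_path("/data/es_ckpt/"))  # /data/es_ckpt
```

A call would then look like es.embedding_ckpt_import(normalize_es_path(path)), where es is an EmbeddingService instance and path is whatever directory the checkpoint was exported to.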
- embedding_evict(steps_to_live)
Perform embedding eviction for all PS embeddings.
- Parameters
steps_to_live (int) – The number of steps that determines when a key is evicted.
- Returns
The output of the ESEmbeddingTableEvict operator.
- Raises
TypeError – If the type of "steps_to_live" is not int.
ValueError – If the value of "steps_to_live" is not greater than 0.
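The sketch below restates the steps_to_live checks and illustrates one plausible reading of eviction. This page does not define the exact eviction rule, so the rule here is an assumption: a key is dropped when its most recent lookup happened more than steps_to_live steps before the current step. check_steps_to_live and evict_stale_keys are hypothetical names, not MindSpore APIs.

```python
def check_steps_to_live(steps_to_live):
    """Mirror the documented checks: an int that is greater than 0."""
    if not isinstance(steps_to_live, int):
        raise TypeError("steps_to_live must be an int")
    if steps_to_live <= 0:
        raise ValueError("steps_to_live must be greater than 0")
    return steps_to_live


def evict_stale_keys(last_seen_step, current_step, steps_to_live):
    """Evict keys not looked up within the last steps_to_live steps.

    Assumed semantics, for illustration only. last_seen_step maps each key to
    the step of its most recent lookup; it is modified in place, and the
    evicted keys are returned in insertion order.
    """
    check_steps_to_live(steps_to_live)
    stale = [key for key, step in last_seen_step.items()
             if current_step - step > steps_to_live]
    for key in stale:
        del last_seen_step[key]
    return stale


seen = {"a": 95, "b": 40, "c": 10}
print(evict_stale_keys(seen, current_step=100, steps_to_live=50))  # ['b', 'c']
print(seen)  # {'a': 95}
```

Whether the boundary is "more than" or "at least" steps_to_live steps is also an assumption of the sketch.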
- embedding_init(name, init_vocabulary_size, embedding_dim, max_feature_count=None, initializer=Uniform(scale=0.01), embedding_type='PS', ev_option=None, multihot_lens=None, optimizer=None, allow_merge=False, optimizer_param=None, mode='train')
Initialize a PS embedding or a data_parallel embedding.
- Parameters
name (str) – The embedding table name.
init_vocabulary_size (int) – The size of embedding table.
embedding_dim (int) – The embedding dim of data in embedding table.
max_feature_count (int) – The number of keys looked up for a PS embedding.
initializer (Initializer) – The initialization strategy for the PS embedding. Default is Uniform(scale=0.01).
embedding_type (str) – The embedding type, one of ["PS", "data_parallel"]. "PS" initializes a PS embedding and "data_parallel" initializes a data_parallel embedding. Default is "PS".
ev_option (EmbeddingVariableOption) – Properties of the PS embedding: an EmbeddingVariableOption object returned by the embedding_variable_option function. Default is None.
multihot_lens (int) – Used only when allow_merge is enabled; currently not supported. Default is None.
optimizer (str) – The optimizer type used for the PS embedding in training mode. An optimizer cannot be shared among PS embeddings. Currently only "Adam", "Ftrl", "SGD" and "RMSProp" are supported. Default is None.
allow_merge (bool) – Whether to merge data_parallel embeddings. Currently it can only be False. Default is False.
optimizer_param (float) – The user-configured "initial accumulator value" of the optimizer, that is, the initial value of the moment accumulator. Default is None.
mode (str) – Run mode, one of ["train", "predict", "export"]. "train" means training mode, "predict" means prediction mode, and "export" means export mode. Default is "train".
- Returns
data_parallel embedding - a dict that contains the data_parallel embedding information.
PS embedding - EmbeddingServiceOut, the embedding init object that contains the PS embedding information. It has five attributes: table_id_dict, es_initializer, es_counter_filter, es_padding_keys, es_completion_keys.
table_id_dict (dict): key is the PS embedding name and value is its table_id.
es_initializer (dict): key is table_id and value is the EsInitializer object holding the PS embedding's initialization parameters.
es_counter_filter (dict): key is table_id and value is the filter option.
es_padding_keys (dict): key is table_id and value is the padding key.
es_completion_keys (dict): key is table_id and value is the completion key.
- Raises
ValueError – If "name", "init_vocabulary_size", "embedding_dim", "max_feature_count" are not set.
ValueError – If the types of "name", "init_vocabulary_size", "embedding_dim", and "max_feature_count" do not match the expected types.
ValueError – If the value of "init_vocabulary_size", "embedding_dim" or "max_feature_count" is less than or equal to 0, or the value of "init_vocabulary_size" is greater than 2147483647.
ValueError – If the number of PS embeddings exceeds 1024.
ValueError – If the value of "optimizer" is not in ["adam", "adagrad", "adamw", "ftrl", "sgd", "rmsprop"].
TypeError – If "initializer" is neither an EsInitializer object nor one of ["TruncatedNormal", "Uniform", "Constant"].
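For reference, the ValueError conditions above can be restated as plain Python. check_embedding_init_args is a hypothetical helper, not part of MindSpore; it assumes a PS embedding, takes the number of PS embeddings already created as an illustrative extra argument, and skips the initializer check. Because the parameter description lists "Adam", "Ftrl", "SGD" and "RMSProp" while the error condition lists lowercase names, the sketch assumes a case-insensitive comparison, and it assumes optimizer=None is accepted.

```python
MAX_VOCAB_SIZE = 2147483647  # documented upper bound on init_vocabulary_size
MAX_PS_TABLES = 1024  # documented maximum number of PS embeddings
SUPPORTED_OPTIMIZERS = {"adam", "adagrad", "adamw", "ftrl", "sgd", "rmsprop"}


def check_embedding_init_args(name, init_vocabulary_size, embedding_dim,
                              max_feature_count, optimizer=None,
                              num_existing_ps_tables=0):
    """Hypothetical restatement of embedding_init's ValueError conditions."""
    required = {
        "name": name,
        "init_vocabulary_size": init_vocabulary_size,
        "embedding_dim": embedding_dim,
        "max_feature_count": max_feature_count,
    }
    for arg, value in required.items():
        if value is None:
            raise ValueError(f"{arg} must be set")
    if not isinstance(name, str):
        raise ValueError("name must be a str")
    for arg in ("init_vocabulary_size", "embedding_dim", "max_feature_count"):
        value = required[arg]
        if not isinstance(value, int):
            raise ValueError(f"{arg} must be an int")
        if value <= 0:
            raise ValueError(f"{arg} must be greater than 0")
    if init_vocabulary_size > MAX_VOCAB_SIZE:
        raise ValueError(f"init_vocabulary_size must not exceed {MAX_VOCAB_SIZE}")
    if num_existing_ps_tables >= MAX_PS_TABLES:
        raise ValueError(f"at most {MAX_PS_TABLES} PS embeddings are allowed")
    # Assumption: optimizer names are compared case-insensitively; None is skipped.
    if optimizer is not None and str(optimizer).lower() not in SUPPORTED_OPTIMIZERS:
        raise ValueError(f"unsupported optimizer: {optimizer!r}")
    return True


print(check_embedding_init_args("user_id", 10_000_000, 64, 4096, optimizer="Adam"))  # True
```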
- embedding_table_export(file_path, trainable_var)
Export the embedding table of each PS embedding and data_parallel embedding.
Note
This function can only be executed by rank 0.
- embedding_table_import(file_path)
Import the embedding file from the given file path.
- Parameters
file_path (str) – The path from which to import the embedding table. The last character cannot be "/".
- Returns
The output of the EmbeddingTableImport operator and the data_parallel embedding import result.
- embedding_variable_option(filter_option=None, padding_option=None, evict_option=None, completion_option=None, storage_option=None, feature_freezing_option=None, communication_option=None)
Set the variable options for a PS embedding.
- Parameters
filter_option (CounterFilter) – The counter filter option. Default is None.
padding_option (PaddingParamsOption) – The padding key option. Default is None.
evict_option (EvictOption) – The evict option. Default is None.
completion_option (CompletionKeyOption) – The completion key option. Default is None.
storage_option (None) – Reserved option, currently not supported. Default is None.
feature_freezing_option (None) – Reserved option, currently not supported. Default is None.
communication_option (None) – Reserved option, currently not supported. Default is None.
- Returns
EmbeddingVariableOption object, used as the ev_option parameter for embedding_init.
- Raises
TypeError – If "filter_option" is not None and its type is not CounterFilter.
TypeError – If "padding_option" is not None and its type is not PaddingParamsOption.
TypeError – If "completion_option" is not None and its type is not CompletionKeyOption.
TypeError – If "evict_option" is not None and its type is not EvictOption.
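The type rules above amount to "each option is either None or an instance of its own class". The sketch below models that with empty stand-in classes named after CounterFilter, PaddingParamsOption, EvictOption and CompletionKeyOption; these stand-ins and EmbeddingVariableOptionSketch are hypothetical and are not the MindSpore classes. The reserved options are omitted.

```python
from dataclasses import dataclass
from typing import Optional


class CounterFilter:
    """Stand-in for the object returned by counter_filter."""


class PaddingParamsOption:
    """Stand-in for a padding key option."""


class EvictOption:
    """Stand-in for the object returned by evict_option."""


class CompletionKeyOption:
    """Stand-in for the object returned by completion_key."""


@dataclass
class EmbeddingVariableOptionSketch:
    """Hypothetical model of the object returned by embedding_variable_option."""

    filter_option: Optional[CounterFilter] = None
    padding_option: Optional[PaddingParamsOption] = None
    evict_option: Optional[EvictOption] = None
    completion_option: Optional[CompletionKeyOption] = None

    def __post_init__(self):
        # Each option must be None or an instance of its expected class.
        expected = {
            "filter_option": CounterFilter,
            "padding_option": PaddingParamsOption,
            "evict_option": EvictOption,
            "completion_option": CompletionKeyOption,
        }
        for field_name, cls in expected.items():
            value = getattr(self, field_name)
            if value is not None and not isinstance(value, cls):
                raise TypeError(f"{field_name} must be None or a {cls.__name__}")


opt = EmbeddingVariableOptionSketch(filter_option=CounterFilter(),
                                    evict_option=EvictOption())
print(opt.padding_option)  # None
```

In the real API, the object built this way is what gets passed to embedding_init as ev_option.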
- evict_option(steps_to_live)
Set the evict option for each PS embedding.
- Parameters
steps_to_live (int) – The number of steps that determines when a key is evicted.
- Returns
EvictOption object.
- Raises
TypeError – If the type of "steps_to_live" is not int.
ValueError – If the value of "steps_to_live" is not greater than 0.
- incremental_embedding_table_export(file_path)
Incrementally export the embedding table of each PS embedding.
Note
This function can only be executed by rank 0.
- Parameters
file_path (str) – The path to which the embedding table is incrementally exported. The last character cannot be "/".
- Returns
The output of the EmbeddingTableExport operator.
- init_table()
Initialize the tables of data_parallel embeddings.
- Returns
A dict of data_parallel embedding parameters, whose keys are data_parallel embedding names and whose values are the corresponding data_parallel embedding parameters.