mindformers.core.CheckpointMonitor
- class mindformers.core.CheckpointMonitor(prefix='CKP', directory=None, config=None, save_checkpoint_steps=1, save_checkpoint_seconds=0, keep_checkpoint_max=5, keep_checkpoint_per_n_minutes=0, integrated_save=True, save_network_params=True, save_trainable_params=False, async_save=False, saved_network=None, append_info=None, enc_key=None, enc_mode='AES-GCM', exception_save=False, global_batch_size=None)
Checkpoint monitor for saving checkpoints together with the loss scale.
- Parameters
prefix (str, optional) – The prefix name of checkpoint files. Default: 'CKP'.
directory (str, optional) – The path of the folder in which checkpoint files will be saved. Default: None.
config (CheckpointConfig, optional) – Checkpoint strategy configuration. Default: None.
save_checkpoint_steps (int, optional) – Number of steps between checkpoint saves. Default: 1.
save_checkpoint_seconds (int, optional) – Number of seconds between checkpoint saves. Can't be used with save_checkpoint_steps at the same time. Default: 0.
keep_checkpoint_max (int, optional) – Maximum number of checkpoint files that can be saved. Default: 5.
keep_checkpoint_per_n_minutes (int, optional) – Save the checkpoint file every "keep_checkpoint_per_n_minutes" minutes. Can't be used with keep_checkpoint_max at the same time. Default: 0.
integrated_save (bool, optional) – Whether to merge and save the split Tensor in the automatic parallel scenario. Integrated save is only supported in the automatic parallel scenario. Default: True.
save_network_params (bool, optional) – Whether to additionally save only the network weights. Default: True.
save_trainable_params (bool, optional) – Whether to additionally save the fine-tuned weights. Default: False.
async_save (bool, optional) – Whether to save the checkpoint to a file asynchronously. Default: False.
saved_network (Cell, optional) – Network to be saved in the checkpoint file. Default: None.
append_info (list, optional) – The information to save to the checkpoint file. Supports "epoch_num", "step_num" and dict. Default: None.
enc_key (Union[None, bytes], optional) – Byte type key used for encryption. Default: None.
enc_mode (str, optional) – This parameter is valid only when "enc_key" is not set to None. Specifies the encryption mode; currently supports 'AES-GCM', 'AES-CBC' and 'SM4-CBC'. Default: 'AES-GCM'.
exception_save (bool, optional) – Whether to save the current checkpoint when an exception occurs. Default: False.
global_batch_size (int, optional) – The total batch size. Default: None.
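The two "can't be used at the same time" constraints and the list of supported encryption modes can be checked before the monitor is constructed. The following is a minimal sketch, not part of mindformers: the helper name `build_monitor_kwargs` is hypothetical, and both treating `0` as "not used" and raising `ValueError` on a conflict are assumptions of this sketch rather than documented library behavior.

```python
# Hypothetical helper for illustration; it is not provided by mindformers.
SUPPORTED_ENC_MODES = ('AES-GCM', 'AES-CBC', 'SM4-CBC')  # modes named in the docs


def build_monitor_kwargs(save_checkpoint_steps=0, save_checkpoint_seconds=0,
                         keep_checkpoint_max=0, keep_checkpoint_per_n_minutes=0,
                         enc_key=None, enc_mode='AES-GCM'):
    """Return keyword arguments for CheckpointMonitor after basic checks.

    Assumption of this sketch: a value of 0 means the option is not used.
    """
    if save_checkpoint_steps and save_checkpoint_seconds:
        raise ValueError(
            "use save_checkpoint_steps or save_checkpoint_seconds, not both")
    if keep_checkpoint_max and keep_checkpoint_per_n_minutes:
        raise ValueError(
            "use keep_checkpoint_max or keep_checkpoint_per_n_minutes, not both")
    # enc_mode only matters when an encryption key is supplied.
    if enc_key is not None and enc_mode not in SUPPORTED_ENC_MODES:
        raise ValueError(f"unsupported enc_mode: {enc_mode!r}")
    return {
        'save_checkpoint_steps': save_checkpoint_steps,
        'save_checkpoint_seconds': save_checkpoint_seconds,
        'keep_checkpoint_max': keep_checkpoint_max,
        'keep_checkpoint_per_n_minutes': keep_checkpoint_per_n_minutes,
        'enc_key': enc_key,
        'enc_mode': enc_mode,
    }


# Step-based saving, keeping at most three checkpoint files.
kwargs = build_monitor_kwargs(save_checkpoint_steps=100, keep_checkpoint_max=3)
```

The returned dictionary could then be unpacked into the constructor, for example `CheckpointMonitor(directory='./checkpoint_dir', **kwargs)`.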
- Raises
ValueError – If prefix is not str or contains the '/' character.
ValueError – If directory is not str.
TypeError – If the config is not CheckpointConfig type.
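The prefix and directory conditions above can be mirrored in a small pre-check, for example to fail early when the values come from a configuration file. This is a sketch under assumptions: `check_prefix_and_directory` is a hypothetical name, and accepting `directory=None` follows the documented default rather than any knowledge of the library's internal validation.

```python
# Hypothetical pre-check for illustration; it is not provided by mindformers.
def check_prefix_and_directory(prefix='CKP', directory=None):
    """Raise ValueError for the prefix/directory cases listed under Raises."""
    # prefix must be a str and must not contain '/'.
    if not isinstance(prefix, str) or '/' in prefix:
        raise ValueError(f"invalid prefix: {prefix!r}")
    # Assumption: None is accepted because it is the documented default.
    if directory is not None and not isinstance(directory, str):
        raise ValueError(f"directory must be a str, got {type(directory).__name__}")
    return prefix, directory


checked = check_prefix_and_directory('CKP', './checkpoint_dir')
```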
Examples
>>> from mindformers.core import CheckpointMonitor
>>> monitor = CheckpointMonitor(directory='./checkpoint_dir')
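A slightly fuller configuration might look like the sketch below. It uses only parameter names from the list above; the values are illustrative, not recommendations. The guarded import is an addition of this sketch so that the snippet also runs where mindformers is not installed, in which case `monitor` is simply `None`.

```python
# Keyword arguments taken from the documented parameter list; values are illustrative.
monitor_kwargs = dict(
    prefix='CKP',
    directory='./checkpoint_dir',
    save_checkpoint_steps=500,              # save every 500 steps
    keep_checkpoint_max=3,                  # keep at most three checkpoint files
    append_info=['epoch_num', 'step_num'],  # extra information stored in the file
)

try:
    from mindformers.core import CheckpointMonitor
except ImportError:
    # Guard added for this sketch only: mindformers is not available here.
    CheckpointMonitor = None

monitor = CheckpointMonitor(**monitor_kwargs) if CheckpointMonitor is not None else None
```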