mindformers.core.CheckpointMonitor
- class mindformers.core.CheckpointMonitor(prefix='CKP', directory=None, config=None, save_checkpoint_steps=1, save_checkpoint_seconds=0, keep_checkpoint_max=5, keep_checkpoint_per_n_minutes=0, integrated_save=True, save_network_params=True, save_trainable_params=False, async_save=False, saved_network=None, append_info=None, enc_key=None, enc_mode='AES-GCM', exception_save=False, global_batch_size=None)
Checkpoint monitor for saving LossScale.
- Parameters
prefix (str) – The prefix name of checkpoint files. Default: 'CKP'.
directory (str) – The path of the folder in which the checkpoint files will be saved. Default: None.
config (CheckpointConfig) – Checkpoint strategy configuration. Default: None.
save_checkpoint_steps (int) – Number of steps between checkpoint saves. Default: 1.
save_checkpoint_seconds (int) – Number of seconds between checkpoint saves. Cannot be used together with save_checkpoint_steps. Default: 0.
keep_checkpoint_max (int) – Maximum number of checkpoint files that can be saved. Default: 5.
keep_checkpoint_per_n_minutes (int) – Save the checkpoint file every "keep_checkpoint_per_n_minutes" minutes. Cannot be used together with keep_checkpoint_max. Default: 0.
integrated_save (bool) – Whether to merge and save the split Tensor in the automatic parallel scenario. Integrated save is supported only in the automatic parallel scenario. Default: True.
save_network_params (bool) – Whether to additionally save a checkpoint that contains only the network weights. Default: True.
save_trainable_params (bool) – Whether to additionally save the fine-tuned (trainable) weights. Default: False.
async_save (bool) – Whether to save the checkpoint to a file asynchronously. Default: False.
saved_network (Cell) – Network to be saved in checkpoint file. Default: None.
append_info (list) – The information to save to the checkpoint file. Supports "epoch_num", "step_num" and dict. Default: None.
enc_key (Union[None, bytes]) – Byte type key used for encryption. Default: None.
enc_mode (str) – This parameter is valid only when "enc_key" is not set to None. Specifies the encryption mode, currently supports 'AES-GCM', 'AES-CBC' and 'SM4-CBC'. Default: 'AES-GCM'.
exception_save (bool) – Whether to save the current checkpoint when an exception occurs. Default: False.
global_batch_size (int) – The total batch size. Default: None.
- Raises
ValueError – If prefix is not str or contains the '/' character.
ValueError – If directory is not str.
TypeError – If the config is not CheckpointConfig type.
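The conditions listed above can be checked before the monitor is constructed. The following is a minimal sketch, not part of MindFormers: check_monitor_args is a hypothetical helper, and rejecting a combination of save_checkpoint_steps and save_checkpoint_seconds with an error is an assumption that may be stricter than what the library itself does.

```python
def check_monitor_args(prefix="CKP", directory=None,
                       save_checkpoint_steps=1, save_checkpoint_seconds=0):
    """Hypothetical pre-check that mirrors the documented constraints."""
    # Documented: ValueError if prefix is not str or contains '/'.
    if not isinstance(prefix, str) or "/" in prefix:
        raise ValueError("prefix must be a str without the '/' character")
    # Documented: ValueError if directory is not str (None is the default).
    if directory is not None and not isinstance(directory, str):
        raise ValueError("directory must be a str")
    # Assumption: treat the two save intervals as mutually exclusive and
    # fail early when both are non-zero.
    if save_checkpoint_steps and save_checkpoint_seconds:
        raise ValueError("use either save_checkpoint_steps or "
                         "save_checkpoint_seconds, not both")
    return True
```

Calling check_monitor_args(prefix='a/b') raises ValueError, while the default arguments pass.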
Examples
>>> from mindformers.core import CheckpointMonitor
>>> monitor = CheckpointMonitor(directory='./checkpoint_dir')
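A slightly fuller configuration might look like the sketch below, which uses only parameters from the signature above. The helper names monitor_kwargs and build_monitor are hypothetical, the values (1000 steps, 3 files) are illustrative rather than recommended, and the import is deferred so that the argument dictionary can be inspected without MindFormers installed.

```python
def monitor_kwargs(output_dir="./checkpoint_dir"):
    """Keyword arguments for a step-based saving policy (illustrative values)."""
    return {
        "prefix": "CKP",
        "directory": output_dir,
        "save_checkpoint_steps": 1000,             # save every 1000 steps
        "keep_checkpoint_max": 3,                  # keep at most 3 files
        "append_info": ["epoch_num", "step_num"],  # documented string options
    }


def build_monitor(output_dir="./checkpoint_dir"):
    """Construct the monitor; requires MindFormers to be installed."""
    # Deferred import: only needed when the monitor is actually built.
    from mindformers.core import CheckpointMonitor
    return CheckpointMonitor(**monitor_kwargs(output_dir))
```

Keeping the arguments in a plain dictionary makes it easy to log or adjust the policy before the callback is created.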