mindformers.core.CheckpointMonitor

class mindformers.core.CheckpointMonitor(prefix='CKP', directory=None, config=None, save_checkpoint_steps=1, save_checkpoint_seconds=0, keep_checkpoint_max=5, keep_checkpoint_per_n_minutes=0, integrated_save=True, save_network_params=True, save_trainable_params=False, async_save=False, saved_network=None, append_info=None, enc_key=None, enc_mode='AES-GCM', exception_save=False, global_batch_size=None, checkpoint_format='ckpt', remove_redundancy=False)

Checkpoint monitor that also saves the loss scale.

Parameters
  • prefix (str) – The prefix name of checkpoint files. Default: 'CKP'.

  • directory (str) – The path of the folder in which checkpoint files will be saved. Default: None.

  • config (CheckpointConfig) – Checkpoint strategy configuration. Default: None.

  • save_checkpoint_steps (int) – Number of steps between checkpoint saves. Default: 1.

  • save_checkpoint_seconds (int) – Number of seconds between checkpoint saves. Cannot be used together with save_checkpoint_steps. Default: 0.

  • keep_checkpoint_max (int) – Maximum number of checkpoint files that can be saved. Default: 5.

  • keep_checkpoint_per_n_minutes (int) – Save the checkpoint file every keep_checkpoint_per_n_minutes minutes. Cannot be used together with keep_checkpoint_max. Default: 0.

  • integrated_save (bool) – Whether to merge and save the split Tensor in the automatic parallel scenario. Integrated saving is supported only in the automatic parallel scenario. Default: True.

  • save_network_params (bool) – Whether to additionally save only the network weights. Default: True.

  • save_trainable_params (bool) – Whether to save fine-tuned weights additionally. Default: False.

  • async_save (bool) – Whether to save the checkpoint to a file asynchronously. Default: False.

  • saved_network (Cell) – Network to be saved in checkpoint file. Default: None.

  • append_info (list) – The information to save to the checkpoint file. Supports "epoch_num", "step_num" and dict. Default: None.

  • enc_key (Union[None, bytes]) – Byte type key used for encryption. Default: None.

  • enc_mode (str) – Specifies the encryption mode. Valid only when enc_key is not None. Currently supports 'AES-GCM', 'AES-CBC' and 'SM4-CBC'. Default: 'AES-GCM'.

  • exception_save (bool) – Whether to save the current checkpoint when an exception occurs. Default: False.

  • global_batch_size (int) – The total batch size. Default: None.

  • checkpoint_format (str) – The format of checkpoint to save. Default: 'ckpt'.

  • remove_redundancy (bool) – Whether to remove redundancy when saving checkpoint. Default: False.
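
The effect of keep_checkpoint_max can be pictured with a small standalone sketch. It assumes that, once the limit is reached, the oldest file is the one dropped; that eviction order, the rotate_checkpoints helper, and the file names are illustrative assumptions, not the CheckpointMonitor implementation.

```python
from collections import deque


def rotate_checkpoints(saved_names, keep_checkpoint_max):
    """Return (kept, removed) after applying a keep-the-newest-N policy.

    saved_names is ordered oldest to newest. Hypothetical helper for
    illustration only; it does not touch the file system.
    """
    if keep_checkpoint_max < 1:
        raise ValueError("keep_checkpoint_max must be at least 1")
    kept = deque(maxlen=keep_checkpoint_max)
    removed = []
    for name in saved_names:
        if len(kept) == kept.maxlen:
            removed.append(kept[0])  # the oldest entry is about to be evicted
        kept.append(name)
    return list(kept), removed


# Hypothetical file names: seven saves with the default limit of 5.
names = [f"CKP-{step}.ckpt" for step in range(1, 8)]
kept, removed = rotate_checkpoints(names, keep_checkpoint_max=5)
print(kept)     # the five most recent names
print(removed)  # the two oldest names
```

With seven saves and a limit of 5, the sketch keeps the last five names and reports the first two as removed.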

Raises
  • ValueError – If prefix is not str or contains the '/' character.

  • ValueError – If directory is not str.

  • TypeError – If the config is not CheckpointConfig type.
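
The checks listed under Raises can be mirrored in a short standalone sketch. The check_monitor_args function and the stand-in CheckpointConfig class below are hypothetical, and treating None as acceptable for directory and config is an assumption based on the documented defaults; the library's own validation may differ.

```python
class CheckpointConfig:
    """Hypothetical stand-in for the real CheckpointConfig class."""


def check_monitor_args(prefix="CKP", directory=None, config=None):
    """Apply the documented Raises rules to the three arguments.

    Assumption: None is accepted for directory and config because it is
    the documented default for both.
    """
    if not isinstance(prefix, str) or "/" in prefix:
        raise ValueError(f"prefix must be a str without '/', got {prefix!r}")
    if directory is not None and not isinstance(directory, str):
        raise ValueError(f"directory must be a str, got {type(directory).__name__}")
    if config is not None and not isinstance(config, CheckpointConfig):
        raise TypeError(f"config must be a CheckpointConfig, got {type(config).__name__}")
    return True


check_monitor_args(prefix="CKP", directory="./checkpoint_dir")  # passes

try:
    check_monitor_args(prefix="run/1")  # '/' is not allowed in prefix
except ValueError as err:
    print(err)
```

Running a check like this before constructing the monitor can surface a bad prefix or directory early in a training script.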

Examples

>>> from mindformers.core import CheckpointMonitor
>>> monitor = CheckpointMonitor(directory='./checkpoint_dir')