mindspore.train.CheckpointConfig
- class mindspore.train.CheckpointConfig(save_checkpoint_steps=1, save_checkpoint_seconds=0, keep_checkpoint_max=5, keep_checkpoint_per_n_minutes=0, integrated_save=True, async_save=False, saved_network=None, append_info=None, enc_key=None, enc_mode='AES-GCM', exception_save=False, crc_check=False, remove_redundancy=False, format='ckpt', **kwargs)[source]
The configuration of model checkpoint.
Note
During the training process, if dataset is transmitted through the data channel, it is suggested to set 'save_checkpoint_steps' to an integer multiple of loop_size. Otherwise, the time to save the checkpoint may be biased. It is recommended to set only one save strategy and one keep strategy at the same time. If both save_checkpoint_steps and save_checkpoint_seconds are set, save_checkpoint_seconds will be invalid. If both keep_checkpoint_max and keep_checkpoint_per_n_minutes are set, keep_checkpoint_per_n_minutes will be invalid.
The enc_mode and crc_check parameters are mutually exclusive and cannot be configured simultaneously.
- Parameters
save_checkpoint_steps (int) – Steps to save checkpoint. Default:
1
.save_checkpoint_seconds (int) – Seconds to save checkpoint. Can't be used with save_checkpoint_steps at the same time. Default:
0
.keep_checkpoint_max (int) – Maximum number of checkpoint files can be saved. Default:
5
.keep_checkpoint_per_n_minutes (int) – Save the checkpoint file every keep_checkpoint_per_n_minutes minutes. Can't be used with keep_checkpoint_max at the same time. Default:
0
.integrated_save (bool) – Whether to merge and save the split Tensor in the automatic parallel scenario. Integrated save function is only supported in automatic parallel scene, not supported in manual parallel. Default:
True
.async_save (bool) – Whether asynchronous execution saves the checkpoint to a file. Default:
False
.saved_network (Cell) – Network to be saved in checkpoint file. If the saved_network has no relation with the network in training, the initial value of saved_network will be saved. Default:
None
.append_info (list) – The information save to checkpoint file. Support "epoch_num", "step_num" and dict. The key of dict must be str, the value of dict must be one of int, float, bool, Parameter or Tensor. Default:
None
.enc_key (Union[None, bytes]) – Byte type key used for encryption. If the value is None, the encryption is not required. Default:
None
.enc_mode (str) – This parameter is valid only when enc_key is not set to None. Specifies the encryption mode, currently supports 'AES-GCM', 'AES-CBC' and 'SM4-CBC'. Default:
'AES-GCM'
.exception_save (bool) – Whether to save the current checkpoint when an exception occurs. Default:
False
.crc_check (bool) – Whether to perform crc32 calculation when saving checkpoint and save the calculation result to the end of ckpt. Default:
False
.remove_redundancy (bool) – Whether to enable saving the checkpoint with redundancy removal. Redundancy removal refers to eliminating redundant data in data parallelism mode. Default:
False
, means redundant-free saving is not enabled.format (str) – Format of the output file, can be "ckpt" or "safetensors". Default: "ckpt".
kwargs (dict) – Configuration options dictionary.
- Raises
ValueError – If input parameter is not the correct type.
Examples
>>> from mindspore import nn >>> from mindspore.train import Model, CheckpointConfig, ModelCheckpoint >>> >>> # Define the network structure of LeNet5. Refer to >>> # https://gitee.com/mindspore/docs/blob/master/docs/mindspore/code/lenet.py >>> net = LeNet5() >>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean') >>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9) >>> model = Model(net, loss_fn=loss, optimizer=optim) >>> # Create the dataset taking MNIST as an example. Refer to >>> # https://gitee.com/mindspore/docs/blob/master/docs/mindspore/code/mnist.py >>> dataset = create_dataset() >>> config = CheckpointConfig(save_checkpoint_seconds=100, keep_checkpoint_per_n_minutes=5, saved_network=net) >>> config.save_checkpoint_steps 1 >>> config.save_checkpoint_seconds >>> config.keep_checkpoint_max 5 >>> config.keep_checkpoint_per_n_minutes >>> config.integrated_save True >>> config.async_save False >>> config.saved_network >>> config.enc_key >>> config.enc_mode 'AES-GCM' >>> config.append_dict >>> config.get_checkpoint_policy >>> ckpoint_cb = ModelCheckpoint(prefix='LeNet5', directory='./checkpoint', config=config) >>> model.train(10, dataset, callbacks=ckpoint_cb)
- property append_dict
Get the value of information dict saved to checkpoint file.
- Returns
dict, the information saved to checkpoint file.
- property async_save
Get the value of whether asynchronous execution saves the checkpoint to a file.
- Returns
bool, whether asynchronous execution saves the checkpoint to a file.
- property crc_check
Get the value of the whether to enable crc check.
- Returns
bool, whether to enable crc check.
- property enc_key
Get the value of byte type key used for encryption.
- Returns
(None, bytes), byte type key used for encryption.
- property enc_mode
Get the value of the encryption mode.
- Returns
str, encryption mode.
- get_checkpoint_policy()[source]
Get the policy of checkpoint.
- Returns
dict, the information of checkpoint policy.
- property integrated_save
Get the value of whether to merge and save the split Tensor in the automatic parallel scenario.
- Returns
bool, whether to merge and save the split Tensor in the automatic parallel scenario.
- property keep_checkpoint_max
Get the value of maximum number of checkpoint files can be saved.
- Returns
int, Maximum number of checkpoint files can be saved.
- property keep_checkpoint_per_n_minutes
Get the value of save the checkpoint file every n minutes.
- Returns
Int, save the checkpoint file every n minutes.
- property map_param_inc
Get the value of whether to save map Parameter incrementally.
- Returns
bool, whether to save map Parameter incrementally.
- property save_checkpoint_seconds
Get the value of _save_checkpoint_seconds.
- Returns
int, seconds to save the checkpoint file.
- property save_checkpoint_steps
Get the value of steps to save checkpoint.
- Returns
int, steps to save checkpoint.
- property saved_network
Get the value of network to be saved in checkpoint file.
- Returns
Cell, network to be saved in checkpoint file.