mindspore.train.OnRequestExit

class mindspore.train.OnRequestExit(save_ckpt=True, save_mindir=True, file_name='Net', directory='./', config_file=None, sig=signal.SIGTERM)[source]

Responds to a user's exit request: stops the training or eval process and saves the checkpoint and MindIR.

Register the OnRequestExit callback before training. To exit the training process and save the training state, the user can either send the registered exit signal 'sig' to the training process or set the 'GracefulExit' key to 1 in the JSON file specified by 'config_file'. The training process then finishes the current step, saves the current training state (checkpoint and MindIR), and exits.
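The "finish the current step, then save and exit" behavior can be illustrated with a minimal sketch that uses only Python's standard signal module. It is not the MindSpore implementation: the function name train_with_graceful_exit, its parameters, and the saved_at placeholder are hypothetical, and the process signals itself to stand in for a user running kill from another shell.

```python
import signal


def train_with_graceful_exit(num_steps, exit_at_step, sig=signal.SIGTERM):
    """Toy training loop that stops after the step in which `sig` arrives."""
    state = {"stop_requested": False}

    def handler(signum, frame):
        # Only record the request; the loop decides when it is safe to stop.
        state["stop_requested"] = True

    previous = signal.signal(sig, handler)
    completed = []
    saved_at = None
    try:
        for step in range(num_steps):
            if step == exit_at_step:
                # Stand-in for a user sending the signal from outside.
                signal.raise_signal(sig)
            completed.append(step)  # the current step always finishes
            if state["stop_requested"]:
                saved_at = step  # placeholder for saving checkpoint and MindIR
                break
    finally:
        # Restore whatever handler was installed before the sketch ran.
        signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return completed, saved_at


completed, saved_at = train_with_graceful_exit(num_steps=10, exit_at_step=3)
print(completed, saved_at)
```

The handler only sets a flag, so the step that is running when the signal arrives is never interrupted halfway; the loop checks the flag once the step is complete.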

Parameters
  • save_ckpt (bool) – Whether to save the checkpoint before the training process exits. Default: True .

  • save_mindir (bool) – Whether to save the MindIR before the training process exits. Default: True .

  • file_name (str) – Base name of the saved checkpoint and MindIR files. The checkpoint file gets the suffix '.ckpt' and the MindIR file gets the suffix '.mindir'. Default: 'Net' .

  • directory (str) – The path in which to save the files. A 'rank_{id}' path is generated from the rank_id to hold the checkpoint and MindIR files. Default: './' .

  • sig (int) – The exit signal registered by the user. It must be a signal that can be caught and ignored. When the process receives this signal, it exits the training or eval process. Default: signal.SIGTERM .

  • config_file (str) – Path to a JSON config file used to exit the training process gracefully. Key: {"GracefulExit": 1} . Default: None .
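On the user side, the 'config_file' mechanism amounts to editing one key in a JSON file. The following sketch shows one way to create such a file and later flip 'GracefulExit' to 1 using only the standard library. The helper name write_exit_config and the file name graceful_exit.json are hypothetical; only the 'GracefulExit' key comes from the parameter description above.

```python
import json
import os
import tempfile


def write_exit_config(path, graceful_exit=0):
    """Create or update the JSON file, keeping any other keys intact."""
    config = {}
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            config = json.load(f)
    config["GracefulExit"] = graceful_exit
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f)
    return config


# Hypothetical location; any path readable by the training process would do.
config_path = os.path.join(tempfile.mkdtemp(), "graceful_exit.json")

write_exit_config(config_path, 0)  # before training: no exit requested
write_exit_config(config_path, 1)  # during training: ask for a graceful exit
```

The same path would then be passed as config_file when constructing the callback, so that the training process and the user agree on which file to watch.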

Raises
  • ValueError – If 'save_ckpt' is not a bool.

  • ValueError – If 'save_mindir' is not a bool.

  • ValueError – If 'file_name' is not a str.

  • ValueError – If 'directory' is not a str.

  • ValueError – If 'sig' is not an int, or if 'sig' is signal.SIGKILL.
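The SIGKILL restriction is consistent with how POSIX signals behave: a process cannot install a handler for SIGKILL, so no callback could react to it. The sketch below probes this with Python's standard signal module. It assumes a POSIX system and the main thread, and the helper name can_install_handler is hypothetical rather than part of MindSpore.

```python
import signal


def can_install_handler(sig):
    """Return True if a Python-level handler can be registered for `sig`."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except (OSError, ValueError):
        # The OS rejected the handler, or the signal number is invalid.
        return False
    # Put the earlier handler back so the probe leaves no trace.
    signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return True


print("SIGTERM catchable:", can_install_handler(signal.SIGTERM))
print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
```

On Linux, the SIGKILL probe is expected to fail because the kernel refuses handlers for that signal, which is why a signal such as SIGTERM or SIGUSR1 is the usual choice for a cooperative shutdown.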

Examples

>>> from mindspore import nn
>>> from mindspore.train import Model
>>> import mindspore as ms
>>>
>>> # Define the network structure of LeNet5. Refer to
>>> # https://gitee.com/mindspore/docs/blob/master/docs/mindspore/code/lenet.py
>>> net = LeNet5()
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
>>> optim = nn.Momentum(net.trainable_params(), 0.01, 0.9)
>>> model = Model(net, loss_fn=loss, optimizer=optim)
>>> # Create the dataset taking MNIST as an example. Refer to
>>> # https://gitee.com/mindspore/docs/blob/master/docs/mindspore/code/mnist.py
>>> dataset = create_dataset()
>>> on_request_exit = ms.train.OnRequestExit(file_name='LeNet5')
>>> model.train(10, dataset, callbacks=on_request_exit)

on_eval_begin(run_context)[source]

When the eval begins, register the handler for the exit signal sent by the user.

Parameters

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_eval_end(run_context)[source]

When the eval ends, if the exit signal has been received, save the checkpoint and MindIR according to the user's configuration.

Parameters

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_eval_step_end(run_context)[source]

When an eval step ends, if the exit signal has been received, set the '_stop_requested' attribute of 'run_context' to True. The eval process then exits after this step.

Parameters

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_begin(run_context)[source]

When the training begins, register the handler for the exit signal sent by the user.

Parameters

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_end(run_context)[source]

When the training ends, if the exit signal has been received, save the checkpoint and MindIR according to the user's configuration.

Parameters

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_epoch_end(run_context)[source]

When a training epoch ends, if the exit signal has been received, set the '_stop_requested' attribute of 'run_context' to True. The training process then exits after this epoch.

Parameters

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_step_begin(run_context)[source]

Check whether the exit signal has been received or whether the value of 'GracefulExit' in 'config_file' has been changed to 1.
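The two-part check described here can be sketched as a small standalone function. This is an illustration under stated assumptions, not the callback's actual code: the name exit_requested and its parameters are hypothetical, the value is compared against the integer 1, and a missing or unreadable file is treated as "no request".

```python
import json
import os
import tempfile


def exit_requested(signal_received, config_file=None):
    """Return True if either exit trigger has fired."""
    if signal_received:
        return True
    if config_file is None or not os.path.isfile(config_file):
        return False
    try:
        with open(config_file, "r", encoding="utf-8") as f:
            config = json.load(f)
    except (OSError, json.JSONDecodeError):
        # Unreadable or half-written file: treat it as "no request".
        return False
    return isinstance(config, dict) and config.get("GracefulExit") == 1


# Usage with a temporary file standing in for the user's config file.
config_path = os.path.join(tempfile.mkdtemp(), "exit.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump({"GracefulExit": 0}, f)
before = exit_requested(False, config_path)

with open(config_path, "w", encoding="utf-8") as f:
    json.dump({"GracefulExit": 1}, f)
after = exit_requested(False, config_path)

print(before, after)
```

Re-reading the file at the start of each step keeps the check simple and lets the user request an exit with any text editor, at the cost of one small file read per step.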

Parameters

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.

on_train_step_end(run_context)[source]

Save the checkpoint file or MindIR file according to the configuration, then exit the training process.

Parameters

run_context (RunContext) – Context information of the model. For more details, please refer to mindspore.train.RunContext.