Use Mindoptimizer to Tune Hyperparameters
Overview
There are two kinds of parameters in machine learning. One is the model internal parameters, relying on training data and algorithms to tune the model parameters. And the other is the model external setting parameters, they need to be manually configured, such parameters are called hyperparameters. Because different hyperparameters impact the performance of model, hyperparameters are highly important in training tasks. Traditional methods require manual analysis of hyperparameters, manual debugging, and configuration, which consumes time and effort. MindInsight parameter tuning command can be used for automatic parameter tuning. Based on the parameter tuning configuration information provided by users, parameters can be automatically configured and model training can be performed.
MindInsight provides mindoptimizer
. This tuning command can extract past training summaries from the training log according to the user configuration, analyze past training records and recommend hyperpameters, and finally automate training scripts. When using it, users need to configure information such as the scope of hyperparameters in yaml format. And then users need to replace the hyperparameters in the training script according to the tutorial, with the aim of synchronizing the auto-recommended hyperparameters into the training script. Currently, only the Gauss process tuning method is supported, and other methods are gradually supported.
Installation
This tool is a submodule of MindInsight. After MindInsight is installed, you can use the MindInsight parameter tuning command. For details about how to install MindInsight, see the installation guide.
Usage
MindInsight provides parameters tuning command. The command-line interface (CLI) provides the following commands:
usage: mindoptimizer [-h] [--version] [--config CONFIG]
[--iter ITER]
optional arguments:
-h, --help Shows the help message and exits.
--version Shows the program version and exits.
--config CONFIG Specifies the configuration file for parameter tuning.
The file format is yaml.
--iter ITER Specifies the times of automatic training.
Automatically recommended parameters are used every time
before the training is performed.
The default value of ITER is 1.
Configuration File Rules
The file format of the configuration file is yaml, which requires configurations of running command, the root directory of training summaries, tuning method, optimization objectives, and hyperparameters information. It is necessary to configure the information about hyperparameters including the range of values, types and sources, etc. MindInsight extracts training records from the training summary based on configured hyperparameters and optimization objectives, such as learning and accuracy. They can be used by recommended algorithms to analyze their relationships and better recommend hyperparameters.
Configure the running command
Use
command
to configure running command, such ascommand: python train.py
. The running command is executed directly after the tuning program recommends hyperparameters.Configure the root directory of training summaries
The
summary_base_dir
is the root directory of training summaries. It is also used for the extraction of training records, which makes hyperparameters better recommended. At the same time, it is recommended that users addSummaryColletor
in their training scripts to collect training information, you can view the summary collection tutorial. The tuning command generates a subdirectory path based on the configuredsummary_base_dir
, which can be configured to record the training record atSummaryColletor
. Therefore, after training, the training information is recorded in the subdirecte of the root directory of training summaries, and the training information can be used as a training record to recommend the next required hyperparameters. Configure thesummary_base_dir
, such as/home/summaries
.Configure the parameter tuning method
Use
name
to specify the name of an acquisition function, andargs
tp specify parameters of the acquisition function.The current algorithm is Gaussian process regressor(GP). The acquisition functhon of GP is optional, and its range is in [
ucb
,pi
,ei
]. The default value isucb
.Upper confidence bound (UCB)
Probability of improvement (PI)
Expected improvement (EI)
For example:
tuner: name: gp args: method: ucb
Configure the parameter tuning target
You can select loss or self-defined metrics as the target.
Configuration description:
group: This parameter is optional. The value can be
system_defined
ormetric
. The default value issystem_defined
. Usegroup
to configure the group in which the optimization target is located, such as the system custom collection field, which is thesystem_defined
group. However, other evaluation metrics used inModel()
, such asmodel = Model(net, loss_fn=loss, optimizer=None, metrics={'Accuracy'})
.Accuracy
belongs to the metrics, so the group ismetric
.goal: This parameter is optional. The value can be
minimize
ormaximize
. The default value isminimize
. Usegoal
to indicate the optimization direction of the target. For example, ifAccuracy
is higher, the performance of model is better, so thegoal
needs to be configured as ‘maximize’.
Config
loss
:target: name:loss
Config
Accuracy
in metrics:target: group: metric name: Accuracy goal: maximize
Configure hyperparameters bounds, choice, type, and source
Configuration fields for hyperparameters consist of
bounds
,choices
,type
, andsource
. The fields of hyperparameters configured here are used for extraction of training summaries and recommendation of hyperparameters. In addition,bounds
,choice
, andtype
affect the recommendation of hyperparameters.bounds
are configured as the upper and lower boundaries of the hyperparameters,choice
indicates that which recommended values are selected, andtype
is the type of parameter configured.The tunable fields currently collected by the system customization include
learning_rate
,batch_size
andepoch
. Other parameters are user-defined parameters will be automatically collected in the training summary during training if the source is configured asuser_defined
.bounds: a list. The number of elements is 2. The first number is the lower bound min, and the second number is the upper bound max. The value range is [min, max). The method for generating a random number is
numpy.random.uniform()
.choice: a list. The number of values is not limited. Values are selected from the elements in the list.
type: This parameter is mandatory and should be set to
int
orfloat
.source: This parameter is optional. The value should be
system_defined
oruser_defined
. If the name of parameter exists in system-defined field, the default value issystem_defined
, otherwise, the default value isuser_defined
.
You need to choose either
bounds
orchoice
. If you have configuredchoice
, values are selected from the configured list only, and if you have configured bothchoice
andtype
,type
does not take effect.
Usage Examples
If you want to optimize the learning_rate
, batch_size
, and momentum
, and the optimization objective is Accuracy
, configure the YAML file as follows:
Configure config.yaml
command: sh /home/example/run_alexnet_ascend.sh summary_base_dir: /home/summaries tuner: name: gp target: group: metric name: Accuracy goal: maximize parameters: learning_rate: bounds: [0.00001, 0.001] type: float batch_size: choice: [32, 64, 128, 256] type: int momentum: source: user_defined choice: [0.8, 0.9] type: float
The name of
momentum
is not the same as that of the variable defined by the system. Therefore, you do not need to set the source field.If the fields with the same name exist in the YAML file, the last one will be selected. Do not use the following method.
parameters: learning_rate: bounds: [0.0005, 0.001] type: float learning_rate: source: user_defined bounds: [0.00002, 0.0001] type: float
Instantiate the
HyperConfig
object in the training script(1) After instantiating
HyperConfig
, use the parameter variables of theHyperConfig
instance as the values of the corresponding parameters in the training script.
(2) Please addSummaryCollector
to collect lineage information, including hyperparameters and evaluation metrics.For example, the training script in Model Zoo is as follows:
ds_train = create_dataset_cifar10(args.data_path, batch_size) lr = Tensor(get_lr_cifar10(0, cfg.learning_rate, cfg.epoch_size, step_per_epoch)) opt = nn.Momentum(network.trainable_params(), lr, cfg.momentum) model.train(cfg.epoch_size, ds_train, callbacks=[time_cb, ckpoint_cb, LossMonitor()]
After the modification:
from mindinsight.optimizer import HyperConfig config = HyperConfig() params = config.params # Replace batch_size with params.batch_size. ds_train = create_dataset_cifar10(args.data_path, params.batch_size) # Replace cfg.learning_rate with params.learning_rate. lr = Tensor(get_lr_cifar10(0, params.learning_rate, cfg.epoch_size, step_per_epoch)) # Replace cfg.momentum with params.momentum. opt = nn.Momentum(network.trainable_params(), lr, params.momentum) # Instantiate SummaryCollector and add it to callback to automatically collect training information. summary_cb = SummaryCollector(config.summary_dir) model.train(cfg.epoch_size, ds_train, callbacks=[time_cb, ckpoint_cb, LossMonitor(), summary_cb]
Execution
Please make sure that the training script can be executed correctly before performing automatic tuning.
mindoptimizer --config ./config.yaml --iter 10
Please fill in the training command to execute the training in the configuration file, and run the automatic tuning program in the directory where the training command can be successfully run.
Visualization
Enable MindInsight based on summary_base_dir configured in config.yaml. For details about the visualization method, see the MindInsight start tutorial.
Notices
The training script is written and maintained by users. This tool does not automatically modify the training script. If the training script is incorrect, an error occurs when this tool is used to support the training script.
This tool does not process or modify the printed information during the running process.
Ensure that the parameter tuning process is trustworthy. If a parameter configuration error or script execution error occurs, the parameter tuning process will be terminated. You can locate the fault based on the displayed information.