mindspore.boost (experimental)

Boost provides automatic acceleration for networks, such as Less BN, Gradient Freeze, Gradient Accumulation, and so on.

Note

This is a beta feature, and we are still improving its functionality.

class mindspore.boost.AdaSum(rank, device_number, group_number, parameter_tuple)[source]

The Adaptive Summation, or AdaSum, is a novel algorithm for improving distributed data parallel training of Deep Learning models.

Parameters
  • rank (int) – Rank number of the current device.

  • device_number (int) – Number of devices.

  • group_number (int) – Number of groups into which the devices are divided.

  • parameter_tuple (Tuple(Parameter)) – Tuple of the network's trainable parameters.

Inputs:
  • delta_weights (Tuple(Tensor)) - Tuple of gradients.

  • parameters (Tuple(Parameter)) - Tuple of current parameters.

  • old_parameters (Tuple(Parameter)) - Tuple of last parameters.

Outputs:
  • adasum_parameters (Tuple(Tensor)) - Tuple of parameters after the AdaSum process.
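
Examples

A minimal construction sketch; it assumes an already initialized data-parallel job, and the 8-device / 2-group layout used here is purely illustrative:

>>> from mindspore import nn, boost
>>> from mindspore.communication import init, get_rank
>>> init()
>>> net = nn.Dense(16, 10)
>>> # rank comes from the communication context; device_number=8 and group_number=2 are placeholder values
>>> adasum = boost.AdaSum(rank=get_rank(), device_number=8, group_number=2,
...                       parameter_tuple=tuple(net.trainable_params()))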

class mindspore.boost.AutoBoost(level, kwargs)[source]

Provide automatic acceleration for the network.

Parameters
  • level (str) – Boost config level.

  • kwargs (any) – Additional configuration parameters related to boost.
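
Examples

A minimal usage sketch built with network_auto_process_train (documented below); the "O1" level and the empty configuration dict are illustrative, and the method is assumed here to return the processed network and optimizer:

>>> from mindspore import nn, boost
>>> net = nn.Dense(16, 10)
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> optim = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> loss_net = nn.WithLossCell(net, loss_fn)
>>> auto_boost = boost.AutoBoost("O1", {})
>>> # assumed to return the boosted network and optimizer
>>> loss_net, optim = auto_boost.network_auto_process_train(loss_net, optim)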

network_auto_process_eval(network)[source]

Apply boost processing to the evaluation network.

network_auto_process_train(network, optimizer)[source]

Apply boost processing to the training network and optimizer.

class mindspore.boost.BoostTrainOneStepCell(network, optimizer, sens=1.0)[source]

Boost network training package class.

Wraps the network with an optimizer. The resulting Cell is trained with input *inputs. The backward graph will be created in the construct function to update the parameters. Different parallel modes are available for training.

Parameters
  • network (Cell) – The training network. The network only supports single output.

  • optimizer (Union[Cell]) – Optimizer for updating the weights.

  • sens (numbers.Number) – The scaling number to be filled as the input of backpropagation. Default value is 1.0.

Inputs:
  • (*inputs) (Tuple(Tensor)) - Tuple of input tensors with shape \((N, \ldots)\).

Outputs:

Tensor, a tensor representing the loss value, whose shape is usually \(()\).

Raises

TypeError – If sens is not a number.

Supported Platforms:

Ascend GPU CPU

Examples

>>> from mindspore import boost, nn
>>> from mindspore.nn import Cell
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> optim = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> #1) Using the WithLossCell provided by MindSpore
>>> loss_net = nn.WithLossCell(net, loss_fn)
>>> train_net = boost.BoostTrainOneStepCell(loss_net, optim)
>>>
>>> #2) Using user-defined WithLossCell
>>> class MyWithLossCell(Cell):
...    def __init__(self, backbone, loss_fn):
...        super(MyWithLossCell, self).__init__(auto_prefix=False)
...        self._backbone = backbone
...        self._loss_fn = loss_fn
...
...    def construct(self, x, y, label):
...        out = self._backbone(x, y)
...        return self._loss_fn(out, label)
...
...    @property
...    def backbone_network(self):
...        return self._backbone
...
>>> loss_net = MyWithLossCell(net, loss_fn)
>>> train_net = boost.BoostTrainOneStepCell(loss_net, optim)

adasum_process(loss, grads)[source]

AdaSum algorithm process.

check_adasum_enable(optimizer, reducer_flag)[source]

Check whether AdaSum is enabled.

gradient_accumulation_process(loss, grads)[source]

Gradient accumulation algorithm process.

gradient_freeze_process(*inputs)[source]

Gradient freeze algorithm process.

class mindspore.boost.BoostTrainOneStepWithLossScaleCell(network, optimizer, scale_sense)[source]

Boost Network training with loss scaling.

This is a training step with loss scaling. It takes a network, an optimizer and possibly a scale update Cell as args. The loss scale value can be updated on either the host side or the device side. BoostTrainOneStepWithLossScaleCell will be compiled into a graph that takes *inputs as input data. A Tensor type scale_sense acts as the loss scaling value. If you want to update it on the host side, the value must be provided. If a Tensor type scale_sense is not given, the loss scale update logic must be provided by a Cell type scale_sense.

Parameters
  • network (Cell) – The training network. The network only supports single output.

  • optimizer (Cell) – Optimizer for updating the weights.

  • scale_sense (Union[Tensor, Cell]) – If this value is of Cell type, it is the loss scaling update logic cell. If this value is of Tensor type, it is a Tensor with shape \(()\) or \((1,)\).

Inputs:
  • (*inputs) (Tuple(Tensor)) - Tuple of input tensors with shape \((N, \ldots)\).

Outputs:

Tuple of 3 Tensors: the loss, the overflow flag and the current loss scaling value.

  • loss (Tensor) - Tensor with shape \(()\).

  • overflow (Tensor) - Tensor with shape \(()\), type is bool.

  • loss scaling value (Tensor) - Tensor with shape \(()\).

Raises
  • TypeError – If scale_sense is neither Cell nor Tensor.

  • ValueError – If shape of scale_sense is neither (1,) nor ().

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor, Parameter, nn
>>> import mindspore.ops as ops
>>> from mindspore.nn import WithLossCell
>>> from mindspore import dtype as mstype
>>> from mindspore import boost
>>>
>>> class Net(nn.Cell):
...     def __init__(self, in_features, out_features):
...         super(Net, self).__init__()
...         self.weight = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight')
...         self.matmul = ops.MatMul()
...
...     def construct(self, x):
...         output = self.matmul(x, self.weight)
...         return output
...
>>> size, in_features, out_features = 16, 16, 10
>>> #1) when the type of scale_sense is Cell:
>>> net = Net(in_features, out_features)
>>> loss = nn.MSELoss()
>>> optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> net_with_loss = WithLossCell(net, loss)
>>> manager = nn.DynamicLossScaleUpdateCell(loss_scale_value=2**12, scale_factor=2, scale_window=1000)
>>> train_network = boost.BoostTrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_sense=manager)
>>> input = Tensor(np.ones([out_features, in_features]), mstype.float32)
>>> labels = Tensor(np.ones([out_features,]), mstype.float32)
>>> output = train_network(input, labels)
>>>
>>> #2) when the type of scale_sense is Tensor:
>>> net = Net(in_features, out_features)
>>> loss = nn.MSELoss()
>>> optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> net_with_loss = WithLossCell(net, loss)
>>> inputs = Tensor(np.ones([size, in_features]).astype(np.float32))
>>> label = Tensor(np.zeros([size, out_features]).astype(np.float32))
>>> scaling_sens = Tensor(np.full((1), np.finfo(np.float32).max), dtype=mstype.float32)
>>> train_network = boost.BoostTrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_sense=scaling_sens)
>>> output = train_network(inputs, label)

class mindspore.boost.FreezeOpt(opt, train_parameter_groups=None, train_strategy=None)[source]

Optimizer that supports gradients freezing training.

Parameters
  • opt (Cell) – Non-freezing optimizer instance, such as Momentum or SGD.

  • train_parameter_groups (Union[tuple, list]) – Groups of parameters for gradients freezing training.

  • train_strategy (Union[tuple(int), list(int), Tensor]) – Strategy for gradients freezing training.

Supported Platforms:

Ascend
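
Examples

A minimal construction sketch; it assumes an Ascend environment (per Supported Platforms above), and the split of the trainable parameters into two freezing groups is an illustrative choice:

>>> from mindspore import nn, boost
>>> net = nn.Dense(16, 10)
>>> optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> params = net.trainable_params()
>>> # illustrative grouping: the full parameter list, and a subset that keeps training while the rest is frozen
>>> groups = [params, params[1:]]
>>> freeze_opt = boost.FreezeOpt(optimizer, train_parameter_groups=groups)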

class mindspore.boost.GradientAccumulation(max_accumulation_step, optimizer)[source]

After accumulating the gradients over multiple steps, call the optimizer to apply the accumulated update.

Parameters
  • max_accumulation_step (int) – Steps to accumulate gradients.

  • optimizer (Cell) – Optimizer used.
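
Examples

A minimal construction sketch; accumulating over 4 steps is an illustrative choice:

>>> from mindspore import nn, boost
>>> net = nn.Dense(16, 10)
>>> optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> # accumulate gradients for 4 steps before the optimizer applies the update
>>> grad_accumulation = boost.GradientAccumulation(max_accumulation_step=4, optimizer=optimizer)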

class mindspore.boost.GradientFreeze(param_groups, freeze_type, freeze_p, total_steps)[source]

Freezing the gradients of some layers randomly. The number and probability of frozen layers can be configured by users.

Parameters
  • param_groups (Union[tuple, list]) – Groups of parameters for gradients freezing training.

  • freeze_type (int) – Strategy of gradients freezing training.

  • freeze_p (float) – Probability of gradients freezing training.

  • total_steps (numbers.Number) – Steps of the whole training.

Examples

>>> from mindspore import boost
>>> gradient_freeze_class = boost.GradientFreeze(10, 1, 0.5, 2000)
>>> network, optimizer = gradient_freeze_class.freeze_generate(network, optimizer)
freeze_generate(network, optimizer)[source]

Generate freeze network and optimizer.

generate_freeze_index_sequence(parameter_groups_number, freeze_strategy, freeze_p, total_steps)[source]

Generate index sequence for gradient freezing training.

split_parameters_groups(net, freeze_para_groups_number)[source]

Split parameter groups for gradients freezing training.

class mindspore.boost.LessBN(network, fn_flag=False)[source]

Reduce the number of BN automatically to improve the network performance and ensure the network accuracy.

Parameters
  • network (Cell) – Network to be modified.

  • fn_flag (bool) – Replace FC with FN. Default: False.

Examples

>>> network = boost.LessBN(network)

class mindspore.boost.OptimizerProcess(opt)[source]

Process optimizer for Boost. Currently, this class supports adding GC (gradient centralization) tags and creating new optimizers.

Parameters

opt (Cell) – Optimizer used.

Examples

>>> import numpy as np
>>> from mindspore import Tensor, Parameter, nn
>>> import mindspore.ops as ops
>>> from mindspore.boost import OptimizerProcess
>>>
>>> class Net(nn.Cell):
...     def __init__(self, in_features, out_features):
...         super(Net, self).__init__()
...         self.weight = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight')
...         self.matmul = ops.MatMul()
...
...     def construct(self, x):
...         output = self.matmul(x, self.weight)
...         return output
...
>>> size, in_features, out_features = 16, 16, 10
>>> network = Net(in_features, out_features)
>>> optimizer = nn.Momentum(network.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> optimizer_process = OptimizerProcess(optimizer)
>>> optimizer_process.add_grad_centralization(network)
>>> optimizer = optimizer_process.generate_new_optimizer()
add_grad_centralization(network)[source]

Add gradient centralization.

build_gc_params_group(params_dict, parameters)[source]

Build the parameter group that needs gradient centralization.

build_params_dict(network)[source]

Build the parameter dict of the network.

generate_new_optimizer()[source]

Generate new optimizer.

class mindspore.boost.ParameterProcess[source]

Process parameters for Boost. Currently, this class supports creating group parameters and automatically setting the gradient segmentation point.

Examples

>>> import numpy as np
>>> from mindspore import Tensor, Parameter, nn
>>> import mindspore.ops as ops
>>> from mindspore.boost import ParameterProcess
>>>
>>> class Net(nn.Cell):
...     def __init__(self, in_features, out_features):
...         super(Net, self).__init__()
...         self.weight = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight')
...         self.weight2 = Parameter(Tensor(np.ones([in_features, out_features]).astype(np.float32)),
...                                 name='weight2')
...         self.matmul = ops.MatMul()
...         self.matmul2 = ops.MatMul()
...
...     def construct(self, x):
...         output = self.matmul(x, self.weight)
...         output2 = self.matmul2(x, self.weight2)
...         return output + output2
...
>>> size, in_features, out_features = 16, 16, 10
>>> network = Net(in_features, out_features)
>>> new_parameter = network.trainable_params()[:1]
>>> parameter_process = ParameterProcess()
>>> group_params = parameter_process.generate_group_params(new_parameter, network.trainable_params())

assign_parameter_group(parameters, split_point=None)[source]

Assign parameter group.

generate_group_params(parameters, origin_params)[source]

Generate group parameters.

mindspore.boost.freeze_cell(reducer_flag, network, optimizer, sens, grad, use_grad_accumulation, mean=None, degree=None, max_accumulation_step=1)[source]

Provide the freeze network cell used for gradient freezing training.