mindspore.nn.PipelineGradReducer
- class mindspore.nn.PipelineGradReducer(parameters, scale_sense=1.0)
Gradient aggregation for pipeline parallelism.
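Conceptually, in pipeline-parallel training each micro-batch produces its own gradient for every parameter, and these have to be combined into one gradient per parameter before the optimizer step. The pure-Python sketch below illustrates only that accumulation step. It assumes the per-micro-batch gradients are summed element-wise; the function `accumulate_micro_batch_grads` and the parameter names are hypothetical, and this is not MindSpore's implementation.

```python
# Conceptual sketch only: NOT MindSpore's implementation.
# Assumption: each micro-batch yields one gradient per parameter, and the
# gradients are summed element-wise into a single gradient per parameter.
from typing import Dict, List


def accumulate_micro_batch_grads(
    micro_batch_grads: List[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Sum per-parameter gradients element-wise across micro-batches."""
    if not micro_batch_grads:
        raise ValueError("need at least one micro-batch")
    accumulated: Dict[str, List[float]] = {}
    for grads in micro_batch_grads:
        for name, grad in grads.items():
            if name not in accumulated:
                # First gradient seen for this parameter: copy it so the
                # caller's lists are never mutated.
                accumulated[name] = list(grad)
            else:
                acc = accumulated[name]
                if len(acc) != len(grad):
                    raise ValueError(f"shape mismatch for {name}")
                for i, g in enumerate(grad):
                    acc[i] += g
    return accumulated


# Two micro-batches, two hypothetical parameters.
grads_mb0 = {"layer1.weight": [0.5, -1.0], "layer1.bias": [0.25]}
grads_mb1 = {"layer1.weight": [1.5, 2.0], "layer1.bias": [0.75]}
total = accumulate_micro_batch_grads([grads_mb0, grads_mb1])
print(total)  # {'layer1.weight': [2.0, 1.0], 'layer1.bias': [1.0]}
```

Copying the first gradient instead of aliasing it keeps the per-micro-batch inputs unchanged, which mirrors the idea of a separate accumulation buffer.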
- Parameters:
parameters (list) - The parameters to be updated.
scale_sense (float) - The scaling factor applied to the gradients. Default: 1.0.
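This page describes `scale_sense` only as a scaling factor for the gradients. A common role for such a factor is loss scaling: if the loss is multiplied by S, every gradient is multiplied by S as well, so dividing by S recovers the true gradient. The sketch below works this through for a one-parameter squared-error loss. It assumes, without confirmation from this page, that the reducer divides by `scale_sense`; the helpers `grad_of_squared_error` and `unscale` are hypothetical.

```python
# Conceptual sketch only: NOT MindSpore's implementation.
# Assumption: scale_sense acts as a loss-scale factor, and the gradient is
# divided by it to undo the scaling.


def grad_of_squared_error(w: float, x: float, y: float, loss_scale: float = 1.0) -> float:
    """Analytic d/dw of loss_scale * (w * x - y) ** 2."""
    return loss_scale * 2.0 * x * (w * x - y)


def unscale(grad: float, scale_sense: float = 1.0) -> float:
    """Divide a (possibly loss-scaled) gradient by scale_sense."""
    if scale_sense == 0:
        raise ValueError("scale_sense must be non-zero")
    return grad / scale_sense


w, x, y = 0.5, 2.0, 3.0
true_grad = grad_of_squared_error(w, x, y)                        # 2 * 2 * (1 - 3) = -8.0
scaled_grad = grad_of_squared_error(w, x, y, loss_scale=1024.0)   # 1024 * -8 = -8192.0
print(true_grad, unscale(scaled_grad, scale_sense=1024.0))        # -8.0 -8.0
print(unscale(true_grad))  # default scale_sense=1.0 leaves the gradient unchanged: -8.0
```

With the default `scale_sense=1.0`, this assumed division is a no-op, which is consistent with the default value documented above.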
- Raises:
RuntimeError - If the current mode is not graph mode.
RuntimeError - If the parallel mode is neither semi-auto parallel nor auto parallel.
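The two documented `RuntimeError` conditions amount to a configuration check. The hypothetical function `check_reducer_preconditions` below mirrors them using plain strings as stand-ins for MindSpore's constants; the string values are assumptions for illustration. In real MindSpore code the corresponding settings are `ms.GRAPH_MODE` and `ms.ParallelMode.SEMI_AUTO_PARALLEL` or `ms.ParallelMode.AUTO_PARALLEL`, which the example on this page sets through `ms.set_context` and `ms.set_auto_parallel_context`.

```python
# Hypothetical mirror of the two documented RuntimeError conditions.
# The string constants are stand-ins chosen for illustration, not MindSpore values.
GRAPH_MODE = "graph"
SUPPORTED_PARALLEL_MODES = ("semi_auto_parallel", "auto_parallel")


def check_reducer_preconditions(mode: str, parallel_mode: str) -> None:
    """Raise RuntimeError under the same conditions this page documents."""
    if mode != GRAPH_MODE:
        raise RuntimeError(f"graph mode is required, got {mode!r}")
    if parallel_mode not in SUPPORTED_PARALLEL_MODES:
        raise RuntimeError(
            "semi-auto or auto parallel mode is required, "
            f"got {parallel_mode!r}"
        )


check_reducer_preconditions("graph", "semi_auto_parallel")  # passes silently
try:
    check_reducer_preconditions("pynative", "semi_auto_parallel")
except RuntimeError as err:
    print("RuntimeError:", err)
```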
- Supported Platforms:
Ascend
GPU
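Examples:
Note
Before running the following example, the communication environment variables must be configured.
For Ascend devices, users need to prepare the rank table and set rank_id and device_id. For details, see rank table Startup.
For GPU devices, users need to prepare the host file and MPI. For details, see mpirun Startup.
For CPU devices, users need to write a dynamic cluster startup script. For details, see Dynamic Cluster Startup.
This example must be run in a multi-device environment.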
>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import nn, ops, Tensor
>>> from mindspore.communication import init
>>>
>>> # PipelineGradReducer requires graph mode and semi-auto or auto parallel mode.
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> ms.reset_auto_parallel_context()
>>> ms.set_auto_parallel_context(parallel_mode=ms.ParallelMode.SEMI_AUTO_PARALLEL, pipeline_stages=2)
>>> init()
>>> ms.set_seed(1)
>>>
>>> class Network(nn.Cell):
...     def __init__(self, in_features, out_features, sens=1.0):
...         super().__init__()
...         self.layer1 = nn.Dense(in_features, 16)
...         self.relu1 = nn.ReLU()
...         self.layer2 = nn.Dense(16, 16)
...         self.relu2 = nn.ReLU()
...         self.layer3 = nn.Dense(16, out_features)
...
...     def construct(self, x):
...         x = self.layer1(x)
...         x = self.relu1(x)
...         x = self.layer2(x)
...         x = self.relu2(x)
...         logits = self.layer3(x)
...         return logits
>>>
>>> size, in_features, out_features = 16, 32, 10
>>> net = Network(in_features, out_features)
>>> # Place the first three cells on pipeline stage 0 and the remaining two on stage 1.
>>> net.layer1.pipeline_stage = 0
>>> net.relu1.pipeline_stage = 0
>>> net.layer2.pipeline_stage = 0
>>> net.relu2.pipeline_stage = 1
>>> net.layer3.pipeline_stage = 1
>>> loss_fn = nn.CrossEntropyLoss()
>>> optimizer = nn.SGD(net.trainable_params(), 1e-2)
>>> # Wrap the network with its loss and split each batch into 2 micro-batches.
>>> net_with_loss = nn.PipelineCell(nn.WithLossCell(net, loss_fn), 2)
>>> net_with_loss.set_train()
>>> def forward_fn(inputs, target):
...     loss = net_with_loss(inputs, target)
...     return loss
>>>
>>> # grad_position=None: differentiate only with respect to the given weights.
>>> grad_fn = ops.value_and_grad(forward_fn, None, net_with_loss.trainable_params())
>>> pp_grad_reducer = nn.PipelineGradReducer(optimizer.parameters)
>>>
>>> @ms.jit
... def train_one_step(inputs, target):
...     loss, grads = grad_fn(inputs, target)
...     # Aggregate the pipeline gradients before the optimizer update.
...     grads = pp_grad_reducer(grads)
...     optimizer(grads)
...     return loss, grads
>>>
>>> inputs = Tensor(np.ones([size, in_features]).astype(np.float32))
>>> label = Tensor(np.ones([size, out_features]).astype(np.float32))
>>> loss, _ = train_one_step(inputs, label)
>>> print(loss)
46.36721
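In the example, `nn.PipelineCell(..., 2)` and the `pipeline_stage` assignments determine how the work is divided. The plain-Python sketch below reproduces that bookkeeping with the example's values (batch size 16, 2 micro-batches, five cells on two stages). It assumes that `PipelineCell` slices each input along dimension 0 into `micro_size` equal micro-batches; the helpers `split_into_micro_batches` and `group_by_stage` are hypothetical, and no MindSpore code is involved.

```python
# Conceptual sketch only: NOT MindSpore's PipelineCell implementation.
# Assumption: PipelineCell(cell, micro_size) slices each input along the
# batch dimension (dim 0) into micro_size equally sized micro-batches.
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_micro_batches(batch: Sequence[T], micro_size: int) -> List[List[T]]:
    """Split a batch into micro_size contiguous, equally sized chunks."""
    if micro_size <= 0 or len(batch) % micro_size != 0:
        raise ValueError("batch size must be a positive multiple of micro_size")
    step = len(batch) // micro_size
    return [list(batch[i * step:(i + 1) * step]) for i in range(micro_size)]


def group_by_stage(stage_of: Dict[str, int]) -> Dict[int, List[str]]:
    """Group cell names by their pipeline_stage value, keeping insertion order."""
    stages: Dict[int, List[str]] = {}
    for name, stage in stage_of.items():
        stages.setdefault(stage, []).append(name)
    return stages


rows = list(range(16))  # stand-in for the 16 rows of `inputs`
micro_batches = split_into_micro_batches(rows, micro_size=2)
print([len(mb) for mb in micro_batches])  # [8, 8]

stage_of = {"layer1": 0, "relu1": 0, "layer2": 0, "relu2": 1, "layer3": 1}
print(group_by_stage(stage_of))
# {0: ['layer1', 'relu1', 'layer2'], 1: ['relu2', 'layer3']}
```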