mindformers.core.CrossEntropyLoss
- class mindformers.core.CrossEntropyLoss(parallel_config=default_dpmp_config, check_for_nan_in_loss_and_grad=False, calculate_per_token_loss=False, seq_split_num=1, **kwargs)
Calculate the cross entropy loss.
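CrossEntropyLoss supports two different types of targets:
Class indices (int), where the range of values is $[0, C)$, with $C$ being the number of classes. When reduction is set to 'none', the cross-entropy loss is computed as follows:
$$\ell(x, y) = L = \{l_1, \dots, l_N\}^{\top}, \quad l_n = -w_{y_n} \log \frac{\exp(x_{n, y_n})}{\sum_{c=0}^{C-1} \exp(x_{n, c})}$$
where $x$ denotes the predicted values, $y$ denotes the target values, $w$ denotes the weights, and $N$ is the batch size. The index $c$ ranges over $[0, C-1]$, representing the class indices, where $C$ is the number of classes.
If reduction is not set to 'none' (the default is 'mean'), the loss is computed as:
$$\ell(x, y) = \begin{cases} \dfrac{\sum_{n=1}^{N} l_n}{\sum_{n=1}^{N} w_{y_n}}, & \text{if reduction} = \text{'mean'} \\ \sum_{n=1}^{N} l_n, & \text{if reduction} = \text{'sum'} \end{cases}$$
Class probabilities (float), used when the target is a probability distribution over multiple class labels. When reduction is set to 'none', the cross-entropy loss is computed as follows:
$$\ell(x, y) = L = \{l_1, \dots, l_N\}^{\top}, \quad l_n = -\sum_{c=0}^{C-1} w_c \, y_{n, c} \log \frac{\exp(x_{n, c})}{\sum_{i=0}^{C-1} \exp(x_{n, i})}$$
where $x$ denotes the predicted values, $y$ denotes the target values, $w$ denotes the weights, and $N$ is the batch size. The index $c$ ranges over $[0, C-1]$, representing the class indices, where $C$ is the number of classes.
If reduction is not set to 'none' (the default is 'mean'), the loss is computed as:
$$\ell(x, y) = \begin{cases} \dfrac{1}{N} \sum_{n=1}^{N} l_n, & \text{if reduction} = \text{'mean'} \\ \sum_{n=1}^{N} l_n, & \text{if reduction} = \text{'sum'} \end{cases}$$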
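The two target types described above can be sketched in plain NumPy. This is an illustrative reference only, not the MindFormers implementation: the function names `ce_class_indices` and `ce_class_probs`, and the `weight` and `reduction` arguments, are hypothetical and simply mirror the quantities named in the description.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def ce_class_indices(logits, labels, weight=None, reduction="mean"):
    """Cross entropy with integer class-index targets (hypothetical helper)."""
    n, c = logits.shape
    w = np.ones(c) if weight is None else np.asarray(weight, dtype=np.float64)
    logp = log_softmax(logits)
    # Pick the log-probability of the target class for every sample.
    per_sample = -w[labels] * logp[np.arange(n), labels]
    if reduction == "none":
        return per_sample
    if reduction == "sum":
        return per_sample.sum()
    # 'mean': normalise by the total weight of the target classes.
    return per_sample.sum() / w[labels].sum()

def ce_class_probs(logits, probs, weight=None, reduction="mean"):
    """Cross entropy with probability-distribution targets (hypothetical helper)."""
    n, c = logits.shape
    w = np.ones(c) if weight is None else np.asarray(weight, dtype=np.float64)
    logp = log_softmax(logits)
    # Weighted sum over classes of target probability times log-probability.
    per_sample = -(w * probs * logp).sum(axis=1)
    if reduction == "none":
        return per_sample
    if reduction == "sum":
        return per_sample.sum()
    # 'mean': normalise by the batch size.
    return per_sample.sum() / n

logits = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
labels = np.array([2, 0])
one_hot = np.eye(3)[labels]
print(ce_class_indices(logits, labels, reduction="none"))
print(ce_class_probs(logits, one_hot, reduction="none"))
```

With unit weights, a one-hot probability target gives the same per-sample loss as the corresponding class index, so the two printed arrays should agree.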
- Parameters
parallel_config (mindformers.modules.OpParallelConfig, optional) - The parallel configuration. Default: default_dpmp_config.
check_for_nan_in_loss_and_grad (bool, optional) - Whether to print local loss. Default: False.
calculate_per_token_loss (bool, optional) - Whether to use Megatron loss. Default: False.
seq_split_num (int, optional) - Sequence split number in sequence pipeline parallel mode. Default: 1.
- Inputs:
logits (Tensor) - Tensor of shape (N, C). Data type must be float16 or float32. The output logits of the backbone.
label (Tensor) - Tensor of shape (N, ). The ground truth label of the sample.
input_mask (Tensor) - Tensor of shape (N, ). Indicates which inputs are padding; padded inputs are not counted in the loss.
- Returns
Tensor, the computed cross entropy loss value.
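To illustrate how a mask can exclude padded inputs from the loss, here is a minimal NumPy sketch. It rests on assumptions that the description above does not state: that a mask value of 1 marks a real input and 0 marks padding, and that the loss is averaged over the unmasked inputs. The function name `masked_cross_entropy` and the `eps` argument are hypothetical.

```python
import numpy as np

def masked_cross_entropy(logits, labels, input_mask, eps=1e-8):
    """Average cross entropy over unmasked positions (hypothetical helper).

    Assumes input_mask is 1 for real inputs and 0 for padding.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    logp = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the target class at each position.
    per_position = -logp[np.arange(logits.shape[0]), labels]
    mask = input_mask.astype(np.float64)
    # Padded positions contribute nothing; eps guards against an all-zero mask.
    return (per_position * mask).sum() / (mask.sum() + eps)

logits = np.array([[0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [9.0, 1.0, 2.0, 3.0]])  # last row stands in for padding
labels = np.array([1, 3, 0])
input_mask = np.array([1.0, 1.0, 0.0])
print(masked_cross_entropy(logits, labels, input_mask))
```

Under these assumptions the third row has no effect on the result, so the value depends only on the two unmasked rows.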
Examples
>>> import numpy as np
>>> from mindspore import dtype as mstype
>>> from mindspore import Tensor
>>> from mindformers.core import CrossEntropyLoss
>>> loss = CrossEntropyLoss()
>>> logits = Tensor(np.array([[3, 5, 6, 9, 12, 33, 42, 12, 32, 72]]), mstype.float32)
>>> labels_np = np.array([1]).astype(np.int32)
>>> input_mask = Tensor(np.ones(1).astype(np.float32))
>>> labels = Tensor(labels_np)
>>> output = loss(logits, labels, input_mask)
>>> output.shape
(1,)