mindformers.core.CrossEntropyLoss

class mindformers.core.CrossEntropyLoss(parallel_config=default_dpmp_config, check_for_nan_in_loss_and_grad=False, monitor_device_local_loss=False, calculate_per_token_loss=False, seq_split_num=1, **kwargs)

Calculate the cross entropy loss.

CrossEntropyLoss supports two different types of targets:

Class indices (int), where the range of values is $[0, C)$ with $C$ being the number of classes. When reduction is set to 'none', the cross-entropy loss is computed as follows:

$\ell(x, y) = L = \{l_{1}, \dots, l_{N}\}^{\top}, \quad l_{n} = - w_{y_{n}} \log \frac{\exp(x_{n, y_{n}})}{\sum_{c = 0}^{C - 1} \exp(x_{n, c})} \cdot \mathbb{1}\{y_{n} \neq \text{ignore\_index}\}$

where $x$ denotes the predicted values, $y$ denotes the target values, $w$ denotes the weights, and $N$ is the batch size. The index $c$ ranges over $[0, C-1]$, the class indices, where $C$ is the number of classes.

If reduction is not set to 'none' (the default is 'mean'), the loss is computed as:

$\ell(x, y) = \begin{cases} \frac{\sum_{n = 1}^{N} l_{n}}{\sum_{n = 1}^{N} w_{y_{n}} \cdot \mathbb{1}\{y_{n} \neq \text{ignore\_index}\}}, & \text{if reduction} = \text{'mean'}, \\ \sum_{n = 1}^{N} l_{n}, & \text{if reduction} = \text{'sum'}. \end{cases}$

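The class-index formulas above can be checked numerically with a small NumPy sketch. This is a reference illustration of the math only, not the MindFormers implementation: the function name `cross_entropy_class_indices`, its `weight`, `ignore_index`, and `reduction` arguments, and the default `ignore_index=-100` are assumptions made for the example.

```python
import numpy as np

def cross_entropy_class_indices(logits, targets, weight=None,
                                ignore_index=-100, reduction="mean"):
    """NumPy sketch of the class-index formulas (hypothetical helper)."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.int64)
    n, c = logits.shape
    # Assumption: unit class weights when none are given.
    weight = np.ones(c) if weight is None else np.asarray(weight, dtype=np.float64)

    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Indicator 1{y_n != ignore_index}; ignored rows get a placeholder class 0.
    valid = targets != ignore_index
    safe_targets = np.where(valid, targets, 0)
    w = weight[safe_targets] * valid

    # l_n = -w_{y_n} * log softmax(x_n)[y_n] * 1{y_n != ignore_index}
    losses = -w * log_probs[np.arange(n), safe_targets]

    if reduction == "none":
        return losses
    if reduction == "sum":
        return losses.sum()
    if reduction == "mean":
        # Normalize by the summed weights of the non-ignored targets.
        return losses.sum() / w.sum()
    raise ValueError(f"unknown reduction: {reduction!r}")

logits = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
targets = np.array([2, -100])  # the second sample is ignored
per_sample = cross_entropy_class_indices(logits, targets, reduction="none")
mean_loss = cross_entropy_class_indices(logits, targets, reduction="mean")
```

Note that in this sketch the 'mean' reduction divides by zero if every target equals `ignore_index`; a production implementation would need to guard against that case.
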
Class probabilities (float), used when the target is a probability distribution over multiple class labels. When reduction is set to 'none', the cross-entropy loss is computed as follows:

$\ell(x, y) = L = \{l_{1}, \dots, l_{N}\}^{\top}, \quad l_{n} = - \sum_{c = 0}^{C - 1} w_{c} \log \frac{\exp(x_{n, c})}{\sum_{i = 0}^{C - 1} \exp(x_{n, i})} y_{n, c}$

where $x$ denotes the predicted values, $y$ denotes the target values, $w$ denotes the weights, and $N$ is the batch size. The index $c$ ranges over $[0, C-1]$, the class indices, where $C$ is the number of classes.

If reduction is not set to 'none' (the default is 'mean'), the loss is computed as:

$\ell(x, y) = \begin{cases} \frac{\sum_{n = 1}^{N} l_{n}}{N}, & \text{if reduction} = \text{'mean'}, \\ \sum_{n = 1}^{N} l_{n}, & \text{if reduction} = \text{'sum'}. \end{cases}$
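The class-probability case can be sketched in NumPy in the same way. Again, this only illustrates the formulas above and is not the MindFormers implementation; `cross_entropy_class_probs` and its arguments are hypothetical names chosen for the example.

```python
import numpy as np

def cross_entropy_class_probs(logits, target_probs, weight=None, reduction="mean"):
    """NumPy sketch of the class-probability formulas (hypothetical helper)."""
    logits = np.asarray(logits, dtype=np.float64)
    target_probs = np.asarray(target_probs, dtype=np.float64)
    n, c = logits.shape
    # Assumption: unit class weights when none are given.
    weight = np.ones(c) if weight is None else np.asarray(weight, dtype=np.float64)

    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # l_n = -sum_c w_c * log softmax(x_n)[c] * y_{n,c}
    losses = -(weight * log_probs * target_probs).sum(axis=1)

    if reduction == "none":
        return losses
    if reduction == "sum":
        return losses.sum()
    if reduction == "mean":
        # Plain average over the batch size N.
        return losses.sum() / n
    raise ValueError(f"unknown reduction: {reduction!r}")

logits = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
target_probs = np.array([[0.0, 0.0, 1.0],         # one-hot target
                         [1 / 3, 1 / 3, 1 / 3]])  # uniform soft target
per_sample = cross_entropy_class_probs(logits, target_probs, reduction="none")
```

With a one-hot target and unit weights, the per-sample value coincides with the class-index formula for the same class, which makes a convenient sanity check.
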

Parameters

parallel_config (mindformers.modules.OpParallelConfig, optional) – The parallel configuration. Default: default_dpmp_config.
check_for_nan_in_loss_and_grad (bool, optional) – Whether to print local loss. Default: False.
monitor_device_local_loss (bool, optional) – Whether to monitor device local loss. Default: False.
calculate_per_token_loss (bool, optional) – Whether to calculate the loss per token (Megatron loss). Default: False.
seq_split_num (int, optional) – Sequence split number in sequence pipeline parallel mode. Default: 1.

Inputs:

logits (Tensor) - Tensor of shape (N, C). The output logits of the backbone. Data type must be float16 or float32.
label (Tensor) - Tensor of shape (N, ). The ground truth labels of the samples.
input_mask (Tensor) - Tensor of shape (N, ). Indicates which inputs are padding; padded inputs are not counted in the loss.
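As a rough illustration of how a mask can exclude padded positions from a loss, the NumPy sketch below averages per-position losses over the unmasked entries only. Several things here are assumptions not confirmed by this page: that a mask value of 1 marks a real position and 0 marks padding, that the loss is normalized by the sum of the mask, and the names `masked_mean_loss` and `eps`, which are hypothetical.

```python
import numpy as np

def masked_mean_loss(per_position_loss, input_mask, eps=1e-8):
    """Average the loss over unmasked positions (hypothetical helper).

    Assumes input_mask is 1.0 for real positions and 0.0 for padding.
    """
    per_position_loss = np.asarray(per_position_loss, dtype=np.float64)
    input_mask = np.asarray(input_mask, dtype=np.float64)
    # Padded positions contribute 0 to the numerator and 0 to the denominator.
    numerator = (per_position_loss * input_mask).sum()
    # eps avoids division by zero when every position is padding.
    denominator = input_mask.sum() + eps
    return numerator / denominator

losses = np.array([2.0, 4.0, 100.0])
mask = np.array([1.0, 1.0, 0.0])  # the last position is padding
value = masked_mean_loss(losses, mask)  # the padded 100.0 does not affect the result
```
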

Returns: Tensor, the computed cross entropy loss value.

Examples

>>> import numpy as np
>>> from mindspore import dtype as mstype
>>> from mindspore import Tensor
>>> from mindformers.core import CrossEntropyLoss
>>> loss = CrossEntropyLoss()
>>> logits = Tensor(np.array([[3, 5, 6, 9, 12, 33, 42, 12, 32, 72]]), mstype.float32)
>>> labels_np = np.array([1]).astype(np.int32)
>>> input_mask = Tensor(np.ones(1).astype(np.float32))
>>> labels = Tensor(labels_np)
>>> output = loss(logits, labels, input_mask)
>>> output.shape
(1,)