mindformers.core.CrossEntropyLoss

class mindformers.core.CrossEntropyLoss(parallel_config=default_dpmp_config, **kwargs)

Calculate the cross entropy loss.

CrossEntropyLoss supports two different types of targets:

Class indices (int), where the range of values is \([0, C)\) with \(C\) being the number of classes. When reduction is set to 'none', the cross-entropy loss is computed as follows:

\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})} \cdot \mathbb{1}\{y_n \not= \text{ignore_index}\}\]

where \(x\) denotes the predicted values (logits), \(y\) denotes the target values, \(w\) denotes the per-class weights, and \(N\) is the batch size. The class index \(c\) ranges over \([0, C-1]\), where \(C\) is the number of classes.
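
The per-sample formula above can be sketched in plain Python. This is an illustrative reimplementation of the math only, not the MindFormers code: the function name `per_sample_ce`, the `weights` argument, and the default `ignore_index=-100` are assumptions made for the example.

```python
import math

def per_sample_ce(logits, targets, weights=None, ignore_index=-100):
    """Per-sample cross entropy for class-index targets (reduction 'none').

    Hypothetical helper: the name and the ignore_index default are
    assumptions for illustration, not part of the documented API.
    """
    num_classes = len(logits[0])
    if weights is None:
        weights = [1.0] * num_classes  # unweighted case: w_c = 1 for every class
    losses = []
    for row, y in zip(logits, targets):
        if y == ignore_index:
            # The indicator term zeroes out ignored samples.
            losses.append(0.0)
            continue
        # Numerically stable log-sum-exp: subtract the row maximum first.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        log_prob = row[y] - log_sum_exp  # log softmax at the target class
        losses.append(-weights[y] * log_prob)
    return losses

# Two samples with three classes; the second sample is ignored.
losses = per_sample_ce([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]], [2, -100])
```

Subtracting the row maximum before exponentiating leaves the result unchanged mathematically but avoids overflow for large logits.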

If reduction is not set to 'none' (the default is 'mean'), the loss is computed as:

\[\begin{split}\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore_index}\}} l_n, & \text{if reduction} = \text{'mean',}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]
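
As a minimal sketch of the reduction step for class-index targets: with 'mean', the sum of the per-sample losses is divided by the total weight of the non-ignored targets rather than by \(N\). The helper name `reduce_index_losses` and the `ignore_index=-100` default are assumptions for illustration, and the input losses are taken to be already weighted, as \(l_n\) is in the formula.

```python
def reduce_index_losses(losses, targets, weights, reduction="mean", ignore_index=-100):
    """Reduce per-sample losses l_n for class-index targets.

    Hypothetical helper for illustration; `losses` are assumed to already
    include the class weight w_{y_n}, as in the per-sample formula.
    """
    if reduction == "none":
        return list(losses)
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        # Normalizer: summed weights of the targets that are not ignored.
        # (If every target is ignored, this is zero and the division fails.)
        normalizer = sum(weights[y] for y in targets if y != ignore_index)
        return total / normalizer
    raise ValueError(f"unsupported reduction: {reduction!r}")

# Three samples, two classes with weights 1.0 and 2.0; the middle sample is ignored.
sample_losses = [0.5, 0.0, 1.5]
sample_targets = [0, -100, 1]
class_weights = [1.0, 2.0]
mean_loss = reduce_index_losses(sample_losses, sample_targets, class_weights, "mean")
sum_loss = reduce_index_losses(sample_losses, sample_targets, class_weights, "sum")
```

Here the normalizer is 1.0 + 2.0 = 3.0, so the mean differs from a plain average over the three samples.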

Class probabilities (float), used when the target is a probability distribution over multiple class labels. When reduction is set to 'none', the cross-entropy loss is computed as follows:

\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}\]

where \(x\) denotes the predicted values (logits), \(y\) denotes the target probabilities, \(w\) denotes the per-class weights, and \(N\) is the batch size. The class index \(c\) ranges over \([0, C-1]\), where \(C\) is the number of classes.

If reduction is not set to 'none' (the default is 'mean'), the loss is computed as:

\[\begin{split}\ell(x, y) = \begin{cases} \frac{\sum_{n=1}^N l_n}{N}, & \text{if reduction} = \text{'mean',}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]
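
The probability-target case can be sketched the same way. The code below is an illustrative reading of the two formulas above in plain Python, not the library implementation; the name `soft_target_ce` and its arguments are hypothetical.

```python
import math

def soft_target_ce(logits, target_probs, weights=None, reduction="mean"):
    """Cross entropy when each target is a probability distribution over classes.

    Hypothetical helper for illustration only.
    """
    num_classes = len(logits[0])
    if weights is None:
        weights = [1.0] * num_classes  # unweighted case
    losses = []
    for row, probs in zip(logits, target_probs):
        # Numerically stable log-sum-exp of the logits.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        # l_n = - sum_c w_c * log_softmax(x_n)[c] * y_{n,c}
        losses.append(-sum(w * (v - log_sum_exp) * p
                           for w, v, p in zip(weights, row, probs)))
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)  # plain average over the batch size N
    raise ValueError(f"unsupported reduction: {reduction!r}")

# Uniform logits over two classes: every target distribution gives a loss of log(2).
batch_logits = [[0.0, 0.0], [0.0, 0.0]]
batch_targets = [[0.5, 0.5], [1.0, 0.0]]
soft_mean = soft_target_ce(batch_logits, batch_targets, reduction="mean")
soft_sum = soft_target_ce(batch_logits, batch_targets, reduction="sum")
```

Unlike the class-index case, 'mean' here divides by the batch size \(N\), not by a sum of weights.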

Parameters: parallel_config (mindformers.modules.transformer.op_parallel_config.OpParallelConfig) – The parallel configuration. Default: default_dpmp_config.

Inputs:

logits (Tensor) - Tensor of shape (N, C). Data type must be float16 or float32. The output logits of the backbone.
label (Tensor) - Tensor of shape (N, ). The ground truth label of the sample.
input_mask (Tensor) - Tensor of shape (N, ). Indicates which positions are padding; padded positions are not counted in the loss.
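
To illustrate how a mask can exclude padded positions, here is a minimal sketch in plain Python. It assumes that the mask holds 1 for real positions and 0 for padding, and that the loss is averaged over the unmasked positions. Both points are assumptions about the behavior, not statements taken from the MindFormers source, and `masked_mean` is a hypothetical name.

```python
def masked_mean(per_position_losses, input_mask):
    """Average losses over positions whose mask value is non-zero.

    Assumption for illustration: mask value 1.0 marks a real position,
    0.0 marks padding. Hypothetical helper, not the library's code.
    """
    # Multiplying by the mask removes the contribution of padded positions.
    total = sum(loss * m for loss, m in zip(per_position_losses, input_mask))
    valid = sum(input_mask)
    if valid == 0:
        return 0.0  # nothing to average when every position is padding
    return total / valid

# The third position is padding, so its large loss does not affect the result.
masked = masked_mean([2.0, 4.0, 100.0], [1.0, 1.0, 0.0])  # (2.0 + 4.0) / 2 = 3.0
```

Dividing by the number of valid positions, rather than by the full length, keeps the loss scale independent of how much padding a batch contains.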

Returns: Tensor, the computed cross entropy loss value.

Examples

>>> import numpy as np
>>> from mindspore import dtype as mstype
>>> from mindspore import Tensor
>>> from mindformers.core import CrossEntropyLoss
>>> loss = CrossEntropyLoss()
>>> logits = Tensor(np.array([[3, 5, 6, 9, 12, 33, 42, 12, 32, 72]]), mstype.float32)
>>> labels_np = np.array([1]).astype(np.int32)
>>> input_mask = Tensor(np.ones(1).astype(np.float32))
>>> labels = Tensor(labels_np)
>>> output = loss(logits, labels, input_mask)
>>> output.shape
(1,)