mindformers.core.PerplexityMetric
- class mindformers.core.PerplexityMetric[source]
Perplexity is defined as the exponentiated average negative log-probability assigned by the model to each word in the test set. Mathematically, for a sequence of words \(W = (w_1, w_2, \ldots, w_N)\), the perplexity (PP) is given by:
\[PP(W) = P(w_1, w_2, \ldots, w_N)^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_1, w_2, \ldots, w_N)}}\]
where \(P(w_1, w_2, \ldots, w_N)\) is the probability of the sequence under the model.
In practical terms, perplexity can be rewritten as:
\[PP(W) = \exp\left(-\frac{1}{N} \sum_{i=1}^{N} \log P(w_i | w_1, w_2, \ldots, w_{i-1})\right)\]
This equation highlights that a lower perplexity indicates a better-performing language model, as it means the model assigns higher probabilities to the actual sequence of words.
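The formula above can be checked with a small NumPy sketch. This is an illustrative computation of perplexity from per-token conditional probabilities, not a use of the PerplexityMetric API; the probability values are made up for the example.

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity of a sequence, given the per-token conditional
    probabilities P(w_i | w_1, ..., w_{i-1}) assigned by a model.

    Implements PP(W) = exp(-(1/N) * sum_i log P(w_i | ...)).
    """
    log_probs = np.log(np.asarray(token_probs, dtype=np.float64))
    return float(np.exp(-np.mean(log_probs)))

# A model that assigns probability 0.25 to every token is, intuitively,
# as uncertain as a uniform choice among 4 words, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25]))
```

Note that the exponent of the average negative log-probability is also the exponential of the cross-entropy loss, which is why `PerplexityMetric` below reports both `loss` and `PPL` (with `PPL = exp(loss)`).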
Examples
>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindformers.core.metric.metric import PerplexityMetric
>>> x = Tensor(np.array([[[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]]))
>>> y = Tensor(np.array([[1, 0, 1]]))
>>> mask = Tensor(np.array([[1, 1, 1]]))
>>> metric = PerplexityMetric()
>>> metric.clear()
>>> metric.update(x, y, mask)
>>> perplexity = metric.eval()
>>> print(perplexity)
{'loss': 0.8262470960617065, 'PPL': 2.284728265028813}
- eval()[source]
Computing the evaluation result.
- Returns
A dict of evaluation results with loss and PPL scores.
- update(*inputs)[source]
Updating the internal evaluation result.
- Parameters
*inputs (List) – Logits, labels, and input_mask. Logits is a tensor of shape \([N,S,W]\) with data type Float16 or Float32; labels and input_mask are tensors of shape \([N,S]\) with data type Int32 or Int64, where \(N\) is the batch size, \(S\) is the sequence length, and \(W\) is the vocabulary size.