mindspore.ops.CTCLoss

class mindspore.ops.CTCLoss(*args, **kwargs)[source]

Calculates the CTC (Connectionist Temporal Classification) loss and the gradient.

The CTC algorithm is proposed in Connectionist Temporal Classification: Labeling Unsegmented Sequence Data with Recurrent Neural Networks.

Parameters
  • preprocess_collapse_repeated (bool) – If true, repeated labels will be collapsed prior to the CTC calculation. Default: False.

  • ctc_merge_repeated (bool) – If false, during CTC calculation, repeated non-blank labels will not be merged and these labels will be interpreted as individual ones. This is a simplfied version of CTC. Default: True.

  • ignore_longer_outputs_than_inputs (bool) – If true, sequences with longer outputs than inputs will be ignored. Default: False.

Inputs:
  • inputs (Tensor) - The input Tensor must be a 3-D tensor whose shape is (max_time, batch_size, num_classes). num_classes must be num_labels + 1 classes, num_labels indicates the number of actual labels. Blank labels are reserved. Default blank label is num_classes - 1. Data type must be float16, float32 or float64.

  • labels_indices (Tensor) - The indices of labels. labels_indices[i, :] == [b, t] means labels_values[i] stores the id for (batch b, time t). The type must be int64 and rank must be 2.

  • labels_values (Tensor) - A 1-D input tensor. The values are associated with the given batch and time. The type must be int32. labels_values[i] must in the range of [0, num_classes).

  • sequence_length (Tensor) - A tensor containing sequence lengths with the shape of (batch_size). The type must be int32. Each value in the tensor must not be greater than max_time.

Outputs:
  • loss (Tensor) - A tensor containing log-probabilities, the shape is (batch_size). The tensor has the same type with inputs.

  • gradient (Tensor) - The gradient of loss, has the same type and shape with inputs.

Raises
  • TypeError – If preprocess_collapse_repeated, ctc_merge_repeated or ignore_longer_outputs_than_inputs is not a bool.

  • TypeError – If inputs, labels_indices, labels_values or sequence_length is not a Tensor.

  • TypeError – If dtype of inputs is not one of the following: float16, float32 or float64.

  • TypeError – If dtype of labels_indices is not int64.

  • TypeError – If dtype of labels_values or sequence_length is not int32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import mindspore
>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.ops import operations as ops
>>> inputs = Tensor(np.array([[[0.3, 0.6, 0.6],
...                            [0.4, 0.3, 0.9],
...
...                           [[0.9, 0.4, 0.2],
...                            [0.9, 0.9, 0.1]]]).astype(np.float32))
>>> labels_indices = Tensor(np.array([[0, 0], [1, 0]]), mindspore.int64)
>>> labels_values = Tensor(np.array([2, 2]), mindspore.int32)
>>> sequence_length = Tensor(np.array([2, 2]), mindspore.int32)
>>> ctc_loss = ops.CTCLoss()
>>> loss, gradient = ctc_loss(inputs, labels_indices, labels_values, sequence_length)
>>> print(loss)
[ 0.75466496  0.6259288 ]
>>> print(gradient)
[[[ 0.26944962  0.34967452  -0.6191241  ]
  [ 0.28437287  0.25814265  -0.5425156 ]]
 [[ 0.46983105  0.29572973  0.04452367 ]
  [ 0.41254193  0.4185341  0.02441157 ]]]