mindspore.ops.CTCLoss

class mindspore.ops.CTCLoss(preprocess_collapse_repeated=False, ctc_merge_repeated=True, ignore_longer_outputs_than_inputs=False)

Calculates the CTC (Connectionist Temporal Classification) loss and the gradient.

Internally, this interface calls the third-party baidu-research::warp-ctc implementation. The CTC algorithm was proposed in Connectionist Temporal Classification: Labeling Unsegmented Sequence Data with Recurrent Neural Networks.

CTCLoss calculates the loss between a continuous time series and a target sequence. It sums over the probabilities of the possible alignments of input to target, producing a loss value that is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, so the length of the target sequence must be less than or equal to the length of the input.
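The “many-to-one” alignment can be illustrated with a small sketch. This is not the operator's implementation; `collapse_alignment` is a hypothetical helper that applies the standard CTC collapse rule (merge consecutive repeats, then drop blanks) to a frame-level path, assuming the blank label is num_classes - 1.

```python
def collapse_alignment(path, blank):
    """Map a frame-level path to a label sequence.

    Consecutive repeats are merged first, then blank labels are dropped.
    """
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


# Assumption: num_classes = 3, so the default blank label is 2.
paths = [[0, 0, 2, 1], [0, 2, 1, 1], [2, 0, 1, 2]]
targets = [collapse_alignment(p, blank=2) for p in paths]
# Every path above collapses to the same target, [0, 1].
```

Several distinct input paths map to one target, which is why the target cannot be longer than the input.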

Parameters
  • preprocess_collapse_repeated (bool) – If true, repeated labels will be collapsed prior to the CTC calculation. Default: False.

  • ctc_merge_repeated (bool) – If false, during CTC calculation, repeated non-blank labels will not be merged and these labels will be interpreted as individual ones. This is a simplified version of CTC. Default: True.

  • ignore_longer_outputs_than_inputs (bool) – If true, sequences with longer outputs than inputs will be ignored. Default: False.
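As a rough illustration of the condition that ignore_longer_outputs_than_inputs refers to, the sketch below finds batch items whose label sequence is longer than the corresponding input length. `longer_outputs_than_inputs` is a hypothetical helper written in plain Python, not part of MindSpore, and it assumes "outputs" means the number of labels for a batch item and "inputs" means its sequence_length entry.

```python
def longer_outputs_than_inputs(label_seqs, sequence_length):
    """Return the batch indices whose label count exceeds the input length."""
    return [
        b
        for b, (labels, n_steps) in enumerate(zip(label_seqs, sequence_length))
        if len(labels) > n_steps
    ]


# Illustrative data: three batch items with 2, 3 and 1 labels.
label_seqs = [[0, 1], [1, 0, 1], [0]]
sequence_length = [2, 2, 3]
too_long = longer_outputs_than_inputs(label_seqs, sequence_length)
```

Here only batch item 1 has more labels (3) than input steps (2), so it is the only one that would be affected by the flag under this assumption.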

Inputs:
  • x (Tensor) - The input Tensor must be a 3-D tensor whose shape is \((max\_time, batch\_size, num\_classes)\). num_classes must equal num_labels + 1, where num_labels is the number of actual labels; the extra class is reserved for the blank label. The default blank label is num_classes - 1. Data type must be float16, float32 or float64.

  • labels_indices (Tensor) - The indices of labels. labels_indices[i, :] = [b, t] means labels_values[i] stores the id for (batch b, time t). The type must be int64 and rank must be 2.

  • labels_values (Tensor) - A 1-D input tensor. The values are associated with the given batch and time. The type must be int32. labels_values[i] must be in the range of [0, num_classes).

  • sequence_length (Tensor) - A tensor containing sequence lengths with the shape of \((batch\_size, )\). The type must be int32. Each value in the tensor must not be greater than max_time.
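The sparse layout of labels_indices and labels_values can be sketched as follows. `to_sparse_labels` is a hypothetical helper, not a MindSpore function; it converts per-batch label lists into the [b, t] index pairs and flat value list described above, using plain Python lists that could then be wrapped in Tensors with the int64 and int32 types required by the operator.

```python
def to_sparse_labels(label_seqs):
    """Convert a list of per-batch label lists into (indices, values).

    indices[i] == [b, t] means values[i] is the label for batch b at position t.
    """
    indices, values = [], []
    for b, seq in enumerate(label_seqs):
        for t, label in enumerate(seq):
            indices.append([b, t])
            values.append(label)
    return indices, values


# Illustrative data: batch 0 has labels [0, 1], batch 1 has label [1].
labels_indices, labels_values = to_sparse_labels([[0, 1], [1]])
# labels_indices is [[0, 0], [0, 1], [1, 0]]; labels_values is [0, 1, 1].
```

The resulting indices always have rank 2 with two columns, which matches the rank requirement on labels_indices.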

Outputs:
  • loss (Tensor) - A tensor containing the negative log-probabilities, with shape \((batch\_size, )\). The tensor has the same data type as x.

  • gradient (Tensor) - The gradient of loss. It has the same shape and data type as x.

Raises
  • TypeError – If preprocess_collapse_repeated, ctc_merge_repeated or ignore_longer_outputs_than_inputs is not a bool.

  • TypeError – If x, labels_indices, labels_values or sequence_length is not a Tensor.

  • ValueError – If the rank of labels_indices is not equal to 2.

  • TypeError – If dtype of x is not one of the following: float16, float32 or float64.

  • TypeError – If dtype of labels_indices is not int64.

  • TypeError – If dtype of labels_values or sequence_length is not int32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, ops
>>> x = Tensor(np.array([[[0.3, 0.6, 0.6],
...                       [0.4, 0.3, 0.9]],
...
...                      [[0.9, 0.4, 0.2],
...                       [0.9, 0.9, 0.1]]]).astype(np.float32))
>>> labels_indices = Tensor(np.array([[0, 0], [1, 0]]), mindspore.int64)
>>> labels_values = Tensor(np.array([2, 2]), mindspore.int32)
>>> sequence_length = Tensor(np.array([2, 2]), mindspore.int32)
>>> ctc_loss = ops.CTCLoss()
>>> loss, gradient = ctc_loss(x, labels_indices, labels_values, sequence_length)
>>> print(loss)
[ 0.79628  0.5995158 ]
>>> print(gradient)
[[[ 0.27029088  0.36485454  -0.6351454  ]
  [ 0.28140804  0.25462854  -0.5360366 ]]
 [[ 0.47548494  0.2883962    0.04510255 ]
  [ 0.4082751   0.4082751    0.02843709 ]]]