mindspore.ops.CTCLoss
- class mindspore.ops.CTCLoss(preprocess_collapse_repeated=False, ctc_merge_repeated=True, ignore_longer_outputs_than_inputs=False)[source]
Calculates the CTC (Connectionist Temporal Classification) loss and the gradient.
The underlying implementation of this interface is provided by the third-party library baidu-research::warp-ctc. The CTC algorithm was proposed in Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks.
CTCLoss calculates the loss between a continuous (unsegmented) time series and a target sequence. It sums over the probability of all possible alignments of the input to the target, producing a loss value that is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", so the length of the target sequence must be less than or equal to the length of the input.
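For a single batch element, a minimal sketch of the quantity being minimized (assuming the standard Graves et al. formulation of CTC, which this operator is described as following): \(loss = -\ln p(l \mid x) = -\ln \sum_{\pi \in \mathcal{B}^{-1}(l)} \prod_{t=1}^{T} y_{\pi_t}^{t}\), where \(l\) is the target sequence, \(\mathcal{B}\) collapses repeated labels and removes blanks, \(T\) is the sequence length, and \(y_{k}^{t}\) is the softmax-normalized probability of class \(k\) at time step \(t\) derived from x.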
- Parameters
preprocess_collapse_repeated (bool) – If true, repeated labels will be collapsed prior to the CTC calculation. Default: False.
ctc_merge_repeated (bool) – If false, during CTC calculation, repeated non-blank labels will not be merged and these labels will be interpreted as individual ones. This is a simplified version of CTC. Default: True.
ignore_longer_outputs_than_inputs (bool) – If true, sequences with longer outputs than inputs will be ignored. Default: False.
- Inputs:
x (Tensor) - The input Tensor must be a 3-D tensor whose shape is \((max\_time, batch\_size, num\_classes)\). num_classes must be num_labels + 1, where num_labels is the number of actual labels and one class is reserved for the blank label. The default blank label is num_classes - 1. Data type must be float16, float32 or float64.
labels_indices (Tensor) - The indices of labels. labels_indices[i, :] = [b, t] means labels_values[i] stores the label id for (batch b, time t). The type must be int64 and rank must be 2 (see the construction sketch after this inputs list).
labels_values (Tensor) - A 1-D input tensor. The values are associated with the given batch and time. The type must be int32. labels_values[i] must be in the range of [0, num_classes).
sequence_length (Tensor) - A tensor containing sequence lengths with the shape of \((batch\_size, )\). The type must be int32. Each value in the tensor must not be greater than max_time.
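As a hedged illustration of how the sparse label inputs above can be assembled from per-sample label lists (the variable names and label values below are illustrative assumptions, not part of the API):
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> # Illustrative per-sample label sequences; values must lie in [0, num_classes).
>>> batch_labels = [[2, 2], [2]]
>>> # Each label at (batch b, time t) becomes one row [b, t] in labels_indices
>>> # and one entry in labels_values.
>>> indices = [[b, t] for b, seq in enumerate(batch_labels) for t in range(len(seq))]
>>> values = [label for seq in batch_labels for label in seq]
>>> print(indices, values)
[[0, 0], [0, 1], [1, 0]] [2, 2, 2]
>>> labels_indices = Tensor(np.array(indices), mindspore.int64)
>>> labels_values = Tensor(np.array(values), mindspore.int32)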
- Outputs:
loss (Tensor) - A tensor containing the per-sample negative log probabilities (loss values), with shape \((batch\_size, )\). The tensor has the same data type as x.
gradient (Tensor) - The gradient of loss, has the same shape and data type as x.
- Raises
TypeError – If preprocess_collapse_repeated, ctc_merge_repeated or ignore_longer_outputs_than_inputs is not a bool.
TypeError – If x, labels_indices, labels_values or sequence_length is not a Tensor.
ValueError – If rank of labels_indices is not equal to 2.
TypeError – If dtype of x is not one of the following: float16, float32 or float64.
TypeError – If dtype of labels_indices is not int64.
TypeError – If dtype of labels_values or sequence_length is not int32.
- Supported Platforms:
Ascend GPU CPU
Examples
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, ops
>>> x = Tensor(np.array([[[0.3, 0.6, 0.6],
...                       [0.4, 0.3, 0.9]],
...                      [[0.9, 0.4, 0.2],
...                       [0.9, 0.9, 0.1]]]).astype(np.float32))
>>> labels_indices = Tensor(np.array([[0, 0], [1, 0]]), mindspore.int64)
>>> labels_values = Tensor(np.array([2, 2]), mindspore.int32)
>>> sequence_length = Tensor(np.array([2, 2]), mindspore.int32)
>>> ctc_loss = ops.CTCLoss()
>>> loss, gradient = ctc_loss(x, labels_indices, labels_values, sequence_length)
>>> print(loss)
[ 0.79628  0.5995158 ]
>>> print(gradient)
[[[ 0.27029088  0.36485454 -0.6351454 ]
  [ 0.28140804  0.25462854 -0.5360366 ]]
 [[ 0.47548494  0.2883962   0.04510255]
  [ 0.4082751   0.4082751   0.02843709]]]
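For training, the per-sample loss values above are typically reduced to a scalar before backpropagation; a minimal sketch continuing the example (the choice of mean reduction is an illustrative assumption, not prescribed by this operator):
>>> mean_loss = loss.mean()  # reduce the per-sample CTC losses to a scalar
>>> print(mean_loss.shape)
()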