mindspore.ops.CTCLossV2

class mindspore.ops.CTCLossV2(blank=0, reduction='none', zero_infinity=False)[source]

Calculates the CTC (Connectionist Temporal Classification) loss and the gradient.

The CTC algorithm was proposed in the paper Connectionist Temporal Classification: Labeling Unsegmented Sequence Data with Recurrent Neural Networks.

Warning

This is an experimental API that is subject to change or deletion.

Parameters

blank (int, optional) – The blank label. Default: 0 .
reduction (str, optional) – The reduction method applied to the output. Currently only 'none' is supported. Default: 'none' .
zero_infinity (bool, optional) – Whether to set an infinite loss and its associated gradient to zero. Default: False .
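
As a rough illustration of what zero_infinity describes, the NumPy sketch below replaces infinite per-sample losses with zero. The helper name `apply_zero_infinity` is hypothetical and is not part of MindSpore; the sketch only mirrors the documented behaviour for the loss values and does not touch gradients.

```python
import numpy as np

def apply_zero_infinity(loss, zero_infinity=False):
    """Hypothetical helper: zero out infinite losses when zero_infinity is True."""
    loss = np.asarray(loss, dtype=np.float64)
    if not zero_infinity:
        return loss
    # Keep finite entries, replace +/-inf entries with 0.
    return np.where(np.isinf(loss), 0.0, loss)

raw = np.array([1.5, np.inf, 0.25])
kept = apply_zero_infinity(raw, zero_infinity=False)
zeroed = apply_zero_infinity(raw, zero_infinity=True)
print(kept, zeroed)
```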

Inputs:

log_probs (Tensor) - A 3D tensor of shape \((T, N, C)\), where \(T\) is the input length, \(N\) is the batch size and \(C\) is the number of classes (including blank). Supported dtypes: float32, float64.
targets (Tensor) - A 2D tensor of shape \((N, S)\) containing the target sequences, where \(S\) is the maximum target length. Supported dtypes: int32, int64.
input_lengths (Union(Tuple, Tensor)) - A tuple or Tensor of shape \((N)\) giving the length of each input sequence. Supported dtypes: int32, int64.
target_lengths (Union(Tuple, Tensor)) - A tuple or Tensor of shape \((N)\) giving the length of each target sequence. Supported dtypes: int32, int64.

Outputs:

neg_log_likelihood (Tensor) - A loss value which is differentiable with respect to each input node.
log_alpha (Tensor) - The log-probabilities of the possible alignment paths (traces) from the input to the target.
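
To make the two outputs more concrete, here is a minimal NumPy sketch of the standard CTC forward recursion for a single sample. It is not MindSpore's implementation, and the function name `ctc_forward_single` is hypothetical. It assumes `log_probs` has shape (T, C) for one batch element and that `target` is already trimmed to its target length. The memory layout of the `log_alpha` it returns is chosen for readability and may differ from what the operator returns.

```python
import numpy as np

def ctc_forward_single(log_probs, target, blank=0):
    """Standard CTC forward pass for one sample (illustrative sketch).

    log_probs: array of shape (T, C); target: 1D sequence of labels.
    Returns (neg_log_likelihood, log_alpha) with log_alpha of shape (T, 2*len(target)+1).
    """
    log_probs = np.asarray(log_probs, dtype=np.float64)
    T = log_probs.shape[0]
    # Extended target with blanks interleaved: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in target:
        ext.extend([int(label), blank])
    S = len(ext)

    log_alpha = np.full((T, S), -np.inf)
    log_alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        log_alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            candidates = [log_alpha[t - 1, s]]              # stay on the same symbol
            if s >= 1:
                candidates.append(log_alpha[t - 1, s - 1])  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(log_alpha[t - 1, s - 2])  # skip over a blank
            log_alpha[t, s] = np.logaddexp.reduce(candidates) + log_probs[t, ext[s]]

    # A valid path ends on the last label or on the trailing blank.
    tail = [log_alpha[T - 1, S - 1]]
    if S > 1:
        tail.append(log_alpha[T - 1, S - 2])
    return -np.logaddexp.reduce(tail), log_alpha

# Same values as the Examples section: T=2, C=3, first target_lengths[0]=1 label of [0, 1].
sample_log_probs = np.array([[0.3, 0.6, 0.6],
                             [0.9, 0.4, 0.2]])
loss, alpha = ctc_forward_single(sample_log_probs, target=[0], blank=0)
print(loss)
```

With these inputs the recursion gives a loss of about -2.2986, which agrees with the neg_log_hood value printed in the Examples section.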

Raises

TypeError – If zero_infinity is not a bool.
TypeError – If reduction is not a string.
TypeError – If the dtype of log_probs is not float32 or float64.
TypeError – If the dtype of targets, input_lengths or target_lengths is not int32 or int64.
ValueError – If the rank of log_probs is not 3.
ValueError – If the rank of targets is not 2.
ValueError – If the shape of input_lengths does not match batch_size \(N\).
ValueError – If the shape of target_lengths does not match batch_size \(N\).
TypeError – If the dtypes of targets, input_lengths and target_lengths are not all the same.
ValueError – If the value of blank is not in range [0, C).
RuntimeError – If any value of input_lengths is larger than \(T\).
RuntimeError – If any target_lengths[i] is not in range [0, input_lengths[i]].
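
The shape and range conditions above can be checked ahead of time with a small NumPy helper, sketched below. The function `check_ctc_inputs` is hypothetical and not part of MindSpore; it covers only the rank, shape, blank and length checks, not the dtype checks. It also assumes that the upper bound on input_lengths is the time dimension \(T\).

```python
import numpy as np

def check_ctc_inputs(log_probs, targets, input_lengths, target_lengths, blank=0):
    """Hypothetical pre-check mirroring the documented shape and range errors."""
    log_probs = np.asarray(log_probs)
    targets = np.asarray(targets)
    input_lengths = np.asarray(input_lengths)
    target_lengths = np.asarray(target_lengths)

    if log_probs.ndim != 3:
        raise ValueError("log_probs must have rank 3, shape (T, N, C)")
    if targets.ndim != 2:
        raise ValueError("targets must have rank 2, shape (N, S)")
    T, N, C = log_probs.shape
    if input_lengths.shape != (N,):
        raise ValueError("input_lengths must have shape (N,)")
    if target_lengths.shape != (N,):
        raise ValueError("target_lengths must have shape (N,)")
    if not 0 <= blank < C:
        raise ValueError("blank must be in range [0, C)")
    # Assumption: input lengths are bounded by the time dimension T.
    if np.any(input_lengths > T):
        raise RuntimeError("input_lengths must not exceed T")
    if np.any(target_lengths < 0) or np.any(target_lengths > input_lengths):
        raise RuntimeError("target_lengths[i] must be in [0, input_lengths[i]]")

log_probs = np.zeros((2, 1, 3), dtype=np.float32)   # T=2, N=1, C=3
targets = np.array([[0, 1]], dtype=np.int32)
input_lengths = np.array([2], dtype=np.int32)
target_lengths = np.array([1], dtype=np.int32)
check_ctc_inputs(log_probs, targets, input_lengths, target_lengths, blank=0)
```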

Supported Platforms: Ascend GPU CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor, ops
>>> from mindspore import dtype as mstype
>>> log_probs = Tensor(np.array([[[0.3, 0.6, 0.6]],
...                              [[0.9, 0.4, 0.2]]]).astype(np.float32))
>>> targets = Tensor(np.array([[0, 1]]), mstype.int32)
>>> input_lengths = Tensor(np.array([2]), mstype.int32)
>>> target_lengths = Tensor(np.array([1]), mstype.int32)
>>> ctc_loss_v2 = ops.CTCLossV2(blank=0, reduction='none', zero_infinity=False)
>>> neg_log_hood, log_alpha = ctc_loss_v2(
...     log_probs, targets, input_lengths, target_lengths)
>>> print(neg_log_hood)
[-2.2986124]
>>> print(log_alpha)
[[[0.3       0.3            -inf      -inf 1.8931472 1.2       0.   0.       ]
  [0.        0.       0.        0.       0.        0.          0.   0.       ]]]
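
The sample log_probs above are raw positive values rather than normalized log-probabilities, which is why the printed loss is negative. In practice the class axis would usually be normalized first. The NumPy sketch below shows one way to do that; the `log_softmax` function here is a local illustration, not a MindSpore API.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis (illustrative)."""
    x = np.asarray(x, dtype=np.float64)
    shifted = x - x.max(axis=axis, keepdims=True)   # avoid overflow in exp
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

# Raw scores with the same shape as the example: (T=2, N=1, C=3).
scores = np.array([[[0.3, 0.6, 0.6]],
                   [[0.9, 0.4, 0.2]]])
normalized = log_softmax(scores, axis=-1)
print(normalized)
```

After normalization every entry is at most zero and the probabilities along the class axis sum to one, so the resulting CTC loss is non-negative.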