mindspore.nn.CTCLoss

class mindspore.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)

Calculates the CTC (Connectionist Temporal Classification) loss.

For details of the CTC algorithm, refer to Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks.
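As a hedged sketch of the underlying objective (a paraphrase of the referenced paper, not necessarily the exact formulation used by this implementation): for a length-T input x and a target sequence l, CTC sums the probability of every frame-level alignment π that collapses to l once repeated labels and blanks are removed by the collapsing map B:

    p(l \mid x) = \sum_{\pi \in B^{-1}(l)} \prod_{t=1}^{T} y^{t}_{\pi_t}, \qquad \mathrm{loss}(x, l) = -\ln p(l \mid x)

where y^{t}_{k} denotes the (softmax) probability of class k at time step t.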

Parameters
  • blank (int) – The blank label. Default: 0.

  • reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. Default: ‘mean’. (A brief shape sketch follows this parameter list.)

  • zero_infinity (bool) – Whether to zero out infinite losses and the associated gradients. Default: False.
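The snippet below is a minimal sketch of how reduction changes the output, assuming the standard reduction semantics (‘none’ keeps one loss per batch element, ‘mean’ and ‘sum’ reduce the batch to a single value); the shapes T, N, C, S and the random tensors are illustrative only, and the concrete loss values are not meaningful.

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore import dtype as mstype
>>> from mindspore.nn.loss import CTCLoss
>>> T, N, C, S = 5, 2, 3, 2
>>> log_probs = Tensor(np.random.randn(T, N, C).astype(np.float32))  # normally the output of a log-softmax
>>> targets = Tensor(np.random.randint(1, C, size=(N, S)), dtype=mstype.int32)
>>> input_lengths = Tensor(np.full((N,), T), dtype=mstype.int32)
>>> target_lengths = Tensor(np.full((N,), S), dtype=mstype.int32)
>>> per_sample = CTCLoss(reduction='none')(log_probs, targets, input_lengths, target_lengths)  # expected: one loss per batch element
>>> averaged = CTCLoss(reduction='mean')(log_probs, targets, input_lengths, target_lengths)    # expected: a single reduced value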

Inputs:
  • log_probs (Tensor) - A tensor of shape (T, N, C) or (T, C), where T is the input sequence length, N is the batch size and C is the number of classes (including the blank label). T, N and C are positive integers.

  • targets (Tensor) - The target sequences, a tensor of shape (N, S) or (sum(target_lengths)), where S is the maximum target length. (A sketch of the 1-D concatenated layout follows this list.)

  • input_lengths (Union[tuple, Tensor, int]) - A tuple or Tensor of shape (N), or a number. The lengths of the inputs.

  • target_lengths (Union[tuple, Tensor, int]) - A tuple or Tensor of shape (N), or a number. The lengths of the targets.
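A minimal sketch of the 1-D concatenated target layout described above (assuming, as the shape description states, that a flat tensor of length sum(target_lengths) is accepted together with per-sample target_lengths; all tensor contents are illustrative):

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore import dtype as mstype
>>> from mindspore.nn.loss import CTCLoss
>>> T, N, C = 5, 2, 3
>>> log_probs = Tensor(np.random.randn(T, N, C).astype(np.float32))
>>> target_lengths = Tensor(np.array([2, 1]), dtype=mstype.int32)  # per-sample target lengths
>>> targets = Tensor(np.array([1, 2, 1]), dtype=mstype.int32)      # concatenated targets, length sum(target_lengths) == 3
>>> input_lengths = Tensor(np.full((N,), T), dtype=mstype.int32)
>>> loss = CTCLoss(blank=0, reduction='none')(log_probs, targets, input_lengths, target_lengths)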

Outputs:
  • neg_log_likelihood (Tensor) - A loss value which is differentiable with respect to each input node.

Raises
  • TypeError – If zero_infinity is not a bool or reduction is not a str.

  • TypeError – If the dtype of log_probs is not float or double.

  • TypeError – If the dtype of targets, input_lengths or target_lengths is not int32 or int64.

  • ValueError – If reduction is not “none”, “mean” or “sum”.

  • ValueError – If the types of targets, input_lengths or target_lengths are different.

  • ValueError – If the value of blank is not in range [0, C), where C is the number of classes in log_probs.

  • ValueError – If any value of input_lengths is larger than T, where T is the sequence length of log_probs.

  • ValueError – If any target_lengths[i] is not in range [0, input_lengths[i]].

Supported Platforms:

Ascend CPU

Examples

>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore import dtype as mstype
>>> from mindspore.nn.loss import CTCLoss
>>> T = 5      # Input sequence length
>>> C = 2      # Number of classes
>>> N = 2      # Batch size
>>> S = 3      # Target sequence length of longest target in batch (padding length)
>>> S_min = 2  # Minimum target length, for demonstration purposes
>>> arr = np.arange(T*N*C).reshape((T, N, C))
>>> ms_input = Tensor(arr, dtype=mstype.float32)
>>> input_lengths = np.full(shape=(N), fill_value=T)
>>> input_lengths = Tensor(input_lengths, dtype=mstype.int32)
>>> target_lengths = np.full(shape=(N), fill_value=S_min)
>>> target_lengths = Tensor(target_lengths, dtype=mstype.int32)
>>> target = np.random.randint(1, C, size=(N, S))
>>> target = Tensor(target, dtype=mstype.int32)
>>> ctc_loss = CTCLoss(blank=0, reduction='none', zero_infinity=False)
>>> loss = ctc_loss(ms_input, target, input_lengths, target_lengths)
>>> print(loss)
Tensor(shape=[2], dtype=Float32, value= [-4.57949715e+001, -5.57949677e+001])
>>> arr = np.arange(T*C).reshape((T, C))
>>> ms_input = Tensor(arr, dtype=mstype.float32)
>>> input_lengths = T
>>> target_lengths = S_min
>>> target = np.random.randint(1, C, size=(S_min,))
>>> target = Tensor(target, dtype=mstype.int32)
>>> ctc_loss = CTCLoss(blank=0, reduction='none', zero_infinity=False)
>>> loss = ctc_loss(ms_input, target, input_lengths, target_lengths)
>>> print(loss)
Tensor(shape=[1], dtype=Float32, value= [-2.57949677e+001])
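As a further hedged sketch (reusing the unbatched tensors built just above; no output is shown because it depends on the random target), reduction='mean' collapses the result to a single value, and zero_infinity=True clamps an infinite loss and its gradient to zero:

>>> ctc_loss_mean = CTCLoss(blank=0, reduction='mean', zero_infinity=True)
>>> loss_mean = ctc_loss_mean(ms_input, target, input_lengths, target_lengths)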