Class BucketBatchByLengthDataset
Defined in File datasets.h
Inheritance Relationships
Base Type
public mindspore::dataset::Dataset
(Class Dataset)
Class Documentation
-
class BucketBatchByLengthDataset : public mindspore::dataset::Dataset
The result of applying the BucketBatchByLength operator to the input dataset.
Public Functions
Constructor of BucketBatchByLengthDataset.
Note
Buckets elements according to their lengths. Each bucket is padded and batched once it is full.
- Parameters
input – [in] The dataset to which the bucket-batch-by-length operation is applied.
column_names – [in] Columns passed to element_length_function.
bucket_boundaries – [in] A list of the upper boundaries of the buckets. Must be strictly increasing. If there are n boundaries, n+1 buckets are created: one bucket for [0, bucket_boundaries[0]), one bucket for [bucket_boundaries[i], bucket_boundaries[i+1]) for each 0<=i<n-1, and one bucket for [bucket_boundaries[n-1], inf).
bucket_batch_sizes – [in] A list of the batch sizes for each bucket. Its number of elements must equal the size of bucket_boundaries + 1.
element_length_function – [in] A function pointer that takes in an MSTensorVec and outputs an MSTensorVec. The output must contain a single tensor containing a single int32_t. If no value is provided, then the size of column_names must be 1, and the size of the first dimension of that column will be taken as the length (default=nullptr).
pad_info – [in] Describes how to pad each column when batching. The key is the column name, and the value must be a tuple of 2 elements: the first element is the shape to pad to, and the second element is the value to pad with. If a column is not specified, that column is padded to the longest element in the current batch, with 0 as the padding value. Any unspecified dimensions are padded to the longest in the current batch, unless pad_to_bucket_boundary is true. If no padding is wanted, set pad_info to None (default=empty dictionary).
pad_to_bucket_boundary – [in] If true, will pad each unspecified dimension in pad_info to the bucket_boundary minus 1. If there are any elements that fall into the last bucket, an error will occur (default=false).
drop_remainder – [in] If true, will drop the last batch for each bucket if it is not a full batch (default=false).
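The bucket layout described for bucket_boundaries can be illustrated with a small standalone sketch. This is not MindSpore code: BucketIndex and PadLengthForBucket are hypothetical helpers written only to show how a length maps to one of the n+1 buckets, and what padded length pad_to_bucket_boundary would imply under the description above. Returning -1 for the last bucket is an assumption of this sketch that stands in for the documented error.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper (not part of the MindSpore API).
// Returns the index of the bucket that an element of the given length falls into.
// With n = boundaries.size():
//   bucket 0 covers [0, boundaries[0]),
//   bucket i covers [boundaries[i-1], boundaries[i]) for 0 < i < n,
//   bucket n covers [boundaries[n-1], inf).
std::size_t BucketIndex(const std::vector<int32_t> &boundaries, int32_t length) {
  // The boundaries must be strictly increasing.
  assert(std::adjacent_find(boundaries.begin(), boundaries.end(),
                            std::greater_equal<int32_t>()) == boundaries.end());
  // The first boundary strictly greater than length is the bucket's upper bound.
  auto it = std::upper_bound(boundaries.begin(), boundaries.end(), length);
  return static_cast<std::size_t>(it - boundaries.begin());
}

// Hypothetical helper (not part of the MindSpore API).
// Padded length implied by pad_to_bucket_boundary=true: the bucket's upper
// boundary minus 1. The last bucket has no upper boundary, so this sketch
// returns -1 there as a stand-in for the documented error.
int32_t PadLengthForBucket(const std::vector<int32_t> &boundaries, std::size_t bucket) {
  if (bucket >= boundaries.size()) {
    return -1;
  }
  return boundaries[bucket] - 1;
}
```

With boundaries {5, 10}, lengths 0 to 4 land in bucket 0, lengths 5 to 9 in bucket 1, and lengths of 10 or more in bucket 2, so bucket_batch_sizes would need three entries.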
-
~BucketBatchByLengthDataset() = default
Destructor of BucketBatchByLengthDataset.