mindspore.dataset.Dataset.concat

mindspore.dataset.Dataset.concat(datasets)

Concatenates the given dataset objects with the current dataset. The "+" operator can also be used to concatenate datasets.

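Note

All datasets being concatenated must have the same column names, and each column must have the same number of dimensions (rank) and the same data type in every dataset.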
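The compatibility rule above can be illustrated with a small pure-Python sketch. It does not call MindSpore: `ColumnSpec`, `Schema`, and `find_concat_mismatches` are hypothetical names introduced only for this illustration, and treating column order as significant is an assumption of the sketch rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical description of one column: its rank and data type."""
    rank: int
    dtype: str


# Hypothetical schema type: column name -> column description.
Schema = Dict[str, ColumnSpec]


def find_concat_mismatches(schemas: List[Schema]) -> List[str]:
    """Compare every schema against the first one and list the differences."""
    if not schemas:
        return []
    problems: List[str] = []
    reference = schemas[0]
    for i, schema in enumerate(schemas[1:], start=1):
        # Assumption of this sketch: column order is compared as well.
        if list(schema) != list(reference):
            problems.append(f"dataset {i}: column names differ")
            continue
        for name, spec in schema.items():
            ref = reference[name]
            if spec.rank != ref.rank:
                problems.append(f"dataset {i}: rank of '{name}' differs")
            if spec.dtype != ref.dtype:
                problems.append(f"dataset {i}: dtype of '{name}' differs")
    return problems


scalar_int = {"column1": ColumnSpec(rank=0, dtype="int64")}
vector_int = {"column1": ColumnSpec(rank=1, dtype="int64")}

print(find_concat_mismatches([scalar_int, dict(scalar_int)]))  # -> []
print(find_concat_mismatches([scalar_int, vector_int]))
```

An empty result means the schemas satisfy all three conditions from the note; any entry in the list names the first dataset and column that violates one of them.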

Parameters:
  • datasets (Union[list, Dataset]) - A list of dataset objects, or a single dataset object, to concatenate with the current dataset.

Returns:

Dataset, a new dataset object with the concatenation applied.

Examples:

>>> import mindspore.dataset as ds
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>>
>>> # Create a dataset by concatenating dataset_1 and dataset_2 with "+" operator
>>> dataset = dataset_1 + dataset_2
>>> # Create a dataset by concatenating dataset_1 and dataset_2 with concat operation
>>> dataset = dataset_1.concat(dataset_2)
>>>
>>> # Check the data order of dataset
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1 + dataset_2
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 1)], [Tensor(shape=[], dtype=Int64, value= 2)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 3)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 5)], [Tensor(shape=[], dtype=Int64, value= 6)]]
>>>
>>> # Change the data order of concatenated dataset with sharding selection
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1.concat(dataset_2)
>>> dataset.use_sampler(ds.DistributedSampler(num_shards=2, shard_id=1, shuffle=False))
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 2)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 6)]]
>>>
>>> # Change the data order of concatenated dataset with random selection
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1.concat(dataset_2)
>>> dataset.use_sampler(ds.RandomSampler())
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 1)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 2)], [Tensor(shape=[], dtype=Int64, value= 5)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 6)], [Tensor(shape=[], dtype=Int64, value= 3)]]
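
The sharded result shown in the examples (2, 4, 6 for `num_shards=2, shard_id=1`) is consistent with a round-robin selection over the concatenated rows. The following pure-Python sketch reproduces that ordering without MindSpore. `concat_rows` and `select_shard` are hypothetical helpers, and the round-robin rule is an assumption inferred from the output above, not a statement about how `DistributedSampler` is implemented.

```python
from itertools import chain
from typing import Iterable, List


def concat_rows(*datasets: Iterable[int]) -> List[int]:
    """Append the rows of each dataset in the order the datasets are given."""
    return list(chain.from_iterable(datasets))


def select_shard(rows: List[int], num_shards: int, shard_id: int) -> List[int]:
    """Keep every row whose position falls on this shard (assumed round-robin)."""
    return [row for index, row in enumerate(rows)
            if index % num_shards == shard_id]


rows = concat_rows([1, 2, 3], [4, 5, 6])
print(rows)                      # [1, 2, 3, 4, 5, 6]
print(select_shard(rows, 2, 1))  # [2, 4, 6]
print(select_shard(rows, 2, 0))  # [1, 3, 5]
```

Under this assumption the two shards together cover every row of the concatenated dataset exactly once.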