mindspore.dataset.Dataset.concat
- mindspore.dataset.Dataset.concat(datasets)
Concatenates the given dataset objects with the current dataset. The "+" operator can also be used to concatenate datasets.
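Note
All datasets being concatenated must have the same column names, and each column must have the same number of dimensions (rank) and the same data type.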
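The compatibility rule in the note can be checked by hand before calling concat. The following is a minimal sketch in plain Python and is not part of MindSpore: `ColumnSpec` and `check_concat_compatible` are hypothetical names, each dataset is represented only by a hand-written schema that maps a column name to its rank and dtype, and column order is ignored as a simplifying assumption.

```python
from typing import Dict, List, NamedTuple


class ColumnSpec(NamedTuple):
    """Hypothetical description of one column: its rank and dtype name."""
    rank: int
    dtype: str


Schema = Dict[str, ColumnSpec]


def check_concat_compatible(schemas: List[Schema]) -> List[str]:
    """Return a list of mismatch messages; an empty list means compatible."""
    problems: List[str] = []
    if not schemas:
        return problems
    reference = schemas[0]
    for index, schema in enumerate(schemas[1:], start=1):
        # Column names must be identical (order is ignored in this sketch).
        if set(schema) != set(reference):
            problems.append(
                f"dataset {index}: columns {sorted(schema)} != {sorted(reference)}"
            )
            continue
        for name, spec in schema.items():
            expected = reference[name]
            # Rank and dtype are compared column by column.
            if spec.rank != expected.rank:
                problems.append(
                    f"dataset {index}, column {name!r}: rank {spec.rank} != {expected.rank}"
                )
            if spec.dtype != expected.dtype:
                problems.append(
                    f"dataset {index}, column {name!r}: dtype {spec.dtype} != {expected.dtype}"
                )
    return problems


# Hand-written schemas used only for illustration.
schema_a = {"column1": ColumnSpec(rank=0, dtype="int64")}
schema_b = {"column1": ColumnSpec(rank=0, dtype="int64")}
schema_c = {"column1": ColumnSpec(rank=1, dtype="float32")}

ok = check_concat_compatible([schema_a, schema_b])
bad = check_concat_compatible([schema_a, schema_c])
print(ok)
print(bad)
```

Here `schema_a` and `schema_b` agree, so no problems are reported, while `schema_c` differs from `schema_a` in both rank and dtype and yields one message for each mismatch.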
- Parameters:
datasets (Union[list, Dataset]) - A list of dataset objects, or a single dataset object, to be concatenated with the current dataset.
- Returns:
Dataset, a new dataset object with the above operation applied.
Examples:
>>> import mindspore.dataset as ds
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>>
>>> # Create a dataset by concatenating dataset_1 and dataset_2 with "+" operator
>>> dataset = dataset_1 + dataset_2
>>> # Create a dataset by concatenating dataset_1 and dataset_2 with concat operation
>>> dataset = dataset_1.concat(dataset_2)
>>>
>>> # Check the data order of dataset
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1 + dataset_2
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 1)], [Tensor(shape=[], dtype=Int64, value= 2)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 3)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 5)], [Tensor(shape=[], dtype=Int64, value= 6)]]
>>>
>>> # Change the data order of concatenated dataset with sharding selection
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1.concat(dataset_2)
>>> dataset.use_sampler(ds.DistributedSampler(num_shards=2, shard_id=1, shuffle=False))
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 2)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 6)]]
>>>
>>> # Change the data order of concatenated dataset with random selection
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1.concat(dataset_2)
>>> dataset.use_sampler(ds.RandomSampler())
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 1)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 2)], [Tensor(shape=[], dtype=Int64, value= 5)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 6)], [Tensor(shape=[], dtype=Int64, value= 3)]]
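
The sharded output in the example (values 2, 4, 6 for num_shards=2, shard_id=1) is consistent with a round-robin assignment over the concatenated sequence. The sketch below models that ordering in plain Python so it is easier to reason about. `concat_rows` and `select_shard` are hypothetical helpers, and the round-robin rule is an assumption inferred from the example output, not a description of how MindSpore implements DistributedSampler.

```python
from itertools import chain
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def concat_rows(*datasets: Iterable[T]) -> List[T]:
    """Model concatenation: rows of each dataset follow one another in order."""
    return list(chain.from_iterable(datasets))


def select_shard(rows: Sequence[T], num_shards: int, shard_id: int) -> List[T]:
    """Assumed rule: the row at position i belongs to shard i % num_shards."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return [row for i, row in enumerate(rows) if i % num_shards == shard_id]


# Same values as the example datasets: [1, 2, 3] followed by [4, 5, 6].
rows = concat_rows([1, 2, 3], [4, 5, 6])
shard_1 = select_shard(rows, num_shards=2, shard_id=1)
print(rows)     # [1, 2, 3, 4, 5, 6]
print(shard_1)  # [2, 4, 6]
```

Under the same assumption, shard 0 would receive the remaining rows 1, 3 and 5, so the two shards together cover the concatenated dataset exactly once.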