mindearth.data.Dataset

class mindearth.data.Dataset(dataset_generator, distribute=False, num_workers=1, shuffle=True)

Create the dataset for training, validation and testing, and output an instance of class mindspore.dataset.GeneratorDataset.

Parameters
  • dataset_generator (Data) – The data generator for the weather dataset.

  • distribute (bool, optional) – Whether to perform parallel training. Default: False.

  • num_workers (int, optional) – Number of workers (threads) used to process the dataset in parallel. Default: 1.

  • shuffle (bool, optional) – Whether to shuffle the dataset. Random-accessible input is required. Default: True; the expected ordering behavior is shown in the table.
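
The shuffle parameter requires random-accessible input, meaning the data generator can be indexed by position and reports its length. The following is a minimal, pure-Python sketch of that idea and does not use MindEarth or MindSpore. ToyWeatherSource and visit_order are hypothetical names introduced only for illustration, and the sketch makes no claim about how Era5Data or GeneratorDataset are implemented internally.

```python
import random


class ToyWeatherSource:
    """Hypothetical stand-in for a data generator such as Era5Data."""

    def __init__(self, num_samples):
        # Each sample is an (inputs, labels) pair; plain ints keep the sketch small.
        self._samples = [(i, i + 1) for i in range(num_samples)]

    def __getitem__(self, index):
        # Random access: any sample can be fetched directly by its index.
        return self._samples[index]

    def __len__(self):
        # The length tells a sampler which indices are valid.
        return len(self._samples)


def visit_order(source, shuffle, seed=0):
    """Return the order in which sample indices would be read (illustrative only)."""
    order = list(range(len(source)))
    if shuffle:
        # Shuffling permutes indices, which is only possible with random access.
        random.Random(seed).shuffle(order)
    return order


source = ToyWeatherSource(6)
sequential = visit_order(source, shuffle=False)
shuffled = visit_order(source, shuffle=True)
```

With shuffle disabled the indices are visited in order; with shuffle enabled every index is still visited exactly once, only in a permuted order.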

Supported Platforms:

Ascend GPU

Examples

>>> from mindearth.data import Era5Data, Dataset
>>> data_params = {
...     'name': 'era5',
...     'root_dir': './dataset',
...     'feature_dims': 69,
...     't_in': 1,
...     't_out_train': 1,
...     't_out_valid': 20,
...     't_out_test': 20,
...     'valid_interval': 1,
...     'test_interval': 1,
...     'train_interval': 1,
...     'pred_lead_time': 6,
...     'data_frequency': 6,
...     'train_period': [2015, 2015],
...     'valid_period': [2016, 2016],
...     'test_period': [2017, 2017],
...     'patch': True,
...     'patch_size': 8,
...     'batch_size': 8,
...     'num_workers': 1,
...     'grid_resolution': 1.4,
...     'h_size': 128,
...     'w_size': 256
... }
>>> dataset_generator = Era5Data(data_params)
>>> dataset = Dataset(dataset_generator)
>>> train_dataset = dataset.create_dataset(1)
create_dataset(batch_size)

Create the batched dataset.

Parameters

batch_size (int) – The number of rows in each batch.

Returns

BatchDataset, the batched dataset.
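
To make the effect of batch_size concrete, the sketch below groups rows into batches in plain Python. It is an illustration of batching in general, not the MindEarth implementation: batch_rows is a hypothetical helper, and whether create_dataset drops an incomplete final batch is not stated here, so the drop_remainder flag is an assumption that shows both behaviors.

```python
def batch_rows(rows, batch_size, drop_remainder=True):
    """Group rows into consecutive batches of batch_size (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    # Assumption for illustration: optionally discard a final, incomplete batch.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


rows = list(range(10))
full_only = batch_rows(rows, 4)                        # incomplete tail dropped
with_tail = batch_rows(rows, 4, drop_remainder=False)  # incomplete tail kept
```

With 10 rows and batch_size 4, dropping the remainder leaves two full batches, while keeping it adds a third batch holding the last two rows.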