mindflow.data.ExistedDataset

class mindflow.data.ExistedDataset(name=None, data_dir=None, columns_list=None, data_format='npy', constraint_type='Label', random_merge=True, data_config=None)

Creates a dataset from existing data files at the given path.

Note

Only the 'npy' data format is currently supported.

Parameters
  • name (str, optional) – Name of the dataset. Default: None. Must not be None if data_config is None.

  • data_dir (Union[str, list, tuple], optional) – Path or paths of the existing data files. Default: None. Must not be None if data_config is None.

  • columns_list (Union[str, list, tuple], optional) – Column names of the dataset. Default: None. Must not be None if data_config is None.

  • data_format (str, optional) – Format of the existing data files. Default: 'npy'.

  • constraint_type (str, optional) – Constraint type of the created dataset. Default: 'Label'.

  • random_merge (bool, optional) – Whether to randomly merge the given datasets. Default: True.

  • data_config (ExistedDataConfig, optional) – Instance of ExistedDataConfig that collects the settings described above. Default: None. If it is not None, the dataset is created from it. If it is None, the individual arguments (name, data_dir, columns_list, data_format, constraint_type, random_merge) are used instead.

Raises
  • ValueError – If name, data_dir, or columns_list is None when data_config is None.

  • TypeError – If data_config is not an instance of ExistedDataConfig.

  • ValueError – If data_format is not 'npy'.

Supported Platforms:

Ascend GPU

Examples

>>> from mindflow.data import ExistedDataConfig, ExistedDataset
>>> # Collect the dataset settings in a config object.
>>> data_config = ExistedDataConfig(name='exist',
...                                 data_dir=['./data.npy'],
...                                 columns_list=['input_data'],
...                                 data_format='npy',
...                                 constraint_type='Equation')
>>> # Build the dataset from the config.
>>> dataset = ExistedDataset(data_config=data_config)
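
The rules in the Parameters and Raises sections can be sketched in plain Python. This is only an illustration of the documented behaviour, not MindFlow's implementation: `SketchConfig` and `resolve_config` are hypothetical names, and the order in which the checks run is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Union

PathLike = Union[str, Sequence[str]]


@dataclass
class SketchConfig:
    """Hypothetical stand-in for ExistedDataConfig (not the MindFlow class)."""
    name: str
    data_dir: PathLike
    columns_list: PathLike
    data_format: str = "npy"
    constraint_type: str = "Label"
    random_merge: bool = True


def resolve_config(name=None, data_dir=None, columns_list=None,
                   data_format="npy", constraint_type="Label",
                   random_merge=True, data_config=None):
    """Return the effective settings; data_config takes precedence if given."""
    if data_config is None:
        # Without a config object, the three core arguments are required.
        required = {"name": name, "data_dir": data_dir,
                    "columns_list": columns_list}
        missing = [key for key, value in required.items() if value is None]
        if missing:
            raise ValueError(f"{', '.join(missing)} must not be None "
                             "when data_config is None")
        data_config = SketchConfig(name, data_dir, columns_list,
                                   data_format, constraint_type, random_merge)
    elif not isinstance(data_config, SketchConfig):
        raise TypeError("data_config must be a config instance")

    # Only the 'npy' format is documented as supported.
    if data_config.data_format != "npy":
        raise ValueError("data_format must be 'npy'")
    return data_config
```

Calling `resolve_config(name='exist', data_dir=['./data.npy'], columns_list=['input_data'])` yields the same settings as passing an equivalent config object, while omitting any of the three required arguments raises `ValueError`.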