mindflow.data

class mindflow.data.BoundaryBC(geometry)[source]

Sampling data of boundary condition.

Parameters

geometry (Geometry) – specifies geometry information of boundary condition.

Raises

ValueError – if sampling_config.bc of geometry is None.

Supported Platforms:

Ascend GPU

Examples

>>> from mindflow.geometry import generate_sampling_config, Geometry
>>> from mindflow.data import BoundaryBC
>>> geometry_config = dict({'BC' : dict({'random_sampling' : True, 'size' : 100, 'sampler' : 'uniform',})})
>>> sampling_config = generate_sampling_config(geometry_config)
>>> geom = Geometry("geom", 1, 0.0, 1.0, sampling_config=sampling_config)
>>> boundary_bc = BoundaryBC(geometry=geom)
class mindflow.data.BoundaryIC(geometry)[source]

Sampling data of initial condition.

Parameters

geometry (Geometry) – specifies geometry information of initial condition.

Raises

ValueError – if sampling_config.ic of geometry is None.

Supported Platforms:

Ascend GPU

Examples

>>> from mindflow.geometry import generate_sampling_config, Geometry
>>> from mindflow.data import BoundaryIC
>>> geometry_config = dict({'IC' : dict({'random_sampling' : True, 'size' : 100, 'sampler' : 'uniform',})})
>>> sampling_config = generate_sampling_config(geometry_config)
>>> geom = Geometry("geom", 1, 0.0, 1.0, sampling_config=sampling_config)
>>> boundary_ic = BoundaryIC(geometry=geom)
class mindflow.data.Dataset(geometry_dict=None, existed_data_list=None, dataset_list=None)[source]

Combine datasets together.

Parameters
  • geometry_dict (dict, optional) – specifies geometry datasets to be merged. The key is a geometry instance and the value is a list of sampling types of that geometry. For example, geometry_dict = {geom : ["domain", "BC", "IC"]}. Default: None.

  • existed_data_list (Union[list, tuple, ExistedDataConfig], optional) – specifies existed datasets to be merged. For example, existed_data_list = [ExistedDataConfig_Instance1, ExistedDataConfig_Instance2]. Default: None.

  • dataset_list (Union[list, tuple, Data], optional) – specifies instances of data to be merged. For example, dataset_list = [BoundaryIC_Instance, Equation_Instance, BoundaryBC_Instance, ExistedData_Instance]. Default: None.

Raises
  • ValueError – If geometry_dict, existed_data_list and dataset_list are all None.

  • TypeError – If the type of geometry_dict is not dict.

  • TypeError – If a key of geometry_dict is not an instance of class Geometry.

  • TypeError – If the type of existed_data_list is not list, tuple or an instance of ExistedDataConfig.

  • TypeError – If an element of existed_data_list is not an instance of ExistedDataConfig.

  • TypeError – If an element of dataset_list is not an instance of class Data.

Supported Platforms:

Ascend GPU

Examples

>>> from mindflow.geometry import Rectangle, generate_sampling_config
>>> from mindflow.data import Dataset
>>> rectangle_mesh = dict({'domain': dict({'random_sampling': False, 'size': [50, 25]})})
>>> rect_space = Rectangle("rectangle", coord_min=[0, 0], coord_max=[5, 5],
...                        sampling_config=generate_sampling_config(rectangle_mesh))
>>> geom_dict = {rect_space: ["domain"]}
>>> dataset = Dataset(geometry_dict=geom_dict)
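
Sub-datasets can also be passed in directly through dataset_list. A minimal sketch, assuming boundary_bc and equation are BoundaryBC and Equation instances built as in the examples of those classes:

>>> # boundary_bc and equation are assumed to be existing BoundaryBC/Equation instances
>>> dataset_from_list = Dataset(dataset_list=[boundary_bc, equation])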
create_dataset(batch_size=1, preprocess_fn=None, input_output_columns_map=None, shuffle=True, drop_remainder=True, prebatched_data=False, num_parallel_workers=1, num_shards=None, shard_id=None, python_multiprocessing=False)[source]

Create the final MindSpore dataset by merging all the sub-datasets.

Parameters
  • batch_size (int, optional) – An int number of rows each batch is created with. Default: 1.

  • preprocess_fn (Union[list[TensorOp], list[functions]], optional) – List of operations to be applied on the dataset. Operations are applied in the order they appear in this list. Default: None.

  • input_output_columns_map (dict, optional) – specifies which columns to replace and with what. The key is the column name to be replaced and the value is the new name. This argument does not need to be set if no column names change after mapping. Default: None.

  • shuffle (bool, optional) – Whether or not to perform shuffle on the dataset. Random accessible input is required. Default: True.

  • drop_remainder (bool, optional) – Determines whether or not to drop the last block whose data row number is less than batch size. If True, and if there are less than batch_size rows available to make the last batch, then those rows will be dropped and not propagated to the child node. Default: True.

  • prebatched_data (bool, optional) – Generate pre-batched data before creating the MindSpore dataset. If True, pre-batched data will be returned when each sub-dataset is accessed by index; otherwise, the batch operation will be done by the MindSpore dataset interface: dataset.batch. When batch_size is very large, it is recommended to set this option to True to improve performance on the host. Default: False.

  • num_parallel_workers (int, optional) – Number of workers (threads) to process the dataset in parallel. Default: 1.

  • num_shards (int, optional) – Number of shards that the dataset will be divided into. Random accessible input is required. When this argument is specified, num_samples reflects the maximum sample number per shard. Default: None.

  • shard_id (int, optional) – The shard ID within num_shards. This argument must be specified only when num_shards is also specified. Random accessible input is required. Default: None.

  • python_multiprocessing (bool, optional) – Parallelize the Python function per_batch_map with multiprocessing. This option can be beneficial if the function is computationally heavy. Default: False.

Returns

BatchDataset, dataset batched.

Examples

>>> data = dataset.create_dataset()
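
The batching behaviour can be tuned with the keyword arguments described above; the values here are illustrative only:

>>> data = dataset.create_dataset(batch_size=128, shuffle=True, drop_remainder=True)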
get_columns_list()[source]

Get the list of column names of the dataset.

Returns

list[str], the column names of the final unified dataset.

Examples

>>> columns_list = dataset.get_columns_list()
set_constraint_type(constraint_type='Equation')[source]

Set the constraint type of the dataset.

Parameters

constraint_type (Union[str, dict]) – The constraint type of the specified dataset. If it is a string, the constraint type of all sub-datasets will be set to the same one. If it is a dict, the sub-dataset and its constraint type are specified by the pair (key, value). Default: "Equation".

Examples

>>> dataset.set_constraint_type("Equation")
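
When sub-datasets need different constraint types, a dict can be passed instead. A sketch, assuming equation and boundary_bc are sub-dataset instances contained in this dataset:

>>> dataset.set_constraint_type({equation: "Equation", boundary_bc: "BC"})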
class mindflow.data.Equation(geometry)[source]

Sampling data of equation domain.

Parameters

geometry (Geometry) – specifies geometry information of equation domain.

Raises
  • TypeError – if geometry is not an instance of class Geometry.

  • ValueError – if sampling_config of geometry is None.

  • KeyError – if sampling_config.domain of geometry is None.

Supported Platforms:

Ascend GPU

Examples

>>> from mindflow.geometry import generate_sampling_config, Geometry
>>> from mindflow.data import Equation
>>> geometry_config = dict({'domain' : dict({'random_sampling' : True, 'size' : 100, 'sampler' : 'uniform',})})
>>> sampling_config = generate_sampling_config(geometry_config)
>>> geom = Geometry("geom", 1, 0.0, 1.0, sampling_config=sampling_config)
>>> boundary = Equation(geometry=geom)
class mindflow.data.ExistedDataConfig(name, data_dir, columns_list, data_format='npy', constraint_type='Label', random_merge=True)[source]

Set arguments of ExistedDataset.

Parameters
  • name (str) – specifies the name of dataset.

  • data_dir (Union[str, list, tuple]) – the path of existed data files.

  • columns_list (Union[str, list, tuple]) – list of column names of the dataset.

  • data_format (str, optional) – the format of existed data files (default='npy'). Only the 'npy' format is currently supported.

  • constraint_type (str, optional) – specifies the constraint type of the created dataset (default=”Label”).

  • random_merge (bool, optional) – specifies whether randomly merge the given datasets (default=True).

Supported Platforms:

Ascend GPU
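
Examples

A minimal sketch of constructing the config; the data path and column name are illustrative only:

>>> from mindflow.data import ExistedDataConfig
>>> data_config = ExistedDataConfig(name='exist',
...                                 data_dir=['./data.npy'],
...                                 columns_list=['input_data'],
...                                 data_format='npy',
...                                 constraint_type='Equation')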

class mindflow.data.ExistedDataset(name=None, data_dir=None, columns_list=None, data_format='npy', constraint_type='Label', random_merge=True, data_config=None)[source]

Creates a dataset with given data path.

Note

Only the 'npy' data format is currently supported.

Parameters
  • name (str, optional) – specifies the name of dataset (default=None). If data_config is None, the name should not be None.

  • data_dir (Union[str, list, tuple], optional) – the path of existed data files (default=None). If data_config is None, the data_dir should not be None.

  • columns_list (Union[str, list, tuple], optional) – list of column names of the dataset (default=None). If data_config is None, the columns_list should not be None.

  • data_format (str, optional) – the format of existed data files (default=’npy’).

  • constraint_type (str, optional) – specifies the constraint type of the created dataset (default=”Label”).

  • random_merge (bool, optional) – specifies whether randomly merge the given datasets (default=True).

  • data_config (ExistedDataConfig, optional) – Instance of ExistedDataConfig which collects the information described above (default=None). If it is not None, the dataset will be created from it for simplicity. If it is None, the arguments (name, data_dir, columns_list, data_format, constraint_type, random_merge) will be used instead.

Raises
  • ValueError – Argument name/data_dir/columns_list is None when data_config is None.

  • TypeError – If data_config is not an instance of ExistedDataConfig.

  • ValueError – If data_format is not ‘npy’.

Supported Platforms:

Ascend GPU

Examples

>>> from mindflow.data import ExistedDataConfig, ExistedDataset
>>> data_config = ExistedDataConfig(name='exist',
...                                 data_dir=['./data.npy'],
...                                 columns_list=['input_data'], data_format="npy", constraint_type="Equation")
>>> dataset = ExistedDataset(data_config=data_config)
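
When data_config is None, the same dataset can be built directly from keyword arguments; the path and column name are illustrative only:

>>> dataset = ExistedDataset(name='exist', data_dir=['./data.npy'],
...                          columns_list=['input_data'], constraint_type="Equation")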
class mindflow.data.MindDataset(dataset_files, dataset_name='dataset', constraint_type='Label', shuffle=True, num_shards=None, shard_id=None, sampler=None, num_samples=None, num_parallel_workers=None)[source]

Create dataset from MindRecord-type data.

Parameters
  • dataset_files (Union[str, list[str]]) – If dataset_files is a str, it represents a file name of one component of a MindRecord source, and other files with an identical source in the same path will be found and loaded automatically. If dataset_files is a list, it represents a list of dataset files to be read directly.

  • dataset_name (str, optional) – name of dataset. Default: "dataset".

  • constraint_type (str, optional) – constraint type of the specified dataset, used to get its corresponding loss function. Default: "Label".

  • shuffle (Union[bool, Shuffle level], optional) –

    Perform reshuffling of the data every epoch. If shuffle is False, no shuffling will be performed. If shuffle is True, a global shuffle is performed. Default: True. Otherwise, there are two levels of shuffling:

    • Shuffle.GLOBAL: Shuffle both the files and the samples.

    • Shuffle.FILES: Shuffle files only.

  • num_shards (int, optional) – Number of shards that the dataset will be divided into (default=None). When this argument is specified, ‘num_samples’ reflects the maximum sample number of per shard.

  • shard_id (int, optional) – The shard ID within num_shards (default=None). This argument can only be specified when num_shards is also specified.

  • sampler (Sampler, optional) – Object used to choose samples from the dataset (default=None, sampler is exclusive with shuffle and block_reader). Support list: SubsetRandomSampler, PkSampler, RandomSampler, SequentialSampler, DistributedSampler.

  • num_samples (int, optional) – The number of samples to be included in the dataset (default=None, all samples).

  • num_parallel_workers (int, optional) – The number of readers (default=None).

Raises
  • ValueError – If dataset_files are not valid or do not exist.

  • TypeError – If dataset_name is not a string.

  • ValueError – If constraint_type.lower() is not in ["equation", "bc", "ic", "label", "function", "custom"].

  • RuntimeError – If num_shards is specified but shard_id is None.

  • RuntimeError – If shard_id is specified but num_shards is None.

  • ValueError – If shard_id is invalid (< 0 or >= num_shards).

Supported Platforms:

Ascend GPU

Examples

>>> from mindflow.data import MindDataset
>>> dataset_files = ["./data_dir"] # contains 1 or multiple MindRecord files
>>> dataset = MindDataset(dataset_files=dataset_files)
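
Keyword arguments such as dataset_name and constraint_type can be set at construction; the values here are illustrative only:

>>> dataset = MindDataset(dataset_files=dataset_files, dataset_name="flow_data",
...                       constraint_type="Label", shuffle=True)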
create_dataset(batch_size=1, preprocess_fn=None, updated_columns_list=None, drop_remainder=True, prebatched_data=False, num_parallel_workers=1, python_multiprocessing=False)[source]

Create the final MindSpore dataset.

Parameters
  • batch_size (int, optional) – An int number of rows each batch is created with. Default: 1.

  • preprocess_fn (Union[list[TensorOp], list[functions]], optional) – List of operations to be applied on the dataset. Operations are applied in the order they appear in this list. Default: None.

  • updated_columns_list (list, optional) – List of columns to be applied on the dataset. Default: None.

  • drop_remainder (bool, optional) – Determines whether or not to drop the last block whose data row number is less than batch size. If True, and if there are less than batch_size rows available to make the last batch, then those rows will be dropped and not propagated to the child node. Default: True.

  • prebatched_data (bool, optional) – Generate pre-batched data before data preprocessing. Default: False.

  • num_parallel_workers (int, optional) – Number of workers (threads) to process the dataset in parallel. Default: 1.

  • python_multiprocessing (bool, optional) – Parallelize the Python function per_batch_map with multiprocessing. This option can be beneficial if the function is computationally heavy. Default: False.

Returns

BatchDataset, dataset batched.

Examples

>>> data = dataset.create_dataset()
get_columns_list()[source]

Get the list of column names of the dataset.

Returns

list[str], the column names of the final unified dataset.

Examples

>>> columns_list = dataset.get_columns_list()
set_constraint_type(constraint_type='Equation')[source]

Set the constraint type of the dataset.

Parameters

constraint_type (Union[str, dict]) – The constraint type of the specified dataset. If it is a string, the constraint type of all sub-datasets will be set to the same one. If it is a dict, the sub-dataset and its constraint type are specified by the pair (key, value). Default: "Equation".

Examples

>>> dataset.set_constraint_type("Equation")
split_dataset(dataset_dict, constraint_dict=None)[source]

Split the original dataset in order to set different loss functions.

Parameters
  • dataset_dict (dict) – dictionary of each sub-dataset; the key is the labeled name of the sub-dataset while the value refers to the specified columns contained in it.

  • constraint_dict (Union[None, str, dict]) – The constraint type of the specified dataset. If None, "Label" will be set for all. If it is a string, all will be set to the same one. If it is a dict, the sub-dataset and its constraint type are specified by the pair (key, value). Default: None.

Examples

>>> dataset.split_dataset({"Equation" : "inner_points", "BC" : "bc_points"})
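
A constraint type can be assigned to each split at the same time through constraint_dict; a sketch, with keys matching the names used in dataset_dict:

>>> dataset.split_dataset({"Equation" : "inner_points", "BC" : "bc_points"},
...                       constraint_dict={"Equation" : "Equation", "BC" : "BC"})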