mindspore_gl.dataset.Alchemy
- class mindspore_gl.dataset.Alchemy(root, datasize=10000)[source]
Alchemy dataset, a source dataset for reading and parsing the Alchemy dataset.
About Alchemy dataset: The Tencent Quantum Lab has recently introduced a new molecular dataset, called Alchemy, to facilitate the development of new machine learning models useful for chemistry and materials science.
The dataset lists 12 quantum mechanical properties of 130,000+ organic molecules comprising up to 12 heavy atoms (C, N, O, S, F and Cl), sampled from the GDBMedChem database. These properties have been calculated using the open-source computational chemistry program Python-based Simulation of Chemistry Framework (PySCF).
Statistics:
Graphs: 99776
Average nodes per graph: 9.71
Average edges per graph: 10.02
Number of quantum mechanical properties: 12
The dataset can be downloaded here: Alchemy dev and Alchemy valid.
Organize the dataset files into the following directory structure so that they can be read by the preprocess API.
.
├── dev
│   ├── dev_target.csv
│   └── sdf
│       ├── atom_10
│       ├── atom_11
│       ├── atom_12
│       └── atom_9
└── valid
    ├── sdf
    │   ├── atom_11
    │   └── atom_12
    └── valid_target.csv
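Before loading, it can help to confirm that the files are laid out as shown in the tree above. The following is a minimal sketch using only the standard library; `missing_alchemy_paths` and `EXPECTED_PATHS` are hypothetical names introduced for illustration and are not part of mindspore_gl.

```python
from pathlib import Path

# Relative paths taken from the directory tree in the text above.
EXPECTED_PATHS = [
    "dev/dev_target.csv",
    "dev/sdf",
    "valid/valid_target.csv",
    "valid/sdf",
]


def missing_alchemy_paths(root):
    """Return the expected relative paths that do not exist under root.

    Hypothetical helper for illustration; it is not part of mindspore_gl.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_alchemy_paths("path/to/alchemy")` returns an empty list when all four entries are present, and otherwise lists the ones that still need to be put in place.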
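- Parameters
root (str) – path to the root directory that contains the dataset files.
datasize (int) – size of the dataset to load, at most 99776. Default: 10000.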
- Raises
TypeError – if root is not a str.
RuntimeError – if root does not contain data files.
ValueError – if datasize is more than 99776.
Examples
>>> from mindspore_gl.dataset import Alchemy
>>> root = "path/to/alchemy"
>>> dataset = Alchemy(root)
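The documented TypeError and ValueError conditions can also be checked up front, before the dataset is constructed. This is only a sketch of such a pre-check under the conditions listed in Raises; `check_alchemy_args` and `MAX_GRAPHS` are hypothetical names, and the code is not how mindspore_gl itself validates its arguments.

```python
MAX_GRAPHS = 99776  # upper bound on datasize stated in the Raises list


def check_alchemy_args(root, datasize=10000):
    """Raise the exception types that the Raises list documents.

    Hypothetical pre-check for illustration; not mindspore_gl's own code.
    The RuntimeError case (root without data files) is not covered here.
    """
    if not isinstance(root, str):
        raise TypeError(f"root must be a str, got {type(root).__name__}")
    if datasize > MAX_GRAPHS:
        raise ValueError(f"datasize must not exceed {MAX_GRAPHS}, got {datasize}")
    return root, datasize
```

A call such as `check_alchemy_args("path/to/alchemy", 20000)` returns its arguments unchanged, while a non-string root or a datasize above 99776 raises immediately.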
- property edge_feat
Edge features.
- Returns
numpy.ndarray, array of edge feature.
Examples
>>> # dataset is an instance object of Dataset
>>> edge_feat = dataset.edge_feat
- property graph_count
Total number of graphs.
- Returns
int, number of graphs.
Examples
>>> # dataset is an instance object of Dataset
>>> graph_count = dataset.graph_count
- graph_edge_feat(graph_idx)[source]
Edge features of a graph.
- Parameters
graph_idx (int) – index of graph.
- Returns
numpy.ndarray, edge feature of graph.
Examples
>>> # dataset is an instance object of Dataset
>>> graph_idx = 0
>>> graph_edge_feat = dataset.graph_edge_feat(graph_idx)
- property graph_edges
Cumulative edge counts of the graphs.
- Returns
numpy.ndarray, array of cumulative edge counts.
Examples
>>> # dataset is an instance object of Dataset
>>> graph_edges = dataset.graph_edges
- graph_feat(graph_idx)[source]
Node features of a graph.
- Parameters
graph_idx (int) – index of graph.
- Returns
numpy.ndarray, node feature of graph.
Examples
>>> # dataset is an instance object of Dataset
>>> graph_idx = 0
>>> graph_feat = dataset.graph_feat(graph_idx)
- property graph_label
Graph label.
- Returns
numpy.ndarray, array of graph label.
Examples
>>> # dataset is an instance object of Dataset
>>> graph_label = dataset.graph_label
- property graph_nodes
Cumulative node counts of the graphs.
- Returns
numpy.ndarray, array of cumulative node counts.
Examples
>>> # dataset is an instance object of Dataset
>>> graph_nodes = dataset.graph_nodes
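The accumulated counts returned by graph_nodes and graph_edges can be read as offsets into the flat feature arrays. The sketch below shows one way to slice out a single graph, assuming the array starts at 0 and has one more entry than there are graphs; that layout is an assumption and is not stated in this reference. `graph_slice` is a hypothetical helper, and plain lists stand in for the numpy arrays.

```python
def graph_slice(cumulative, features, graph_idx):
    """Return the rows of `features` that belong to graph `graph_idx`.

    Assumes cumulative[0] == 0 and that cumulative[i + 1] - cumulative[i]
    is the size of graph i. This layout is an assumption for illustration.
    """
    start, stop = cumulative[graph_idx], cumulative[graph_idx + 1]
    return features[start:stop]


# Stand-in data: three graphs with 2, 3, and 1 nodes respectively.
cumulative_nodes = [0, 2, 5, 6]
node_rows = ["a0", "a1", "b0", "b1", "b2", "c0"]

# Rows of the second graph (index 1): positions 2 up to, but excluding, 5.
second_graph = graph_slice(cumulative_nodes, node_rows, 1)
```

The same slicing works on numpy arrays, since it relies only on integer indexing and basic slices.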
- property n_tasks
Graph label size.
- Returns
int, size of graph label.
Examples
>>> # dataset is an instance object of Dataset
>>> n_tasks = dataset.n_tasks
- property node_feat
Node features.
- Returns
numpy.ndarray, array of node feature.
Examples
>>> # dataset is an instance object of Dataset
>>> node_feat = dataset.node_feat
- property num_edge_features
Feature size of each edge.
- Returns
int, the feature size of each edge.
Examples
>>> # dataset is an instance object of Dataset
>>> num_edge_features = dataset.num_edge_features
- property num_features
Feature size of each node.
- Returns
int, the feature size of each node.
Examples
>>> # dataset is an instance object of Dataset
>>> num_features = dataset.num_features
- property train_graphs
Training graph IDs.
- Returns
numpy.ndarray, array of training graph IDs.
Examples
>>> # dataset is an instance object of Dataset
>>> train_graphs = dataset.train_graphs
- property train_mask
Mask of training nodes.
- Returns
numpy.ndarray, array of mask.
Examples
>>> # dataset is an instance object of Dataset
>>> train_mask = dataset.train_mask
- property val_graphs
Validation graph IDs.
- Returns
numpy.ndarray, array of validation graph IDs.
Examples
>>> # dataset is an instance object of Dataset
>>> val_graphs = dataset.val_graphs
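The ID arrays from train_graphs and val_graphs can be used to pull out the matching entries of a per-graph array such as graph_label. The following sketch assumes the IDs are integer indices into such arrays, which this reference does not state explicitly; `select_by_ids` is a hypothetical helper, and short lists stand in for the real arrays.

```python
def select_by_ids(items, graph_ids):
    """Pick the entries of `items` at the given graph IDs, in order."""
    return [items[i] for i in graph_ids]


# Stand-in data: one label per graph, plus made-up split IDs.
labels = [0.1, 0.2, 0.3, 0.4, 0.5]
train_ids = [0, 2, 3]
val_ids = [1, 4]

train_labels = select_by_ids(labels, train_ids)
val_labels = select_by_ids(labels, val_ids)

# A quick sanity check: no graph should appear in both splits.
overlap = set(train_ids) & set(val_ids)
```

With numpy arrays, the same selection can be written directly as integer array indexing, for example `labels[train_ids]`.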
- property val_mask
Mask of validation nodes.
- Returns
numpy.ndarray, array of mask.
Examples
>>> # dataset is an instance object of Dataset
>>> val_mask = dataset.val_mask