mindspore_gl.dataset.Alchemy

class mindspore_gl.dataset.Alchemy(root, datasize=10000)[source]

A source dataset for reading and parsing the Alchemy dataset.

About the Alchemy dataset: Tencent Quantum Lab introduced a molecular dataset, called Alchemy, to facilitate the development of new machine learning models useful for chemistry and materials science.

The dataset lists 12 quantum mechanical properties of 130,000+ organic molecules comprising up to 12 heavy atoms (C, N, O, S, F and Cl), sampled from the GDBMedChem database. These properties have been calculated using the open-source computational chemistry program Python-based Simulation of Chemistry Framework (PySCF).

Statistics:

  • Graphs: 99776

  • Nodes (average per graph): 9.71

  • Edges (average per graph): 10.02

  • Number of quantum mechanical properties: 12

The dataset can be downloaded here: Alchemy dev and Alchemy valid.

Organize the dataset files into the following directory structure so that they can be read by the preprocess API.

.
├── dev
│ ├── dev_target.csv
│ └── sdf
│     ├── atom_10
│     ├── atom_11
│     ├── atom_12
│     └── atom_9
└── valid
    ├── sdf
    │ ├── atom_11
    │ └── atom_12
    └── valid_target.csv
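
Before preprocessing, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of mindspore_gl: the helper name missing_alchemy_paths is hypothetical, and the list of expected paths is taken only from the directory tree shown above.

```python
from pathlib import Path
import tempfile

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = (
    "dev/dev_target.csv",
    "dev/sdf",
    "valid/valid_target.csv",
    "valid/sdf",
)


def missing_alchemy_paths(root):
    """Return the expected relative paths that do not exist under root.

    Hypothetical helper for illustration; it is not a mindspore_gl function.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


# Demo on a throwaway directory that contains only the dev half of the layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dev" / "sdf").mkdir(parents=True)
    (Path(tmp) / "dev" / "dev_target.csv").touch()
    demo_missing = missing_alchemy_paths(tmp)
```

An empty list means every expected file and folder was found; in the demo, the two paths under valid are reported as missing.
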
Parameters
  • root (str) – Path to the root directory that contains alchemy_with_mask.npz.

  • datasize (int) – Training data size. Default: 10000.

Raises

Examples

>>> from mindspore_gl.dataset import Alchemy
>>> root = "path/to/alchemy"
>>> dataset = Alchemy(root)
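
Once a dataset object exists, its scalar properties can be gathered for a quick sanity check. The sketch below assumes only the property names documented on this page (graph_count, num_features, num_edge_features, n_tasks). The summarize helper and the stand-in object with made-up values are hypothetical, so the code runs without the dataset files.

```python
from types import SimpleNamespace


def summarize(dataset):
    """Collect the scalar properties documented on this page into a dict."""
    return {
        "graph_count": dataset.graph_count,
        "num_features": dataset.num_features,
        "num_edge_features": dataset.num_edge_features,
        "n_tasks": dataset.n_tasks,
    }


# Hypothetical stand-in with made-up values; a real Alchemy instance
# would be passed to summarize() in the same way.
fake_dataset = SimpleNamespace(
    graph_count=3, num_features=6, num_edge_features=4, n_tasks=12
)
summary = summarize(fake_dataset)
```
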
property edge_feat

Edge features.

Returns

  • numpy.ndarray, array of edge feature.

Examples

>>> #dataset is an instance object of Dataset
>>> edge_feat = dataset.edge_feat
property graph_count

Total number of graphs.

Returns

  • int, number of graphs.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_count = dataset.graph_count
graph_edge_feat(graph_idx)[source]

Edge features of a single graph.

Parameters

graph_idx (int) – index of graph.

Returns

  • numpy.ndarray, edge feature of graph.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_idx = 0
>>> graph_edge_feat = dataset.graph_edge_feat(graph_idx)
property graph_edges

Cumulative edge counts across graphs.

Returns

  • numpy.ndarray, array of cumulative edge counts.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_edges = dataset.graph_edges
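
Cumulative counts are commonly used as offsets into a flat array that stores all graphs back to back. The sketch below illustrates that general idea with plain Python lists and made-up counts. The leading zero in the offsets and the helper name edge_range are assumptions for illustration and are not confirmed for the array this property returns.

```python
from itertools import accumulate

# Hypothetical per-graph edge counts for three small graphs.
edges_per_graph = [4, 6, 2]

# Cumulative counts with a leading zero (an assumption about the layout).
cumulative = [0] + list(accumulate(edges_per_graph))


def edge_range(cumulative, graph_idx):
    """Return the half-open [start, stop) slice of edges owned by one graph."""
    return cumulative[graph_idx], cumulative[graph_idx + 1]


# One (start, stop) pair per graph.
ranges = [edge_range(cumulative, i) for i in range(len(edges_per_graph))]
```

Under this layout, the number of edges in graph i is the difference between two neighboring entries of the cumulative array.
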
graph_feat(graph_idx)[source]

Node features of a single graph.

Parameters

graph_idx (int) – index of graph.

Returns

  • numpy.ndarray, node feature of graph.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_idx = 0
>>> graph_feat = dataset.graph_feat(graph_idx)
property graph_label

Graph label.

Returns

  • numpy.ndarray, array of graph label.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_label = dataset.graph_label
property graph_nodes

Cumulative node counts across graphs.

Returns

  • numpy.ndarray, array of cumulative node counts.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_nodes = dataset.graph_nodes
property n_tasks

Graph label size.

Returns

  • int, size of graph label.

Examples

>>> #dataset is an instance object of Dataset
>>> n_tasks = dataset.n_tasks
property node_feat

Node features.

Returns

  • numpy.ndarray, array of node feature.

Examples

>>> #dataset is an instance object of Dataset
>>> node_feat = dataset.node_feat
property num_edge_features

Feature size of each edge.

Returns

  • int, the feature size of each edge.

Examples

>>> #dataset is an instance object of Dataset
>>> num_edge_features = dataset.num_edge_features
property num_features

Feature size of each node.

Returns

  • int, the feature size of each node.

Examples

>>> #dataset is an instance object of Dataset
>>> num_features = dataset.num_features
property train_graphs

IDs of the training graphs.

Returns

  • numpy.ndarray, array of training graph IDs.

Examples

>>> #dataset is an instance object of Dataset
>>> train_graphs = dataset.train_graphs
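
The graph IDs returned here can be passed to graph_feat to visit the training graphs one at a time. The sketch below assumes that graph_feat returns one row per node. FakeDataset and nodes_per_train_graph are hypothetical names with made-up data, used so the loop runs without the real dataset.

```python
class FakeDataset:
    """Hypothetical stand-in exposing train_graphs and graph_feat."""

    def __init__(self):
        self.train_graphs = [0, 2]
        # Made-up node features: one inner list per node.
        self._feats = {
            0: [[1.0], [2.0]],
            1: [[3.0]],
            2: [[4.0], [5.0], [6.0]],
        }

    def graph_feat(self, graph_idx):
        return self._feats[graph_idx]


def nodes_per_train_graph(dataset):
    """Map each training graph ID to its node count (rows of graph_feat)."""
    return {int(g): len(dataset.graph_feat(g)) for g in dataset.train_graphs}


node_counts = nodes_per_train_graph(FakeDataset())
```
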
property train_mask

Mask of training nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> #dataset is an instance object of Dataset
>>> train_mask = dataset.train_mask
property val_graphs

IDs of the validation graphs.

Returns

  • numpy.ndarray, array of validation graph IDs.

Examples

>>> #dataset is an instance object of Dataset
>>> val_graphs = dataset.val_graphs
property val_mask

Mask of validation nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> #dataset is an instance object of Dataset
>>> val_mask = dataset.val_mask
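
A mask is usually converted to a list of positions before it is used for indexing. The sketch below shows that general pattern with plain Python lists and made-up values. The 0/1 encoding and the helper name mask_to_indices are assumptions, and this page does not state how the masks relate to train_graphs or val_graphs.

```python
def mask_to_indices(mask):
    """Return the positions whose mask entry is truthy."""
    return [i for i, flag in enumerate(mask) if flag]


# Hypothetical masks over five entries.
train_mask = [1, 0, 1, 1, 0]
val_mask = [0, 1, 0, 0, 1]

train_idx = mask_to_indices(train_mask)
val_idx = mask_to_indices(val_mask)
```
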