mindspore_gl.dataset.Alchemy

class mindspore_gl.dataset.Alchemy(root, datasize=10000)

A source dataset for reading and parsing the Alchemy dataset.

About the Alchemy dataset: The Tencent Quantum Lab introduced Alchemy, a molecular dataset, to facilitate the development of new machine learning models useful for chemistry and materials science.

The dataset lists 12 quantum mechanical properties of 130,000+ organic molecules comprising up to 12 heavy atoms (C, N, O, S, F and Cl), sampled from the GDBMedChem database. These properties have been calculated using the open-source computational chemistry program Python-based Simulation of Chemistry Framework (PySCF).

Statistics:

  • Graphs: 99776

  • Nodes (average per graph): 9.71

  • Edges (average per graph): 10.02

  • Number of quantum mechanical properties: 12

The dataset can be downloaded here: Alchemy dev and Alchemy valid.

Organize the dataset files into the following directory structure before reading them.

.
├── dev
│ ├── dev_target.csv
│ └── sdf
│     ├── atom_10
│     ├── atom_11
│     ├── atom_12
│     └── atom_9
└── valid
    ├── sdf
    │ ├── atom_11
    │ └── atom_12
    └── valid_target.csv
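
Before constructing the dataset, it can help to confirm that the files are laid out as in the tree above. The following is a minimal sketch using only the Python standard library. The helper name `missing_alchemy_paths` is hypothetical and not part of mindspore_gl, and it checks only the top-level entries shown in the tree.

```python
from pathlib import Path

# Entries taken from the directory tree above (relative to the dataset root).
EXPECTED = [
    "dev/dev_target.csv",
    "dev/sdf",
    "valid/valid_target.csv",
    "valid/sdf",
]


def missing_alchemy_paths(root):
    """Return the expected entries that do not exist under `root`.

    Hypothetical helper for illustration; not part of mindspore_gl.
    """
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list means every entry from the tree was found; any returned path points at a file or folder that still needs to be put in place.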

Parameters

  • root (str) – Path to the root directory that contains alchemy_with_mask.npz.

  • datasize (int, optional) – Training data size. Default: 10000.

Raises

Examples

>>> from mindspore_gl.dataset import Alchemy
>>> root = "path/to/alchemy"
>>> dataset = Alchemy(root)
property edge_feat

Edge features.

Returns

  • numpy.ndarray, array of edge features.

Examples

>>> #dataset is an instance object of Dataset
>>> edge_feat = dataset.edge_feat
property edge_feat_size

Feature size of each edge.

Returns

  • int, the size of each edge feature.

Examples

>>> #dataset is an instance object of Dataset
>>> edge_feat_size = dataset.edge_feat_size
property graph_count

Total number of graphs.

Returns

  • int, number of graphs.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_count = dataset.graph_count
graph_edge_feat(graph_idx)

Graph edge features.

Parameters

graph_idx (int) – index of graph.

Returns

  • numpy.ndarray, edge features of the graph.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_idx = 0
>>> graph_edge_feat = dataset.graph_edge_feat(graph_idx)
property graph_edges

Cumulative edge counts of the graphs.

Returns

  • numpy.ndarray, array of cumulative edge counts.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_edges = dataset.graph_edges
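
If `graph_edges` holds running edge totals, the number of edges in each graph can be recovered by differencing adjacent entries. The sketch below uses a small hand-made array in place of the real property. It assumes the array starts at 0 and has one more entry than there are graphs; that layout is an assumption for illustration, not something this page states.

```python
import numpy as np

# Stand-in for dataset.graph_edges (assumed layout: leading 0, then running totals).
graph_edges = np.array([0, 4, 10, 12])

# Edges in graph i = graph_edges[i + 1] - graph_edges[i].
edges_per_graph = np.diff(graph_edges)
print(edges_per_graph.tolist())  # [4, 6, 2]
```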
property graph_label

Graph label.

Returns

  • numpy.ndarray, array of graph labels.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_label = dataset.graph_label
graph_node_feat(graph_idx)

Graph node features.

Parameters

graph_idx (int) – index of graph.

Returns

  • numpy.ndarray, node features of the graph.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_idx = 0
>>> graph_node_feat = dataset.graph_node_feat(graph_idx)
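
One way to picture a per-graph lookup like this is as a slice of a flat feature array bounded by running node totals. The sketch below is not the library's implementation. It uses toy NumPy arrays as stand-ins for `node_feat` and `graph_nodes`, assumes the offsets start at 0, and introduces a hypothetical helper `slice_graph_rows`.

```python
import numpy as np

# Toy stand-ins for dataset.node_feat and dataset.graph_nodes.
node_feat = np.arange(12, dtype=np.float32).reshape(6, 2)  # 6 nodes, 2 features each
graph_nodes = np.array([0, 2, 5, 6])                       # assumed running node totals


def slice_graph_rows(features, offsets, graph_idx):
    """Rows of `features` that belong to graph `graph_idx` (hypothetical helper)."""
    start, end = offsets[graph_idx], offsets[graph_idx + 1]
    return features[start:end]


second = slice_graph_rows(node_feat, graph_nodes, 1)
print(second.shape)  # (3, 2)
```

The same slicing pattern would apply to edge features with edge offsets, under the same assumption about the offset layout.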
property graph_nodes

Cumulative node counts of the graphs.

Returns

  • numpy.ndarray, array of cumulative node counts.

Examples

>>> #dataset is an instance object of Dataset
>>> graph_nodes = dataset.graph_nodes
property node_feat

Node features.

Returns

  • numpy.ndarray, array of node features.

Examples

>>> #dataset is an instance object of Dataset
>>> node_feat = dataset.node_feat
property node_feat_size

Feature size of each node.

Returns

  • int, the size of each node feature.

Examples

>>> #dataset is an instance object of Dataset
>>> node_feat_size = dataset.node_feat_size
property num_classes

Graph label size.

Returns

  • int, size of graph label.

Examples

>>> #dataset is an instance object of Dataset
>>> num_classes = dataset.num_classes
property train_graphs

Training graph IDs.

Returns

  • numpy.ndarray, array of training graph IDs.

Examples

>>> #dataset is an instance object of Dataset
>>> train_graphs = dataset.train_graphs
property train_mask

Mask of training nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> #dataset is an instance object of Dataset
>>> train_mask = dataset.train_mask
property val_graphs

Validation graph IDs.

Returns

  • numpy.ndarray, array of validation graph IDs.

Examples

>>> #dataset is an instance object of Dataset
>>> val_graphs = dataset.val_graphs
property val_mask

Mask of validation nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> #dataset is an instance object of Dataset
>>> val_mask = dataset.val_mask
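
A mask array is typically turned into indices before it is used to select labels or features. The sketch below shows that step with toy NumPy arrays. It assumes the mask has one entry per graph with nonzero values marking membership, which this page does not confirm, so treat it as an illustration of the pattern rather than a description of the library's internals.

```python
import numpy as np

# Toy stand-ins; the real arrays would come from the dataset properties.
train_mask = np.array([1, 1, 0, 1, 0])       # assumed: one entry per graph
graph_label = np.arange(10.0).reshape(5, 2)  # 5 graphs, 2 targets each (toy values)

train_ids = np.nonzero(train_mask)[0]  # indices where the mask is set
train_labels = graph_label[train_ids]  # labels of the selected graphs
print(train_ids.tolist())  # [0, 1, 3]
print(train_labels.shape)  # (3, 2)
```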