mindspore_gl.dataset

Reading and building interface for graph datasets.

class mindspore_gl.dataset.CoraV2(root)

Cora dataset, a source dataset for reading and parsing the Cora dataset.

Parameters: root (str) – path to the root directory that contains cora_v2_with_mask.npz.
Raises: RuntimeError – If root does not contain data files.

Examples

>>> from mindspore_gl.dataset import CoraV2
>>> root = "path/to/root_dir"  # directory that contains cora_v2_with_mask.npz
>>> dataset = CoraV2(root)

About Cora dataset:

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 10556 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.
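
To make the feature encoding concrete, the following sketch builds a 0/1 word vector for a toy four-word dictionary. The dictionary, the sample document, and the helper name `binary_word_vector` are invented for illustration; the real Cora dictionary has 1433 entries, and the preprocessing that produced it is not shown here.

```python
# Hypothetical toy dictionary; the real Cora dictionary has 1433 words.
dictionary = ["graph", "neural", "network", "kernel"]


def binary_word_vector(words, dictionary):
    """Return a 0/1 list marking which dictionary words occur in `words`."""
    present = set(words)
    return [1 if word in present else 0 for word in dictionary]


# Invented document: repeated words still map to a single 1,
# and words outside the dictionary ("training") are ignored.
paper = ["neural", "network", "training", "network"]
vector = binary_word_vector(paper, dictionary)
print(vector)  # [0, 1, 1, 0]
```

Each node's feature row has one position per dictionary word, so its length equals the dictionary size.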

Statistics:

Nodes: 2708
Edges: 10556
Number of Classes: 7
Label split:
- Train: 140
- Valid: 500
- Test: 1000

The dataset can be downloaded from <https://github.com/kimiyoung/planetoid>. Organize the dataset files into the following directory structure so that they can be read by the preprocess API.

.
└── corav2
    ├── ind.cora_v2.allx
    ├── ind.cora_v2.ally
    ├── ind.cora_v2.graph
    ├── ind.cora_v2.test.index
    ├── ind.cora_v2.tx
    ├── ind.cora_v2.ty
    ├── ind.cora_v2.x
    └── ind.cora_v2.y
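
Before constructing the dataset, it can help to confirm that the directory contains every file in the listing above. The helper below is a minimal sketch using only the standard library; `missing_planetoid_files` is a hypothetical name and is not part of mindspore_gl. The demonstration uses a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = [
    "ind.cora_v2.allx",
    "ind.cora_v2.ally",
    "ind.cora_v2.graph",
    "ind.cora_v2.test.index",
    "ind.cora_v2.tx",
    "ind.cora_v2.ty",
    "ind.cora_v2.x",
    "ind.cora_v2.y",
]


def missing_planetoid_files(root):
    """Return the expected file names that are absent from `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo: create every file except the last one, then report what is missing.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_FILES[:-1]:
        (Path(tmp) / name).touch()
    missing = missing_planetoid_files(tmp)

print(missing)  # ['ind.cora_v2.y']
```

An empty result means the directory matches the listing; the check says nothing about whether the file contents are valid.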

property adj_coo

Return the adjacency matrix in COO representation.

Returns: numpy.ndarray, array of coo matrix.

Examples

>>> # dataset is an instance of CoraV2; adj_coo is a property, so no call parentheses
>>> adj_coo = dataset.adj_coo

property adj_csr

Return the adjacency matrix in CSR representation.

Returns: numpy.ndarray, array of csr matrix.

Examples

>>> # dataset is an instance of CoraV2; adj_csr is a property, so no call parentheses
>>> adj_csr = dataset.adj_csr
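
COO and CSR describe the same edges in different layouts. The NumPy sketch below converts a toy COO edge list into CSR row pointers and column indices. It assumes the COO array has shape (2, num_edges) with source nodes in row 0 and target nodes in row 1; the exact layout returned by `adj_coo` and `adj_csr` is not specified here, so check it against your installed version. `coo_to_csr` is a hypothetical helper, not a mindspore_gl function.

```python
import numpy as np


def coo_to_csr(coo, num_nodes):
    """Convert a (2, num_edges) COO array to CSR (indptr, indices)."""
    src, dst = coo
    order = np.argsort(src, kind="stable")          # group edges by source node
    indices = dst[order]                            # column index of each edge
    counts = np.bincount(src, minlength=num_nodes)  # number of edges per row
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return indptr, indices


# Toy graph with 3 nodes and edges 0->1, 0->2, 2->0.
coo = np.array([[0, 0, 2],
                [1, 2, 0]])
indptr, indices = coo_to_csr(coo, num_nodes=3)
print(indptr)   # [0 2 2 3]
print(indices)  # [1 2 0]
```

The neighbors of node `i` are `indices[indptr[i]:indptr[i + 1]]`, which is why CSR suits per-node neighbor lookups while COO suits edge-wise iteration.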

property edge_count

Number of edges

Returns: int, length of csr col

Examples

>>> # dataset is an instance of CoraV2; edge_count is a property, so no call parentheses
>>> edge_count = dataset.edge_count

load(): Load the saved npz dataset from files.

property node_count

Number of nodes

Returns: int, length of csr row

Examples

>>> # dataset is an instance of CoraV2; node_count is a property, so no call parentheses
>>> node_count = dataset.node_count

property node_feat

Node features

Returns: numpy.ndarray, array of node feature

Examples

>>> # dataset is an instance of CoraV2; node_feat is a property, so no call parentheses
>>> node_feat = dataset.node_feat

property node_label

Ground truth labels of each node

Returns: numpy.ndarray, array of node label

Examples

>>> # dataset is an instance of CoraV2; node_label is a property, so no call parentheses
>>> node_label = dataset.node_label

property num_classes

Number of label classes

Returns: int, the number of classes

Examples

>>> # dataset is an instance of CoraV2; num_classes is a property, so no call parentheses
>>> num_classes = dataset.num_classes

property num_features

Feature size of each node

Returns: int, the feature size of each node

Examples

>>> # dataset is an instance of CoraV2; num_features is a property, so no call parentheses
>>> num_features = dataset.num_features

preprocess(): Download and process data.

property test_mask

Mask of test nodes

Returns: numpy.ndarray, array of mask

Examples

>>> # dataset is an instance of CoraV2; test_mask is a property, so no call parentheses
>>> test_mask = dataset.test_mask

property train_mask

Mask of training nodes

Returns: numpy.ndarray, array of mask

Examples

>>> # dataset is an instance of CoraV2; train_mask is a property, so no call parentheses
>>> train_mask = dataset.train_mask

property train_nodes

Indices of training nodes

Returns: numpy.ndarray, array of training nodes

Examples

>>> # dataset is an instance of CoraV2; train_nodes is a property, so no call parentheses
>>> train_nodes = dataset.train_nodes
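
If the training mask is a boolean array with one entry per node (an assumption, since the mask dtype is not stated here), the training node indices can be recovered from it with `numpy.flatnonzero`. Whether the library computes `train_nodes` this way internally is not confirmed; the toy arrays below only illustrate the relationship between a mask and an index array.

```python
import numpy as np

# Toy mask for a 6-node graph; True marks a training node.
toy_train_mask = np.array([True, False, False, True, True, False])

# Indices of the True entries, in ascending order.
toy_train_nodes = np.flatnonzero(toy_train_mask)
print(toy_train_nodes)  # [0 3 4]
```

An index array is convenient for gathering rows, for example `features[toy_train_nodes]`, while the mask form is convenient for element-wise weighting of a per-node loss.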

property val_mask

Mask of validation nodes

Returns: numpy.ndarray, array of mask

Examples

>>> # dataset is an instance of CoraV2; val_mask is a property, so no call parentheses
>>> val_mask = dataset.val_mask
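
The train, validation, and test masks are typically used to restrict a loss or metric to one split. The sketch below computes accuracy on test nodes only, using invented labels and predictions. It assumes each mask is aligned with the node label array, one entry per node; `masked_accuracy` is a hypothetical helper, not part of mindspore_gl.

```python
import numpy as np


def masked_accuracy(pred, label, mask):
    """Fraction of correct predictions among nodes where `mask` is set."""
    mask = np.asarray(mask, dtype=bool)  # also accepts 0/1 integer masks
    return float(np.mean(pred[mask] == label[mask]))


# Invented data for a 5-node graph.
label = np.array([0, 1, 2, 1, 0])
pred = np.array([0, 1, 1, 1, 2])
toy_test_mask = np.array([False, False, True, True, True])

acc = masked_accuracy(pred, label, toy_test_mask)
print(acc)  # 1 of the 3 test nodes is predicted correctly
```

Nodes 0 and 1 are predicted correctly but do not count, because they fall outside the test mask.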