mindspore_gl.dataset.CoraV2

class mindspore_gl.dataset.CoraV2(root, name='cora_v2')

Cora dataset, a source dataset for reading and parsing the Cora dataset.

About the Cora dataset:

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 10556 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

Cora_v2 Statistics:

  • Nodes: 2708

  • Edges: 10556

  • Number of Classes: 7

  • Label split:

    • Train: 140

    • Valid: 500

    • Test: 1000

The datasets can be downloaded here:

cora_v2

citeseer

pubmed

Organize the dataset files into the following directory structure before reading them.

.
└── corav2
    ├── ind.cora_v2.allx
    ├── ind.cora_v2.ally
    ├── ind.cora_v2.graph
    ├── ind.cora_v2.test.index
    ├── ind.cora_v2.tx
    ├── ind.cora_v2.ty
    ├── ind.cora_v2.x
    └── ind.cora_v2.y
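
Before constructing the dataset, it can help to confirm that the directory matches the listing above. The following is a minimal sketch, not part of mindspore_gl: `missing_files` and `SUFFIXES` are hypothetical names, and the sketch assumes the other dataset types follow the same `ind.<name>.<suffix>` naming pattern.

```python
from pathlib import Path

# File suffixes taken from the directory listing above.
SUFFIXES = ("allx", "ally", "graph", "test.index", "tx", "ty", "x", "y")


def missing_files(root, name="cora_v2"):
    """Return the expected file names that are absent from `root`.

    Hypothetical helper for illustration; it is not a mindspore_gl API.
    """
    root = Path(root)
    expected = [f"ind.{name}.{suffix}" for suffix in SUFFIXES]
    return [fname for fname in expected if not (root / fname).is_file()]
```

Calling `missing_files("path/to/corav2")` returns an empty list when every listed file is present, and otherwise names the files that still need to be downloaded.
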
Parameters
  • root (str) – Path to the root directory that contains cora_v2_with_mask.npz.

  • name (str, optional) –

    Dataset type to load. Supported values are "cora_v2", "citeseer" and "pubmed". Default: "cora_v2".

    • cora_v2: Machine learning papers.

    • citeseer: Agents, AI, DB, IR, ML and HCI papers.

    • pubmed: Scientific publications on diabetes.

Raises

RuntimeError – If root does not contain data files.

Examples

>>> from mindspore_gl.dataset import CoraV2
>>> root = "path/to/dataset_dir"  # directory that contains cora_v2_with_mask.npz
>>> dataset = CoraV2(root)
property adj_coo

Return the adjacency matrix in COO format.

Returns

  • numpy.ndarray, the adjacency matrix in COO format.

Examples

>>> # dataset is an instance of CoraV2
>>> adj_coo = dataset.adj_coo
property adj_csr

Return the adjacency matrix in CSR format.

Returns

  • numpy.ndarray, the adjacency matrix in CSR format.

Examples

>>> # dataset is an instance of CoraV2
>>> adj_csr = dataset.adj_csr
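
For readers unfamiliar with the two layouts, the sketch below shows how COO index pairs relate to a CSR row-offset and column-index pair. It uses plain Python lists on a toy graph and is only an illustration under those assumptions: `coo_to_csr` is a hypothetical helper, not the library's implementation, and the exact array layout returned by `adj_coo` and `adj_csr` is not specified here.

```python
def coo_to_csr(rows, cols, num_nodes):
    """Convert COO index lists into CSR (indptr, indices) lists."""
    # Count the stored entries of every row.
    counts = [0] * num_nodes
    for r in rows:
        counts[r] += 1

    # indptr[i] is the offset in `indices` where row i starts.
    indptr = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        indptr[i + 1] = indptr[i] + counts[i]

    # Place every column index into the next free slot of its row.
    indices = [0] * len(cols)
    next_slot = indptr[:-1].copy()
    for r, c in zip(rows, cols):
        indices[next_slot[r]] = c
        next_slot[r] += 1
    return indptr, indices


# Toy graph with 3 nodes and 4 directed edges: 0->1, 0->2, 1->2, 2->0.
rows = [0, 0, 1, 2]
cols = [1, 2, 2, 0]
indptr, indices = coo_to_csr(rows, cols, num_nodes=3)
```

In this toy example the CSR column list holds one entry per edge, which is why the number of edges can be read off as its length.
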
property edge_count

Number of edges, given by the length of the CSR column array.

Returns

  • int, the number of edges.

Examples

>>> # dataset is an instance of CoraV2
>>> edge_count = dataset.edge_count
property node_count

Number of nodes, given by the length of the CSR row array.

Returns

  • int, the number of nodes.

Examples

>>> # dataset is an instance of CoraV2
>>> node_count = dataset.node_count
property node_feat

Node features.

Returns

  • numpy.ndarray, array of node features.

Examples

>>> # dataset is an instance of CoraV2
>>> node_feat = dataset.node_feat
property node_feat_size

Feature size of each node.

Returns

  • int, the feature size of each node.

Examples

>>> # dataset is an instance of CoraV2
>>> node_feat_size = dataset.node_feat_size
property node_label

Ground truth labels of each node.

Returns

  • numpy.ndarray, array of node labels.

Examples

>>> # dataset is an instance of CoraV2
>>> node_label = dataset.node_label
property num_classes

Number of label classes.

Returns

  • int, the number of classes.

Examples

>>> # dataset is an instance of CoraV2
>>> num_classes = dataset.num_classes
property test_mask

Mask of test nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of CoraV2
>>> test_mask = dataset.test_mask
property train_mask

Mask of training nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of CoraV2
>>> train_mask = dataset.train_mask
property train_nodes

Indices of the training nodes.

Returns

  • numpy.ndarray, array of training node indices.

Examples

>>> # dataset is an instance of CoraV2
>>> train_nodes = dataset.train_nodes
property val_mask

Mask of validation nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of CoraV2
>>> val_mask = dataset.val_mask
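
The mask properties and `train_nodes` describe the same kind of information in two forms: a per-node flag versus a list of node indices. The sketch below illustrates that relationship on toy data with plain Python lists. It assumes each mask holds one entry per node, with a nonzero value marking membership in the split; `mask_to_indices` is a hypothetical helper, not a mindspore_gl API.

```python
def mask_to_indices(mask):
    """Return the positions whose mask entry is truthy."""
    return [i for i, flag in enumerate(mask) if flag]


# Toy data for 6 nodes; real arrays would come from the dataset properties.
labels = [3, 0, 1, 3, 2, 0]
train_mask = [1, 1, 0, 0, 0, 0]
val_mask = [0, 0, 1, 1, 0, 0]
test_mask = [0, 0, 0, 0, 1, 1]

# Indices of the training nodes and their ground-truth labels.
train_nodes = mask_to_indices(train_mask)
train_labels = [labels[i] for i in train_nodes]
```

Restricting the loss to the training indices while evaluating on the validation and test indices is the usual way such splits are used in semi-supervised node classification.
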