mindspore_gl.dataset.Reddit

class mindspore_gl.dataset.Reddit(root)

Reddit dataset: a source dataset class for reading and parsing the Reddit dataset.

About Reddit dataset:

The node label in this case is the community, or “subreddit”, that a post belongs to. The authors sampled 50 large communities and built a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. We use the first 20 days for training and the remaining days for testing (with 30% used for validation).

Statistics:

  • Nodes: 232,965

  • Edges: 114,615,892

  • Number of classes: 41
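As a quick sanity check, the node and edge counts above are consistent with the quoted average degree of 492. The sketch below assumes that each entry counted under Edges contributes one to the degree of a single node; that is an assumption about how the edge count is defined, not something this page states.

```python
# Figures copied from the Statistics list above.
num_nodes = 232_965
num_edges = 114_615_892

# Assumption: every counted edge adds 1 to the degree of exactly one node,
# so the average degree is simply edges / nodes.
avg_degree = num_edges / num_nodes
print(f"average degree = {avg_degree:.2f}")
```

Under that assumption the ratio rounds to 492, matching the description.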

The dataset can be downloaded here: Reddit.

Organize the dataset files into the following directory structure so that they can be read by the preprocess API.

.
├── reddit_data.npz
└── reddit_graph.npz
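
Before constructing the dataset, it can be useful to confirm that the root directory matches the layout above. The following is a minimal sketch using only the standard library; `missing_reddit_files` is a hypothetical helper written for illustration and is not part of mindspore_gl.

```python
from pathlib import Path

# File names taken from the directory structure above.
EXPECTED_FILES = ("reddit_data.npz", "reddit_graph.npz")


def missing_reddit_files(root):
    """Return the expected file names that are not present under root.

    Hypothetical helper for illustration; it is not part of mindspore_gl.
    """
    root_dir = Path(root)
    return [name for name in EXPECTED_FILES if not (root_dir / name).is_file()]
```

An empty list means both files are in place; otherwise the list names the files that still need to be downloaded or moved.
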
Parameters

root (str) – path to the root directory that contains reddit_with_mask.npz

Raises

Examples

>>> from mindspore_gl.dataset import Reddit
>>> root = "path/to/reddit"
>>> dataset = Reddit(root)
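
Once a dataset object exists, its scalar statistics can be gathered in one place. The sketch below only reads the node_count, edge_count, num_classes and num_features properties documented on this page; `summarize` is a hypothetical helper, not a mindspore_gl function, and it works on any object that exposes those four attributes.

```python
def summarize(dataset):
    """Collect the scalar statistics exposed by the dataset's properties.

    Hypothetical helper, not part of mindspore_gl. It only reads the
    node_count, edge_count, num_classes and num_features properties.
    """
    return {
        "nodes": dataset.node_count,
        "edges": dataset.edge_count,
        "classes": dataset.num_classes,
        "features": dataset.num_features,
    }
```

Calling `summarize(dataset)` on the object from the example above would return a plain dict that is easy to log or compare against the Statistics list.
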
property edge_count

Number of edges.

Returns

  • int, length of the CSR column array.

Examples

>>> # dataset is an instance of Reddit
>>> edge_count = dataset.edge_count
property node_count

Number of nodes.

Returns

  • int, length of the CSR row array.

Examples

>>> # dataset is an instance of Reddit
>>> node_count = dataset.node_count
property node_feat

Node features.

Returns

  • numpy.ndarray, array of node features.

Examples

>>> # dataset is an instance of Reddit
>>> node_feat = dataset.node_feat
property node_label

Ground truth labels of each node.

Returns

  • numpy.ndarray, array of node labels.

Examples

>>> # dataset is an instance of Reddit
>>> node_label = dataset.node_label
property num_classes

Number of label classes.

Returns

  • int, the number of classes.

Examples

>>> # dataset is an instance of Reddit
>>> num_classes = dataset.num_classes
property num_features

Feature size of each node.

Returns

  • int, the feature size of each node.

Examples

>>> # dataset is an instance of Reddit
>>> num_features = dataset.num_features
property test_mask

Mask of test nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of Reddit
>>> test_mask = dataset.test_mask
property test_nodes

Indices of the test nodes.

Returns

  • numpy.ndarray, array of test node indices.

Examples

>>> # dataset is an instance of Reddit
>>> test_nodes = dataset.test_nodes
property train_mask

Mask of training nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of Reddit
>>> train_mask = dataset.train_mask
property train_nodes

Indices of the training nodes.

Returns

  • numpy.ndarray, array of training node indices.

Examples

>>> # dataset is an instance of Reddit
>>> train_nodes = dataset.train_nodes
property val_mask

Mask of validation nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of Reddit
>>> val_mask = dataset.val_mask
property val_nodes

Indices of the validation nodes.

Returns

  • numpy.ndarray, array of validation node indices.

Examples

>>> # dataset is an instance of Reddit
>>> val_nodes = dataset.val_nodes
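
The mask properties and the node index properties describe the same splits in two forms. The sketch below uses a toy NumPy array rather than real Reddit data, and assumes that a mask holds one boolean entry per node and that the matching index array lists the positions where the mask is set; this page does not confirm that layout, so treat it as an illustration only.

```python
import numpy as np

# Toy stand-in for a mask over 6 nodes (not real Reddit data).
train_mask = np.array([True, False, True, True, False, False])

# Assumption: a mask has one entry per node and marks members of the split;
# the matching index array then lists the positions where the mask is set.
train_nodes = np.nonzero(train_mask)[0]

# Index arrays are convenient for gathering rows, e.g. from node features.
node_feat = np.arange(12, dtype=np.float32).reshape(6, 2)
train_feat = node_feat[train_nodes]
```

Indexing with the boolean mask directly, `node_feat[train_mask]`, selects the same rows; the index form is often handier when sampling or batching nodes.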