mindspore_gl.dataset.Reddit
- class mindspore_gl.dataset.Reddit(root)
Reddit dataset, a source dataset for reading and parsing the Reddit dataset.
About Reddit dataset:
The node label in this case is the community, or “subreddit”, that a post belongs to. The authors sampled 50 large communities and built a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. We use the first 20 days for training and the remaining days for testing (with 30% used for validation).
Statistics:
Nodes: 232,965
Edges: 114,615,892
Number of classes: 41
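As a quick consistency check, the average degree quoted in the description can be recovered from the node and edge counts listed here. This is only a minimal sketch. It assumes the edge count is a count of directed entries, so that each edge adds one to a node's degree.

```python
# Figures copied from the statistics listed in this section.
num_nodes = 232_965
num_edges = 114_615_892

# Assumption: every edge is stored as one directed entry, so the
# average degree is the edge count divided by the node count.
avg_degree = num_edges / num_nodes
print(f"average degree is roughly {avg_degree:.2f}")
```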
The dataset can be downloaded here: Reddit.
You can organize the dataset files into the following directory structure and read them with the preprocess API.
.
├── reddit_data.npz
└── reddit_graph.npz
- Parameters
root (str) – path to the root directory that contains reddit_with_mask.npz
- Raises
TypeError – if root is not a str.
RuntimeError – if root does not contain data files.
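The two error conditions can be pictured with a small stand-alone check. This is a sketch of the documented behaviour, not the library's own implementation: `check_root` and its `required_files` argument are hypothetical names introduced here, and the file names to look for are left to the caller.

```python
import os


def check_root(root, required_files):
    """Hypothetical helper (not part of mindspore_gl) mirroring the documented errors."""
    # Documented behaviour: a non-str root raises TypeError.
    if not isinstance(root, str):
        raise TypeError(f"root must be a str, got {type(root).__name__}")

    # Documented behaviour: a root without the data files raises RuntimeError.
    missing = [
        name for name in required_files
        if not os.path.isfile(os.path.join(root, name))
    ]
    if missing:
        raise RuntimeError(f"{root!r} does not contain data files: {missing}")
    return root
```

Running a check like this before constructing the dataset gives an early, readable failure when the directory layout is wrong.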
Examples
>>> from mindspore_gl.dataset import Reddit
>>> root = "path/to/reddit"
>>> dataset = Reddit(root)
- property edge_count
Number of edges.
- Returns
int, length of csr col.
Examples
>>> # dataset is an instance object of Dataset
>>> edge_count = dataset.edge_count
- property node_count
Number of nodes.
- Returns
int, length of csr row.
Examples
>>> # dataset is an instance object of Dataset
>>> node_count = dataset.node_count
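Both counts are described in terms of CSR arrays, so a toy graph may help show how node and edge counts are read off that layout. The sketch below uses the conventional CSR form (a row-pointer array plus a column-index array) built by hand with NumPy. How this class stores its own row array is not stated here, so the array names and the "length minus one" rule are assumptions about generic CSR, not about the Reddit files.

```python
import numpy as np

# Toy directed graph with 4 nodes and 5 edges:
#   0->1, 0->2, 1->2, 2->3, 3->0
# Assumed conventional CSR layout (not loaded from the Reddit files).
indptr = np.array([0, 2, 3, 4, 5])   # row pointer: one entry per node, plus one
indices = np.array([1, 2, 2, 3, 0])  # column (destination) index of each edge

# One column index is stored per edge.
edge_count = len(indices)

# In conventional CSR the row pointer has node_count + 1 entries.
node_count = len(indptr) - 1

# The out-degree of each node is the gap between consecutive row pointers.
out_degree = np.diff(indptr)
```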
- property node_feat
Node features.
- Returns
numpy.ndarray, array of node feature.
Examples
>>> # dataset is an instance object of Dataset
>>> node_feat = dataset.node_feat
- property node_label
Ground truth labels of each node.
- Returns
numpy.ndarray, array of node label.
Examples
>>> # dataset is an instance object of Dataset
>>> node_label = dataset.node_label
- property num_classes
Number of label classes.
- Returns
int, the number of classes.
Examples
>>> # dataset is an instance object of Dataset
>>> num_classes = dataset.num_classes
- property num_features
Feature size of each node.
- Returns
int, the size of each node's feature vector.
Examples
>>> # dataset is an instance object of Dataset
>>> num_features = dataset.num_features
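The feature and label properties are typically related to the two size properties in a simple way. The sketch below uses small made-up arrays in place of `node_feat` and `node_label`. It assumes a feature matrix with one row per node and integer labels numbered from 0, neither of which is confirmed by this page.

```python
import numpy as np

# Made-up stand-ins for dataset.node_feat and dataset.node_label.
rng = np.random.default_rng(0)
node_feat = rng.normal(size=(6, 4)).astype(np.float32)  # 6 nodes, 4 features each
node_label = np.array([0, 2, 1, 2, 0, 1])               # one class id per node

# Assumption: features are stored as a (node_count, num_features) matrix.
num_features = node_feat.shape[1]

# Assumption: labels are consecutive integers starting at 0.
num_classes = int(node_label.max()) + 1
```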
- property test_mask
Mask of test nodes.
- Returns
numpy.ndarray, array of mask.
Examples
>>> # dataset is an instance object of Dataset
>>> test_mask = dataset.test_mask
- property test_nodes
Test node indices.
- Returns
numpy.ndarray, array of test nodes.
Examples
>>> # dataset is an instance object of Dataset
>>> test_nodes = dataset.test_nodes
- property train_mask
Mask of training nodes.
- Returns
numpy.ndarray, array of mask.
Examples
>>> # dataset is an instance object of Dataset
>>> train_mask = dataset.train_mask
- property train_nodes
Training node indices.
- Returns
numpy.ndarray, array of training nodes.
Examples
>>> # dataset is an instance object of Dataset
>>> train_nodes = dataset.train_nodes
- property val_mask
Mask of validation nodes.
- Returns
numpy.ndarray, array of mask.
Examples
>>> # dataset is an instance object of Dataset
>>> val_mask = dataset.val_mask
- property val_nodes
Validation node indices.
- Returns
numpy.ndarray, array of validation nodes.
Examples
>>> # dataset is an instance object of Dataset
>>> val_nodes = dataset.val_nodes
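The mask and node-index properties usually carry the same information in two forms. The sketch below uses made-up arrays and assumes each mask is a per-node boolean array and each `*_nodes` array lists the positions where its mask is set. That relationship is an assumption for illustration and is not stated on this page.

```python
import numpy as np

# Made-up split over 8 nodes; True marks membership in a split.
train_mask = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
val_mask = np.array([0, 0, 0, 0, 1, 1, 0, 0], dtype=bool)
test_mask = np.array([0, 0, 0, 0, 0, 0, 1, 1], dtype=bool)
node_label = np.array([0, 1, 0, 2, 1, 2, 0, 1])

# Assumption: the index arrays are the positions where each mask is True.
train_nodes = np.nonzero(train_mask)[0]
val_nodes = np.nonzero(val_mask)[0]
test_nodes = np.nonzero(test_mask)[0]

# Either form selects the same labels.
train_labels_by_mask = node_label[train_mask]
train_labels_by_index = node_label[train_nodes]

# In this toy split every node belongs to exactly one of the three sets.
membership = train_mask.astype(int) + val_mask.astype(int) + test_mask.astype(int)
```

Boolean masks are convenient for masking a loss over all nodes, while index arrays are convenient for sampling mini-batches of node ids.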