In order to formulate the problem, we need:
- The graph itself and the labels for each node
- The edge data in the Coordinate Format (COO)
- Embeddings or numerical representations for the nodes
Note: For the numerical representation of the nodes, we can use graph properties such as degree, or embedding methods such as node2vec and DeepWalk. In this example, I use node degree as the numerical representation of each node.
Let’s get into the coding part.
The karate club dataset can be loaded directly from the NetworkX library. We retrieve the labels from the graph and create an edge index in the coordinate format. The node degree serves as the embedding (numerical representation) of each node; for a directed graph, in-degree can be used for the same purpose. Since degree values can vary widely from node to node, we normalize them before passing them as input to the GNN model.
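The preparation steps described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from the post: the mapping of the `club` attribute to 0/1 and the z-score normalization are assumptions, and the variable names are chosen for this example only.

```python
import networkx as nx
import numpy as np

# Load Zachary's karate club graph (34 nodes, 78 undirected edges).
G = nx.karate_club_graph()

# Labels: each node carries a "club" attribute ("Mr. Hi" or "Officer").
# Mapping the two clubs to 0/1 is an assumption made for this sketch.
labels = np.array([0 if G.nodes[n]["club"] == "Mr. Hi" else 1 for n in G.nodes()])

# Edge index in COO format: row 0 holds source nodes, row 1 holds targets.
# Each undirected edge is stored in both directions.
edges = np.array(list(G.edges()), dtype=np.int64).T        # shape [2, 78]
edge_index = np.concatenate([edges, edges[::-1]], axis=1)  # shape [2, 156]

# Node features: degree of each node, as a single column.
degrees = np.array([d for _, d in G.degree()], dtype=np.float64).reshape(-1, 1)

# Normalize the degrees. Z-score scaling is an assumption here;
# other scalers (e.g. min-max) would serve the same purpose.
x = (degrees - degrees.mean()) / degrees.std()

print(edge_index.shape, x.shape, labels.shape)
```

The resulting shapes line up with the `edge_index=[2, 156]` and `x=[34, 1]` entries of the data object shown further down.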
With this, we have prepared all the parts needed to construct the PyTorch Geometric custom dataset.
The KarateDataset class inherits from the InMemoryDataset class and uses a Data object to collate all the information relating to the karate club dataset. The nodes are then split into train and test sets, and the splits are used to create the train and test masks.
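To make the masking step concrete, here is a small sketch of how a node split can be turned into boolean masks. The helper `make_masks`, the 30% test fraction, and the seed are hypothetical choices for illustration; the post's own split may differ (for example, it may be stratified by label).

```python
import numpy as np

def make_masks(num_nodes, test_fraction=0.3, seed=42):
    """Randomly split node indices and return boolean train/test masks.

    Hypothetical helper: the fraction and seed are assumptions.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_test = int(np.ceil(test_fraction * num_nodes))

    test_mask = np.zeros(num_nodes, dtype=bool)
    test_mask[perm[:n_test]] = True   # held-out nodes
    train_mask = ~test_mask           # every other node is used for training
    return train_mask, test_mask

train_mask, test_mask = make_masks(34)
print(int(train_mask.sum()), int(test_mask.sum()))
```

Because the whole graph stays in a single Data object, the split is expressed as masks over the nodes rather than as two separate datasets: the model sees every node and edge during message passing, but the loss is computed only on nodes where the train mask is true.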
The data object contains the following variables:
Data(edge_index=[2, 156], num_classes=, test_mask=, train_mask=, x=[34, 1], y=)
This custom dataset can now be used with several graph neural network models from the PyTorch Geometric library. Let’s pick a Graph Convolutional Network (GCN) model and use it to predict the missing labels on the test set.
Note: The PyG library focuses more on node classification tasks, but it can also be used for link prediction.
The GCN model is built with two hidden layers, each containing 16 neurons. Let’s train the model!
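For intuition about what a GCN layer computes, the sketch below runs a forward pass in plain NumPy on a toy four-node graph. It is not the PyG model from the post: it assumes the standard GCN propagation rule (symmetric normalization with self-loops), reads the architecture as two graph convolutions with a hidden width of 16, and uses random, untrained weights. All function names are illustrative.

```python
import numpy as np

def normalized_adjacency(edge_index, num_nodes):
    """Return D^-1/2 (A + I) D^-1/2 for a COO edge index."""
    A = np.zeros((num_nodes, num_nodes))
    A[edge_index[0], edge_index[1]] = 1.0
    A_hat = A + np.eye(num_nodes)        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(x, a_norm, w0, w1):
    """Two graph convolutions: ReLU after the first, softmax after the second."""
    hidden = np.maximum(a_norm @ x @ w0, 0.0)              # [N, 16]
    logits = a_norm @ hidden @ w1                          # [N, num_classes]
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Toy path graph 0-1-2-3, stored in both directions (COO format).
edge_index = np.array([[0, 1, 1, 2, 2, 3],
                       [1, 0, 2, 1, 3, 2]])
x = np.array([[1.0], [2.0], [2.0], [1.0]])   # node degree as the only feature

rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.5, size=(1, 16))     # input -> 16 hidden units
w1 = rng.normal(scale=0.5, size=(16, 2))     # hidden -> 2 classes

a_norm = normalized_adjacency(edge_index, num_nodes=4)
probs = gcn_forward(x, a_norm, w0, w1)
print(probs.shape)
```

Each layer mixes a node's features with those of its neighbors before applying the weight matrix, which is why a single degree feature per node can still carry useful signal after two rounds of propagation. In training, the weights would be fitted by minimizing a classification loss on the nodes selected by the train mask.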
Initial experiments with random hyperparameters gave these results:
Train Accuracy: 0.913
Test Accuracy: 0.727
These results are not impressive, and we can certainly do better. In my next post, I will discuss how we can use Optuna (a Python library for hyperparameter tuning) to tune the hyperparameters easily and find the best model. The code used in this example was adapted, with some modifications, from the PyTorch Geometric GitHub repository (link).
To summarize everything we have done so far:
- Generate numerical representations for each node in the graph.
- Construct a PyG custom dataset and split data into train and test.
- Use a GNN model like GCN and train the model.
Continue reading: https://towardsdatascience.com/a-beginners-guide-to-graph-neural-networks-using-pytorch-geometric-part-1-d98dc93e7742