The idea is to capture the network information in arrays of numbers called low-dimensional embeddings. Several algorithms exist specifically for learning such numerical representations of graph nodes.

DeepWalk is a node embedding technique based on the random walk concept, and it is the one I will use in this example. To implement it, I picked the GraphEmbedding Python library, which provides 5 different algorithms for generating embeddings.
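To make the random walk idea concrete, here is a minimal sketch of the walk-sampling step only. DeepWalk feeds walks like these into a skip-gram model, much as sentences are fed into Word2Vec; that training step is left out here. The toy graph and the helper names random_walk and generate_walks are hypothetical and are not part of the GraphEmbedding library.

```python
import random


def random_walk(adjacency, start, walk_length, rng):
    """Sample one uniform random walk of at most walk_length nodes."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adjacency, num_walks, walk_length, seed=0):
    """Start num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adjacency, node, walk_length, rng))
    return walks


# Hypothetical toy graph as an adjacency list (undirected, 4 nodes).
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = generate_walks(toy_graph, num_walks=2, walk_length=5)
```

Each walk is a list of node ids in which consecutive nodes are connected by an edge. Nodes that often appear close together in these walks end up with similar embeddings once the skip-gram model is trained on them.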

Firstly, install the Graph Embedding library and run the setup:

!git clone https://github.com/shenweichen/GraphEmbedding.git
cd GraphEmbedding/
!python setup.py install

We use the DeepWalk model to learn the embeddings for our graph nodes. The variable embeddings stores them as a dictionary in which the keys are the nodes and the values are the embedding vectors.

As I mentioned before, embeddings are just low-dimensional numerical representations of the network, so we can visualize them. Here, the size of the embeddings is 128, which is far too many dimensions to plot directly, so we employ t-SNE, a dimensionality reduction technique. t-SNE maps each 128-dimensional vector to a 2-dimensional one so that we can plot it in a 2D space.
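The reduction step can be sketched as follows, assuming scikit-learn and NumPy are installed. The embeddings dictionary here is filled with random stand-in vectors for 34 hypothetical nodes, not the real DeepWalk output, and the perplexity and random_state values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the real embeddings: 34 hypothetical nodes, 128 random values each.
rng = np.random.default_rng(0)
embeddings = {node: rng.normal(size=128) for node in range(34)}

# Fix a node order so that row i of the matrix belongs to nodes[i].
nodes = sorted(embeddings)
matrix = np.array([embeddings[n] for n in nodes])  # shape (34, 128)

# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
points_2d = tsne.fit_transform(matrix)  # shape (34, 2)
```

The two columns of points_2d can then be passed to a scatter plot, with each point colored by the label of its node.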

Note: The embedding size is a hyperparameter.

The visualization made using the above code looks like this:

We can see that the embeddings generated for this graph look good, as the red and blue points are clearly separated. Now we can build a graph neural network model that trains on these embeddings, which should give us a good prediction model.

I will reuse the code from my previous post for building the graph neural network model for the node classification task. The procedure from here on is very similar to that post. We just change the node features from degree to DeepWalk embeddings.
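Swapping the node features comes down to stacking the embedding vectors into one matrix whose rows follow the node order used by the rest of the model. Here is a minimal sketch of that step in plain Python; the helper build_feature_matrix and the toy values are hypothetical, not code from the previous post.

```python
def build_feature_matrix(embeddings, node_order):
    """Return one embedding row per node, in the order given by node_order."""
    missing = [n for n in node_order if n not in embeddings]
    if missing:
        raise KeyError(f"no embedding for nodes: {missing}")
    return [list(embeddings[n]) for n in node_order]


# Hypothetical 2-dimensional embeddings; the dictionary order is deliberately shuffled.
toy_embeddings = {"b": [0.3, 0.4], "a": [0.1, 0.2], "c": [0.5, 0.6]}
node_order = ["a", "b", "c"]
features = build_feature_matrix(toy_embeddings, node_order)
```

With PyTorch, the result could be wrapped as torch.tensor(features, dtype=torch.float) and used as the node feature matrix. The order matters because row i must describe the same node as label i.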

Continue reading: https://towardsdatascience.com/a-beginners-guide-to-graph-neural-networks-using-pytorch-geometric-part-2-cd82c01330ab

Source: towardsdatascience.com