This blog post provides a comprehensive study of GraphSage, an inductive graph representation learning algorithm, covering both its theory and its practical use. For the practical application we are going to use the popular PyTorch Geometric library and an Open-Graph-Benchmark dataset. We use the ogbn-products dataset, an undirected and unweighted graph representing an Amazon product co-purchasing network, to predict shopping preferences. Nodes represent products sold on Amazon, and an edge between two products indicates that they are purchased together. The goal is to predict the category of a product in a multi-class classification setup, where the 47 top-level categories are used as target labels, making it a node classification task.
In brief, here is the outline of the blog:
- What is GraphSage
- Importance of Neighbourhood Sampling
- Getting Hands-on Experience with GraphSage and PyTorch Geometric Library
- Open-Graph-Benchmark’s Amazon Product Recommendation Dataset
- Creating and Saving a model
- Generating Graph Embeddings Visualizations and Observations
Once the graph is created by incorporating meaningful relationships (edges) between all the entities (nodes), the next question that comes to mind is how to integrate information about the graph structure (e.g., a node’s global position in the graph or its local neighbourhood structure) into a machine learning model. One way to extract structural information from a graph is to compute graph statistics such as node degrees, clustering coefficients, kernel functions, or other hand-engineered features that estimate local neighbourhood structure. However, these methods do not allow end-to-end learning, i.e. the features cannot be learned with the help of a loss function during the training process.
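To make the hand-engineered statistics mentioned above concrete, here is a minimal, dependency-free sketch that computes node degrees and local clustering coefficients. The toy graph is hypothetical and only for illustration; it is not part of the ogbn-products dataset.

```python
# Hand-engineered graph statistics: node degree and clustering coefficient.
# Toy undirected graph as an adjacency set: node -> set of neighbours.
from itertools import combinations

graph = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1},
    3: {1},
}

def degree(g, u):
    """Number of edges incident to node u."""
    return len(g[u])

def clustering_coefficient(g, u):
    """Fraction of pairs of u's neighbours that are themselves connected."""
    neighbours = g[u]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for v, w in combinations(neighbours, 2) if w in g[v])
    return 2.0 * links / (k * (k - 1))

for node in graph:
    print(node, degree(graph, node), round(clustering_coefficient(graph, node), 3))
```

Statistics like these describe a node's local structure, but they are fixed up front; nothing in them can adapt to the downstream prediction task, which is exactly the limitation end-to-end representation learning addresses.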
To tackle this problem, representation learning approaches have been adopted that encode the structural information of the graph into a Euclidean (vector/embedding) space.
The key idea behind graph representation learning is to learn a mapping function that embeds nodes, or entire (sub)graphs, as points in a low-dimensional vector space (i.e., from a non-Euclidean domain into an embedding space). The aim is to optimize this mapping so that nodes which are nearby in the original network remain close to each other in the embedding space, while unconnected nodes are pushed apart. By doing this, we can preserve the geometric…
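This intuition of pulling nearby nodes together and pushing unrelated nodes apart is exactly what GraphSage's unsupervised loss formalizes. For a node $u$ with embedding $\mathbf{z}_u$, the loss encourages a high similarity with a node $v$ that co-occurs near $u$ on a fixed-length random walk, while penalizing similarity with negative samples:

$$
J_{\mathcal{G}}(\mathbf{z}_u) = -\log\big(\sigma(\mathbf{z}_u^{\top}\mathbf{z}_v)\big) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)}\big[\log\big(\sigma(-\mathbf{z}_u^{\top}\mathbf{z}_{v_n})\big)\big]
$$

where $\sigma$ is the sigmoid function, $P_n$ is a negative-sampling distribution, and $Q$ is the number of negative samples. Minimizing this loss makes connected or co-occurring nodes similar in the embedding space and dissimilar from randomly sampled ones.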
Continue reading: https://towardsdatascience.com/a-comprehensive-case-study-of-graphsage-algorithm-with-hands-on-experience-using-pytorchgeometric-6fc631ab1067?source=rss—-7f60cf5620c9—4