How to Test Graph Quality to Improve Graph Machine Learning Performance | by Eivind Kjosbakken

Testing the quality of your graphs is vital to ensure they perform well in your machine learning system. This article will show you how to test the quality of your topological graphs.

Graphs are data structures capable of representing a large amount of information. In addition to representing data samples individually as nodes, a graph also represents the relationships between the samples, capturing more of the information stored in your dataset. When creating a graph, however, it is important to verify its quality, and this article discusses how you can do that.

Learn how to ensure graph quality with this article. Image by ChatGPT. “make a graph with some nodes being looked at by a magnifying glass” prompt. *ChatGPT*, 4, OpenAI, 25 Feb. 2024. https://chat.openai.com.

The motivation for this article is that I am creating graphs for a project I am working on. Later in my pipeline, the graphs are used to perform clustering, as seen in the pipeline image below. To ensure the correctness of my graphs, I want a test that outputs the quality of each graph I create. When working on machine learning projects, verifying your results and data quality is vital both for saving time on bug fixing and for ensuring that your data pipeline is working correctly. The verification result can work as a sanity check: if your machine learning algorithm is not performing as expected, you can be confident that the graph is not the issue.

The pipeline for my machine learning project. Image by the author.

I also want to narrow the scope of this article. First, when I refer to a graph, I mean a graph defined purely by its topological structure, that is, only by the relationships between the data points. Such a graph can be represented with two lists: one list of all node indices, and one list of all edges, stored as a 2D list in which each row is (source, destination, weight). If your graph is weighted, you can ignore the weights or set them all to 1. Second, I am using my graph to separate different classes from each other, which will be reflected in…
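To make the two-list representation described above concrete, here is a minimal Python sketch. The toy graph, the `Edge` alias, and the helper names `find_structure_problems` and `with_unit_weights` are assumptions made for illustration; they are not taken from my project code. The sketch only checks that the node and edge lists are consistent with each other and shows one way of discarding weights.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source, destination, weight)

# Hypothetical toy graph; the values are illustrative only.
nodes: List[int] = [0, 1, 2, 3]
edges: List[Edge] = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]


def find_structure_problems(nodes: List[int], edges: List[Edge]) -> List[str]:
    """Return readable problems; an empty list means the two lists are consistent."""
    problems = []
    node_set = set(nodes)
    if len(node_set) != len(nodes):
        problems.append("node list contains duplicate indices")
    for i, (src, dst, _weight) in enumerate(edges):
        # Every edge endpoint must be a known node index.
        if src not in node_set or dst not in node_set:
            problems.append(f"edge {i} references an unknown node: ({src}, {dst})")
    return problems


def with_unit_weights(edges: List[Edge]) -> List[Edge]:
    """Set every weight to 1 so that only the topology remains."""
    return [(src, dst, 1.0) for src, dst, _weight in edges]


print(find_structure_problems(nodes, edges))
print(with_unit_weights([(0, 1, 0.3), (1, 2, 2.5)]))
```

A structural check like this is not a quality metric in itself, but it can rule out malformed input before any quality test is run.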