Four Graph-based Feature Engineering Ideas to Improve your ML models | by Claudia Ng | Mar, 2024



Explore innovative graph-based feature engineering techniques using networkx in Python and uncover hidden insights in tabular data


Want to level up the performance of your Machine Learning models? Consider spending more time on feature engineering.

Much real-world data describes relationships between entities, but these relationships are hard to capture in tabular form. In this article, we will walk through four graph-based feature engineering ideas for your ML models.

The examples in this article will primarily use networkx to engineer graph-based features, so if you’d like to follow along, be sure to install it in your virtual environment with `pip install networkx`. Let’s dive in!

Some examples of data types where graph-based features could be helpful include:

  • Social networks: features to capture relationships between accounts and to detect communities of accounts;
  • Recommendation systems: features to capture interactions between users and items;
  • Financial fraud: features to capture transactions between users and merchants;
  • Traffic prediction: features to capture road connectivity and congestion levels;
  • Medical: features to capture interactions between humans to predict disease outbreak.
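To make the idea concrete, here is a minimal sketch of turning a list of relationships into per-node features with networkx, loosely modeled on the financial fraud case above. The transaction pairs and all names in it are invented for illustration, and degree-based features are just one simple choice, not necessarily what a production system would use.

```python
import networkx as nx

# Hypothetical toy data: (user, merchant) transaction pairs, invented for illustration.
transactions = [
    ("user_a", "merchant_x"),
    ("user_a", "merchant_y"),
    ("user_b", "merchant_x"),
    ("user_c", "merchant_x"),
    ("user_c", "merchant_z"),
]

# Build an undirected graph where each transaction pair becomes an edge.
G = nx.Graph()
G.add_edges_from(transactions)

# Degree: number of distinct counterparties per node.
degree = dict(G.degree())
# Degree centrality: degree divided by (number of nodes - 1).
centrality = nx.degree_centrality(G)

# One row of features per node, ready to be joined back onto a tabular dataset.
features = [
    {"node": n, "degree": degree[n], "degree_centrality": centrality[n]}
    for n in G.nodes
]

for row in features:
    print(row)
```

Each row can then be joined back to the original table on the node identifier, so a tabular model sees how connected each user or merchant is.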

Let’s dig deeper into one of these examples, a common prediction problem that many social media companies face: recommending accounts to follow.

Companies in the social network space commonly face link prediction problems. The goal is to predict whether an edge is likely to exist between two accounts, and then to recommend accounts that users are likely to follow.
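One simple way to score candidate edges is a neighborhood-overlap heuristic. The sketch below uses networkx's `jaccard_coefficient` on a tiny invented graph. It rests on two assumptions: the account names and edges are hypothetical, and the follow graph is treated as undirected for simplicity, whereas real follow relationships are directed. The Jaccard score is shown only as an illustrative baseline, not as the method any particular company uses.

```python
import networkx as nx

# Hypothetical follow relationships, treated as undirected for simplicity.
follows = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("dave", "erin"),
]
G = nx.Graph(follows)

# With no explicit candidate list, jaccard_coefficient scores every pair of
# nodes that is not already connected, yielding (u, v, score) tuples.
# score = |common neighbors| / |union of both neighbor sets|
scores = sorted(nx.jaccard_coefficient(G), key=lambda t: t[2], reverse=True)

# The highest-scoring missing edge is the strongest recommendation candidate.
top_u, top_v, top_score = scores[0]

for u, v, score in scores:
    print(f"{u} - {v}: {score:.3f}")
```

Scores like these can be used directly to rank recommendations, or fed as features into a downstream classifier that predicts whether the edge will form.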

As an example, let’s pretend that Instagram stores information on account following in two tables (disclaimer: this is just an educated guess at what the data could look like and is not a reflection of how tables are structured at Instagram):

  • An accounts table with columns: `account_id, username, signup_timestamp,… `.