3 Powerful Python Libraries to (Partially) Automate EDA And Get You Started With Your Data Project | by Juan Jose Munoz | Dec, 2023

All machine learning problems are data problems.

To avoid a “garbage in, garbage out” situation, it makes sense to spend considerable time understanding and cleaning your data. I recently read “The Kaggle Book” by Konrad Banachewicz and Luca Massaron, which interviews many Kaggle grandmasters. Interestingly, rushing through or skipping EDA comes up as the most common mistake, made by grandmasters and beginners alike.

Photo by Choong Deng Xiang on Unsplash

We all know how important EDA is, and yet we still skip it. Perhaps it is hard to know where to start or which questions to ask, or perhaps we are simply too eager to jump into modeling.

Here are 3 Python libraries you can use to partially automate your exploratory data analysis and get started with your data project.

The data for the analysis below comes from the Kaggle competition House Prices - Advanced Regression Techniques.

The first library, ydata-profiling, is the new version of pandas-profiling. It now supports Spark, so it goes beyond pandas DataFrames.

The goal, however, remains the same: to provide a one-line Exploratory Data Analysis (EDA) experience. The package highlights the importance of an easy-to-implement data quality evaluation framework, one that shouldn't be limited to the initial phase of your project but applied throughout it.
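To make the idea of a data quality check that runs throughout a project more concrete, here is a minimal, library-free sketch of a reusable check you could rerun after loading, cleaning, or feature engineering. It is not how ydata-profiling works internally: the helper name `quality_alerts`, the 20% missing-value threshold, and the toy rows (which only borrow column names from the House Prices data) are all hypothetical choices for illustration.

```python
from typing import Any


def quality_alerts(rows: list[dict[str, Any]], max_missing: float = 0.2) -> list[str]:
    """Return simple data-quality alerts for a table given as a list of row dicts.

    Hypothetical helper: flags columns whose share of missing values (None)
    exceeds `max_missing`, and columns whose non-missing values are all identical.
    """
    if not rows:
        return ["empty table"]
    columns = sorted({col for row in rows for col in row})
    alerts = []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / len(values)
        if missing_share > max_missing:
            alerts.append(f"{col}: {missing_share:.0%} missing")
        if present and len(set(present)) == 1:
            alerts.append(f"{col}: constant value {present[0]!r}")
    return alerts


# Made-up rows for illustration; rerun the same check at each stage of the project.
raw = [
    {"LotArea": 8450, "PoolQC": None, "Street": "Pave"},
    {"LotArea": 9600, "PoolQC": "Gd", "Street": "Pave"},
    {"LotArea": 11250, "PoolQC": None, "Street": "Pave"},
    {"LotArea": 9550, "PoolQC": "Ex", "Street": "Pave"},
]
print(quality_alerts(raw))
```

Keeping the check as a plain function makes it cheap to call after every transformation, which is the spirit of treating data quality as an ongoing step rather than a one-off.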

Once the package is installed and imported, ydata-profiling needs just two lines: one to generate the report and one to display it.

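!pip install ydata-profiling

import pandas as pd
from ydata_profiling import ProfileReport

# Load the competition's training data (path assumed; point it at your copy of train.csv)
train = pd.read_csv("train.csv")

# Generate the data profile report
profile = ProfileReport(train, title="EDA")

# Show the report in the notebook
profile.to_notebook_iframe()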

Alerts indicating high correlation, class imbalance, missing data, etc. Image by author.
Variable distributions. Image by author.

The output shows the distribution of the variables and provides you with a set of alerts…
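ydata-profiling computes its alerts internally. As a rough, library-free illustration of the idea behind a high-correlation alert only, the sketch below computes pairwise Pearson correlations and flags pairs above a threshold. The function names, the 0.9 threshold, and the toy values (which only borrow column names from the House Prices data) are assumptions; ydata-profiling's own correlation measures and thresholds may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)


def high_correlation_alerts(columns: dict[str, list[float]], threshold: float = 0.9) -> list[str]:
    """Flag each pair of numeric columns whose absolute correlation exceeds `threshold`."""
    alerts = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:  # NaN never passes this comparison
            alerts.append(f"{a} is highly correlated with {b} (r={r:.2f})")
    return alerts


# Made-up values for illustration only.
columns = {
    "GarageCars": [1, 2, 2, 3],
    "GarageArea": [300, 440, 560, 700],
    "MoSold": [6, 2, 9, 5],
}
print(high_correlation_alerts(columns))
```

An alert like this is a prompt for a modeling decision (for example, whether to keep both of two near-duplicate features), not an automatic instruction to drop a column.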
