Exploratory Data Analysis (EDA) plays a crucial role in data science, as it allows us to gain insights into a dataset and understand the patterns within it. In one of my previous articles, I introduced the convenience of a Python library called “PandasGUI”, an out-of-the-box Python EDA tool.
Now, let’s turn our attention to “ydata-profiling,” a successor to the popular “pandas-profiling” library. “ydata-profiling” offers advanced EDA capabilities and addresses the limitations of its predecessor, making it an invaluable resource for data scientists and analysts.
As always, before we can start using the library, we need to install it with pip.
pip install ydata-profiling
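If you work in a notebook, pip can sometimes install into a different environment from the one your kernel uses. The following is a minimal sketch that uses only the standard library's `importlib.metadata` to check whether the package is visible. The helper name `installed_version` is hypothetical and is not part of ydata-profiling.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if missing.

    Hypothetical helper for illustration; not part of ydata-profiling.
    """
    try:
        # Look the distribution up by its pip name, e.g. "ydata-profiling".
        return version(dist_name)
    except PackageNotFoundError:
        # The package is not installed in the current interpreter's environment.
        return None


if __name__ == "__main__":
    v = installed_version("ydata-profiling")
    print(v if v else "ydata-profiling is not installed in this environment")
```

If this prints the "not installed" message inside your notebook, running `%pip install ydata-profiling` from a notebook cell installs into the kernel's own environment.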
To conduct EDA, we need a dataset. Let’s use one of the most famous public datasets, the Iris dataset, for this demo. It ships with the scikit-learn library, but since we won’t otherwise use scikit-learn in this demo, it is simpler to load a CSV copy hosted on the datahub.io website directly:
https://datahub.io/machine-learning/iris/r/iris.csv
We can load the data from the URL straight into a pandas DataFrame as follows.
import pandas as pd
df = pd.read_csv("https://datahub.io/machine-learning/iris/r/iris.csv")
df.head()
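Before generating a full report, it can help to see the kind of facts a profiling tool summarizes. The sketch below collects a few of them by hand with plain pandas: the shape, column types, missing-value counts, and class balance. `quick_summary` is a hypothetical helper for illustration, and passing the last column as the label assumes that this CSV stores the species in its final column.

```python
import pandas as pd


def quick_summary(df: pd.DataFrame, label_col: str) -> dict:
    """Collect a few basic EDA facts by hand (hypothetical helper)."""
    return {
        # (number of rows, number of columns)
        "shape": df.shape,
        # Data type of each column, converted to strings for easy printing
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Count of missing values in each column
        "missing": df.isna().sum().to_dict(),
        # Number of rows belonging to each class in the label column
        "class_counts": df[label_col].value_counts().to_dict(),
    }
```

Usage, assuming the label is the last column: `quick_summary(df, label_col=df.columns[-1])`. A profiling report computes these statistics and many more automatically, which is what makes it worth the install.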
Then, we can import the ProfileReport class from the ydata-profiling library (imported in Python as ydata_profiling) to generate the EDA report from the pandas DataFrame.
from ydata_profiling…