In this guide, we’ll explore how to re-create key pandas functions used for EDA, such as describe and corr, in BigQuery.
Transitioning from BigQuery/SQL to Python can be quite eye-opening, especially in the context of data analysis. I often find myself writing extensive queries to manipulate and analyze data in BigQuery SQL. It’s a powerful language, but it can get quite heavy.
Now, when I switched to Python, I was surprised by how streamlined certain tasks were. Python’s libraries, like pandas, allow you to perform data manipulations and analyses that would be cumbersome in SQL.
I found a few pandas functions, such as describe(), corr(), and isnull().sum(), super useful and wished they existed in BigQuery. This got me exploring other cool EDA functions in pandas and inspired me to write this article. Here, I’m sharing the methods and code I came up with in BigQuery to match some of the best pandas EDA functions.
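For readers coming from the SQL side, here is a minimal sketch of what those three pandas calls look like. The DataFrame is a tiny made-up example (the column names and values are illustrative assumptions, not the real dataset), so treat it as a quick taste rather than a proper analysis.

```python
import pandas as pd

# Illustrative values only: a hypothetical four-row table with one missing mpg.
df = pd.DataFrame(
    {
        "mpg": [21.0, 22.8, 18.7, None],
        "hp": [110, 93, 175, 105],
    }
)

# Summary statistics per numeric column: count, mean, std, min, quartiles, max.
summary = df.describe()

# Pairwise Pearson correlation between the numeric columns.
correlation = df.corr()

# Number of missing values in each column.
null_counts = df.isnull().sum()

print(summary)
print(correlation)
print(null_counts)
```

Each call is a single line in pandas, which is the convenience the rest of this article tries to match in BigQuery.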
Let’s get stuck in!
In this article, we’ll take a look at these 13 functions:
- Head / Tail
- Columns
- Dtypes
- Nunique
- Unique
- ISNA / ISNULL()
- ISNULL().SUM()
- DropNA
- Shape
- Corr
- Nlargest
- Sample
- Describe
Throughout this article, we will play around with the popular mtcars dataset. The mtcars dataset is a publicly available built-in dataset in R. It comprises 11 features of 32 automobiles from the 1974 Motor Trend US magazine.
When you first look at a dataset, consider ‘Head’ and ‘Tail’ as the front and back pages…