BigQuery Methods For Re-Creating Pandas’ Top EDA Functions | by Tom Ellyatt | Feb, 2024


In this guide, we’ll explore how to re-create key pandas functions used for EDA, such as describe and corr, in BigQuery.

Image created using DALL-E

Transitioning from BigQuery/SQL to Python can be quite eye-opening, especially in the context of data analysis. I often find myself writing extensive queries to manipulate and analyze data in BigQuery SQL. It’s a powerful language, but it can get quite heavy.

When I switched to Python, I was surprised by how streamlined certain tasks were. Python libraries such as pandas let you perform data manipulations and analyses that would be cumbersome in SQL.

I found a few pandas functions, like describe(), corr(), and isnull().sum(), super useful, and wished they were in BigQuery. This got me exploring other cool EDA functions in pandas and inspired me to write this article. Here, I’m sharing the methods and code I came up with in BigQuery to match some of the best pandas EDA functions.

Let’s get stuck in!

In this article, we’ll take a look at these 13 functions:

  1. Head / Tail
  2. Columns
  3. Dtypes
  4. Nunique
  5. Unique
  6. ISNA / ISNULL()
  7. ISNULL().SUM()
  8. DropNA
  9. Shape
  10. Corr
  11. Nlargest
  12. Sample
  13. Describe

Throughout this article, we will play around with the popular mtcars dataset. The mtcars dataset is a publicly available built-in dataset in R. It comprises 11 features of 32 automobiles from the 1974 Motor Trend US magazine.
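
To make the idea concrete before diving in, here is a minimal sketch of how a few of the listed functions map onto plain SQL. It is not the author's BigQuery code: SQLite (from Python's standard library) stands in for BigQuery so the block runs locally, only three mtcars rows and four columns are loaded, and BigQuery's own syntax and helper functions may differ.

```python
import sqlite3

# A small subset of mtcars (model, mpg, cyl, hp), used purely for illustration.
rows = [
    ("Mazda RX4", 21.0, 6, 110),
    ("Mazda RX4 Wag", 21.0, 6, 110),
    ("Datsun 710", 22.8, 4, 93),
]

# Assumption: an in-memory SQLite database stands in for a BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mtcars (model TEXT, mpg REAL, cyl INTEGER, hp INTEGER)")
conn.executemany("INSERT INTO mtcars VALUES (?, ?, ?, ?)", rows)

# head(2): SQL tables have no inherent row order, so the ordering is explicit.
head = conn.execute("SELECT model FROM mtcars ORDER BY model LIMIT 2").fetchall()

# shape[0]: the number of rows.
n_rows = conn.execute("SELECT COUNT(*) FROM mtcars").fetchone()[0]

# nunique() on a single column.
n_cyl = conn.execute("SELECT COUNT(DISTINCT cyl) FROM mtcars").fetchone()[0]

# isnull().sum() on a single column.
n_null = conn.execute(
    "SELECT SUM(CASE WHEN mpg IS NULL THEN 1 ELSE 0 END) FROM mtcars"
).fetchone()[0]

print(head, n_rows, n_cyl, n_null)
```

The queries stick to widely supported SQL (LIMIT, COUNT, COUNT(DISTINCT ...), CASE) so that the same pattern should carry over to BigQuery with little change.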

[Image: the mtcars dataset, author's screenshot taken from RStudio. Panda icon source: Flaticon.]

When you first look at a dataset, consider ‘Head’ and ‘Tail’ as the front and back pages…
