5 Examples to Master PySpark Window Operations | by Soner Yıldırım | Jan, 2024

A must-know tool for data analysis

Photo by Pierre Châtel-Innocenti on Unsplash

All of the data analysis and manipulation tools I’ve worked with support window operations. Some are more flexible and capable than others, but being able to do calculations over a window is a must.

What is a window in data analysis?

A window is a set of rows that are related in some way. The relation can be belonging to the same group or falling within n consecutive days. Once we define the window with the required constraints, we can do calculations or aggregations over it.
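To make the two kinds of relation concrete before turning to PySpark, here is a minimal, library-agnostic sketch in plain Python. The B1 quantities are taken from the sample data used in this article; the A1 rows are made up purely for illustration, and the variable names are my own.

```python
from collections import defaultdict

# (store_code, sales_date, sales_qty)
# B1 rows come from the article's sample data; A1 rows are hypothetical.
rows = [
    ("B1", "2021-05-01", 14),
    ("B1", "2021-05-02", 19),
    ("B1", "2021-05-03", 15),
    ("A1", "2021-05-01", 7),
    ("A1", "2021-05-02", 3),
]

# Relation 1: rows that belong to the same group (here, the same store).
partitions = defaultdict(list)
for store, date, qty in rows:
    partitions[store].append((date, qty))

# An aggregation over each whole group.
store_totals = {
    store: sum(qty for _, qty in group) for store, group in partitions.items()
}

# Relation 2: within each group, the current row plus the previous row,
# i.e. a rolling window of 2 consecutive days.
rolling_totals = {}
for store, group in partitions.items():
    group.sort()  # ISO-formatted dates sort chronologically as strings
    for i, (date, qty) in enumerate(group):
        window = group[max(0, i - 1): i + 1]
        rolling_totals[(store, date)] = sum(q for _, q in window)

print(store_totals)
print(rolling_totals)
```

Window functions in PySpark express the same two ideas declaratively, so we never have to write these loops ourselves.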

In this article, we will go over 5 detailed examples to build a comprehensive understanding of window operations with PySpark. We’ll learn how to create windows with partitions, customize these windows, and do calculations over them.

PySpark is the Python API for Spark, an analytics engine used for large-scale data processing.

I prepared a sample dataset with mock data for this article, which you can download from my datasets repository. The dataset we’ll use in this article is called “sample_sales_pyspark.csv”.

Let’s start a Spark session and create a DataFrame from this dataset.

from pyspark.sql import SparkSession
from pyspark.sql import Window, functions as F

spark = SparkSession.builder.getOrCreate()

data = spark.read.csv("sample_sales_pyspark.csv", header=True)

data.show(15)

# output
+----------+------------+----------+---------+---------+-----+
|store_code|product_code|sales_date|sales_qty|sales_rev|price|
+----------+------------+----------+---------+---------+-----+
|        B1|       89912|2021-05-01|       14|    17654| 1261|
|        B1|       89912|2021-05-02|       19|    24282| 1278|
|        B1|       89912|2021-05-03|       15|    19305| 1287|
|        B1|       89912|2021-05-04|       21|    28287| 1347|
|        B1|       89912|2021-05-05|        4|     5404| 1351|
|        B1|       89912|2021-05-06|        5|     6775| 1355|
|        B1|       89912|2021-05-07|       10|    12420| 1242|
|        B1|       89912|2021-05-08|       18|    22500| 1250|
|        B1|       89912|2021-05-09|        5|     6555| 1311|
|        B1|       89912|2021-05-10|        2|     2638| 1319|…
