In one of the first articles I wrote on Medium, I talked about using the apply() method on Pandas dataframes and said it should be avoided, if possible, on larger dataframes. I’ll put a link to that article at the end of this one if you want to check it out.
Although I touched on possible alternatives then, namely vectorisation, I didn’t give many examples of it, so I intend to remedy that here. Specifically, I want to talk about how NumPy and a couple of its lesser-known functions (where and select) can be used to speed up Pandas operations that involve complex if/then/else conditions.
Vectorisation in the context of Pandas means applying an operation to entire blocks of data at once rather than iterating through them row by row or element by element. This is possible because Pandas is built on NumPy, whose vectorised operations are highly optimised and implemented in C. When you use vectorised operations in Pandas, such as arithmetic or NumPy functions applied to DataFrame or Series objects, the loop over the elements runs in compiled code rather than in the Python interpreter, which is why they are typically much faster.
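To make the distinction concrete, here is a minimal sketch that computes the same result row by row with apply() and then as a single vectorised expression. The dataframe, its column names (price, quantity) and its values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical example data: column names and values are illustrative only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Row by row: apply() with axis=1 calls the Python lambda once per row.
total_apply = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# Vectorised: one expression over whole columns, with the loop
# over elements handled inside NumPy's compiled code.
total_vec = df["price"] * df["quantity"]

# Both approaches produce the same values.
print(total_apply.tolist())
print(total_vec.tolist())
```

On three rows the difference in speed is negligible, but the per-row Python function call in the apply() version is the overhead that tends to dominate as the dataframe grows.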