Big O — A Practical Approach. Using code examples to implement Big O… | by Christopher Karg

Using code examples to implement Big O best practices. How a mindset shift can boost your code performance at runtime


With this article I aim to increase Big O literacy amongst us data professionals. We often compare the world of Software Engineering (SWE) with the world of data. Whilst some best practices are applied in both fields (such as version control, error handling, and testing), in my personal experience the area where data roles lag behind SWE roles is writing performant code. More specifically, it is the mindset of writing performant code and checking that it stays performant. I have been guilty of this myself in the past: if the code in my script was running as expected and handling errors gracefully, I’d consider it ‘complete’. I believe anyone who writes production-level code, regardless of job title, is responsible for ensuring that code is performant at runtime.

I am aware that many people in data-focussed roles already implement Big O best practices. From speaking with peers, these are largely individuals who work closely with software engineers and are therefore in a position to ‘absorb’ these methodologies, in contrast to those who work in a siloed data team. Whilst I can only speculate as to why this best practice has not been as readily adopted in the world of data, I believe a large part of it comes down to the paths we took to get to our current roles. I am a ‘career-changer’ into the data field (I previously worked in the insurance industry), and neither Big O nor the idea of performant code was covered in any curriculum I studied. Only when I started working as a professional Data Scientist in a very small data team did I begin to realise the impact of writing performant code. Data professionals who run all their experiments in notebooks and rely on Software Engineers or ML-Ops professionals to get their code and models running in production will be in a similar position, as code optimisation does not necessarily fall under their remit (even though I really think it should).

Every line of code we push to production should serve a purpose, and usually this purpose is to run a process or some form of I/O operation. These processes cost money…