Struggling with Data Science? 5 Common Beginner Mistakes

Contents

Not Learning Fundamental Maths
Trying To Find The “Best” Course
Not Doing Project-Based Learning
Quantity Over Quality Projects
Jumping Straight To AI
Another Thing!
Connect With Me

If you are learning data science, first of all, well done.

You’ve chosen one of the most lucrative and fast-growing careers in tech.

But here’s the truth: most students waste months (even years) spinning their wheels on the wrong things. Avoid these mistakes to fast-track your data science career.

After 4+ years working in the field, I’ve seen exactly what separates those who land their first data science job fast… from those who never make it past endless tutorials.

In this article, I’ll break down the five biggest mistakes that hold beginner data scientists back so you can actively avoid them.

Not Learning Fundamental Maths

Maths is by far the most important skill… and yet also the most overlooked.

Many people, even practitioners, think that you don’t need to know the underlying maths behind data science and machine learning.

You are indeed very unlikely to carry out backpropagation by hand, build a decision tree from scratch, or construct an A/B experiment from first principles.

So, it is easy to take this for granted and avoid learning any of the background theory.

However, this is dangerous and I don’t recommend it.

Sure, you can build a neural network with a few lines of PyTorch, but what happens when it has weird behaviour and you need to debug it?

Or what if someone asked you what the prediction interval is around your output from a linear regression model?

These scenarios come up more frequently than you think, and the only way you can answer them is by having a solid grasp of the underpinning maths.
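
To make the prediction interval question concrete, here is a minimal sketch in plain Python for a one-feature linear regression. The data is made up for illustration, the function names are my own, and the critical value 2.306 is the 95% two-sided t-value for 8 degrees of freedom read from a t-table (in practice you would look it up for your own sample size). It uses the textbook formula: prediction ± t × s × sqrt(1 + 1/n + (x0 − x̄)² / Sxx).

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares with one feature. Returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def prediction_interval(xs, ys, x0, t_crit):
    """Return (lower, prediction, upper) for a new observation at x0.

    t_crit is the t critical value for n - 2 degrees of freedom,
    supplied by the caller (an assumption of this sketch).
    """
    n = len(xs)
    slope, intercept = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)

    # Residual standard error: typical size of the model's misses.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

    y_hat = intercept + slope * x0
    # The "1 +" term is the noise of a single new observation;
    # the remaining terms are the uncertainty in the fitted line itself.
    margin = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat, y_hat + margin

# Made-up toy data: 10 points, so n - 2 = 8 degrees of freedom.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 18.1, 19.7]

T_CRIT_95_DF8 = 2.306  # 95% two-sided t-value for 8 degrees of freedom

lower, pred, upper = prediction_interval(xs, ys, x0=5.5, t_crit=T_CRIT_95_DF8)
print(f"prediction {pred:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

Notice that the interval gets wider the further x0 is from the mean of your training data. That falls straight out of the (x0 − x̄)² term, and it is exactly the kind of thing you can only explain if you know the maths.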

Think of maths as the operating system of your brain for data science. Every model, every algorithm, every insight you produce runs on it.

If your OS is buggy or outdated, nothing else runs smoothly, no matter how fancy your tools are.

Lay the foundations now while you are in the learning phase, as this will allow you to move much faster later in your career.

Trying To Find The “Best” Course

I often get asked:

What’s the best course?

I really do love you all, but this question needs to go away.

As a complete beginner, the best course is the one you choose and complete.

Many introductory courses in data science, machine learning, and Python will teach you the same things.

You may find a teacher or a teaching style better than another, but fundamentally, you will acquire very similar knowledge to another person doing some other course.

Bias towards action and get going at the beginning; you can adjust your direction later if you feel you are misaligned. Stop overthinking.

As the famous saying goes:

The best time to plant a tree was 20 years ago. The second best time is today.

Everyone’s journey and background are different, and there is no “one way” to break into data science.

So, always take everyone’s advice (even mine) with a pinch of salt and tailor it to yourself. Do what feels right and best for you.

Not Doing Project-Based Learning

Along that theme, another common pitfall is tutorial hell.

Trust me, that’s not a place you want to be in.

If you are unaware of what tutorial hell is, this blog post explains it very well:

Tutorial hell is where you write code that others are explaining to you how to write, but you don’t understand how to write it yourself when given a blank slate. At some point, it’s time to take the training wheels off and build something on your own.

You are basically following tutorial after tutorial and not attempting to build anything on your own.

To learn the concepts, you need to practise and apply them independently in your own work. This is how you solidify your understanding, and it is where the real learning happens.

Imagine that you have only ever built an XGBoost model following online tutorials.

If you are then given a take-home case study as part of an interview, you are going to really struggle, as you have no experience building models without a step-by-step walkthrough.

What I advocate for is “project-based learning.”

You want to learn just enough, and then immediately build a project.

Trust me, this approach is exponentially better than doing numerous tutorials (speaking from painful experience here!).

Quantity Over Quality Projects

Whilst doing projects is the best way to learn, don’t oversaturate your GitHub with loads of “easy” projects.

If all your projects revolve around a pre-made dataset from Kaggle and scikit-learn’s .fit() and .predict() methods, it’s probably time to try something a bit harder.

Now, I am not slating these entry-level projects, as they are a great way to get your hands dirty.

However, at some point, the quality of your projects will matter more than the quantity.

Larger, in-depth projects will be the ones that actually get you hired. Recruiters don’t want to see another Titanic dataset problem; if anything, it would be a red flag nowadays.

Some ideas to try:

Build ML algorithms from scratch using native Python.
Re-implement a research paper and try to replicate the authors’ results.
Build a basic recommendation system for something personal in your life.
Fine-tune an LLM.

This is by no means an exhaustive list, and the best project is the one that is personal to you, as I always say.
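
To give a flavour of the first idea, here is a minimal sketch of a k-nearest neighbours classifier written in native Python, with no scikit-learn in sight. The function names and the toy data are my own, made up purely for illustration. A serious from-scratch project would go further (tie handling, feature scaling, benchmarking against a library implementation), but it shows the level of understanding you are aiming for.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Pair every training point with its label and sort by distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up toy data: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (7, 8)))
```

Writing even something this small forces you to decide what “nearest” means and what to do with the votes, which is precisely what calling .fit() and .predict() hides from you.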

Jumping Straight To AI

I am going to be honest with you.

I am an AI hater.

No, I don’t think it will replace data scientists.

No, I don’t think it is as good as people think.

And I sure as hell am not worried about it at all for the next 5 years.

The reasons I am not worried could fill a whole video, so I will leave that for later. But it’s almost funny how little I am concerned by it.

Anyway, the reason I say this is that it baffles me when I see beginners jump straight into learning AI and LLMs.

This is a prime example of shiny object syndrome.

As a beginner, focus on the basics of maths and statistics, and on old-school algorithms such as decision trees, regression models, and support vector machines.

These are evergreen and will remain around for a long time, so it’s wise to invest in them early on.

AI is still an unknown entity, and whether it will be as popular and helpful in a few years is hard to tell.

If the topic is popular now and genuinely helpful, it will still be popular 1 year, 3 years, and even a decade from now. So, don’t worry, you have plenty of time to study cutting-edge topics.

Remember what I said earlier about not all projects getting you hired?

That the longer, more in-depth ones make all the difference?

But what do these projects actually look like?

Well, see my previous article, which walks through specific projects that help you stand out (and which ones are a total waste of time).

See you there!

Another Thing!

Join my free newsletter where I share weekly tips, insights, and advice on landing your first data science or machine learning job. Plus, as a subscriber, you’ll get my FREE Resume Template!

https://newsletter.egorhowell.com