Does Bagging Help to Prevent Overfitting in Decision Trees? | by Gurjinder Kaur | Dec, 2023



Understand why decision trees are highly prone to overfitting and its potential remedies

Photo by Jan Huber on Unsplash

Decision trees are a class of machine learning algorithms known for their ability to solve both classification and regression problems, not to mention the ease of interpretation they offer. However, they are prone to overfitting and can fail to generalize well if their growth is not properly controlled.
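To make the overfitting claim concrete, here is a minimal sketch on a hypothetical noisy sine dataset (an assumption, not the article's data). The tree uses scikit-learn's default settings, so there is no `max_depth` or `min_samples_leaf` limit. It fits the training set essentially perfectly but scores lower on held-out data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical 1-D regression problem: y = sin(x) + Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# No depth or leaf-size limits: the tree keeps splitting until every leaf is pure
full_tree = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)

train_r2 = full_tree.score(X_train, y_train)  # R^2 on data the tree has already seen
test_r2 = full_tree.score(X_test, y_test)     # R^2 on unseen data
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The gap between the two scores is the symptom: the tree has memorized the noise in the training targets rather than just the underlying sine shape.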

In this article, we will discuss what overfitting is, to what extent a decision tree overfits the training data, why that is a problem, and how it can be addressed.

Then, we will get acquainted with an ensemble technique called bagging and see whether it can make decision trees more robust.
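As a rough preview of what bagging does mechanically, the sketch below fits each tree on a bootstrap sample (n rows drawn with replacement from the n training rows) and averages the trees' predictions. `bagged_trees_predict` is a hypothetical helper written for illustration, not part of scikit-learn, and the noisy sine data is again an assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagged_trees_predict(X_train, y_train, X_new, n_estimators=50, seed=0):
    """Hypothetical hand-written bagging: bootstrap, fit a tree, average."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)          # bootstrap indices, with replacement
        tree = DecisionTreeRegressor(random_state=0)  # fully grown, high-variance learner
        tree.fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_new))
    return np.mean(all_preds, axis=0)             # aggregate by averaging (regression)


# Assumed toy data: noisy sine curve, first 400 rows for training, last 200 for testing
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(600, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=600)
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
single_mse = np.mean((single.predict(X_test) - y_test) ** 2)
bagged_mse = np.mean((bagged_trees_predict(X_train, y_train, X_test) - y_test) ** 2)
print(f"single tree test MSE: {single_mse:.3f}, bagged trees test MSE: {bagged_mse:.3f}")
```

The intuition is that each tree overfits its own resample in a slightly different way, so averaging many of them cancels out part of that variance while keeping their low bias.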

We will cover the following:

  • Create our regression dataset using NumPy.
  • Train a decision tree model using scikit-learn.
  • Understand what overfitting means by comparing the same model's performance on the training set and the test set.
  • Discuss why overfitting is more common in non-parametric models such as decision trees (and, of course, learn what the term non-parametric means) and how it can be prevented using regularization.
  • Understand what bootstrap aggregation (bagging for short) is and how it can potentially help with overfitting.
  • Finally, implement the bagging version of the decision tree and see whether it helps 🤞
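As a sketch of the two remedies listed above, the code below compares an unconstrained tree, a tree regularized with `max_depth` and `min_samples_leaf`, and scikit-learn's `BaggingRegressor` wrapped around fully grown trees. The dataset and the hyperparameter values (`max_depth=5`, `min_samples_leaf=10`, `n_estimators=100`) are illustrative assumptions, not tuned choices.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumed toy data: noisy sine curve
rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(1000, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

models = {
    "unconstrained tree": DecisionTreeRegressor(random_state=0),
    # Regularization: cap the depth and require several samples in every leaf
    "regularized tree": DecisionTreeRegressor(
        max_depth=5, min_samples_leaf=10, random_state=0
    ),
    # Bagging: 100 fully grown trees, each fit on a bootstrap sample, predictions averaged
    "bagged trees": BaggingRegressor(
        DecisionTreeRegressor(), n_estimators=100, random_state=0
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = (train_mse, test_mse)
    print(f"{name:>20}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The two approaches attack overfitting differently: regularization limits how finely a single tree can carve up the input space, while bagging keeps the trees deep and instead averages away much of their variance.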

Still wondering if it’s worth reading? 🤔 If you’ve ever asked yourself why Random Forests are usually preferred over vanilla decision trees, this is the best place to start, since Random Forests use the idea of bagging plus something else to improve upon decision trees.

Let’s get started!

First, we set up a Python notebook and import the libraries.

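```python
import pandas as pd                                   # tabular data handling
import numpy as np                                    # numerical arrays and synthetic data
import plotly.graph_objects as go                     # interactive plots
from sklearn.tree import DecisionTreeRegressor        # the decision tree model
from sklearn import tree                              # tree utilities such as plot_tree and export_text
from sklearn.model_selection import train_test_split  # train/test splitting
```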
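With the libraries imported, a hypothetical stand-in for the dataset step could look like the following. The article's own data-generation code is not part of this section, so the sine function, noise level and sample size are assumptions. The sketch also uses the imported `tree` module's `export_text` to peek at the first splits and checks how large an unconstrained tree grows.

```python
import numpy as np
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in dataset: 100 points on a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# With no stopping criteria, the tree splits until each leaf holds a single training row
print("depth:", model.get_depth(), "leaves:", model.get_n_leaves(), "train rows:", len(X_train))

# Text view of the learned split rules, truncated to the first two levels
rules = tree.export_text(model, feature_names=["x"], max_depth=2)
print(rules)
```

A tree with as many leaves as training rows is effectively a lookup table of the training targets, which is exactly the behaviour the rest of the article sets out to diagnose and fix.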