Naive Bayes clearly explained with maths and scikit-learn | by Yoann Mocquin

Solving the iris dataset with a gaussian approach in scikit-learn.

10 min read

14 hours ago

In this post, we’ll delve into a particular kind of classifier called naive Bayes classifiers. These are methods that rely on Bayes’ theorem and the naive assumption that every pair of features is conditionally independent given a class label. If this doesn’t make sense to you, keep reading!

As a toy example, we’ll use the well-known iris dataset (CC BY 4.0 license) and a specific kind of naive Bayes classifier called Gaussian Naive Bayes classifier. Remember that the iris dataset is composed of 4 numerical features and the target can be any of 3 types of iris flower (setosa, versicolor, virginica).

We’ll decompose the method into the following steps:

Reviewing the Bayes theorem: this theorem provides the mathematical formula that allows us to estimate the probability that a given sample belongs to any class.
We can create a classifier, a tool that returns a predicted class for an input sample, by comparing the probability that this sample belongs to a class, for all classes.
Using the chain rule and the conditional independence hypothesis, we can simplify the probability formula.
Then to be able to compute the probabilities, we use another assumption: that the feature distributions are Gaussian.
Using a training set, we can estimate the parameters of those Gaussian distributions.
Finally, we have all the tools we need to predict a class for a new sample.

I have plenty of new posts like this one incoming; remember to subscribe!

Bayes’ theorem is a probability theorem that states the following:

P(A|B) is the conditional probability that A is true (or A happens) given (or knowing) that B is true (or B happened) — also called the posterior probability of A given B (posterior: we updated the probability that A is true after we know B is true).