## Dive into the Naive Bayes Classifier Using Python

This is the third article in this series I’ve called “Ace your Machine Learning Interview”, in which I go over the foundations of Machine Learning. If you missed the first two articles, you can find them here:

## Introduction

Naive Bayes is a Machine Learning algorithm used to solve classification problems, and it is so called because it is based on Bayes’ theorem.

An algorithm called a classifier assigns a class to each instance of data. For example, classifying whether an email is spam or not spam.

## Bayes’ Theorem

Bayes’ Theorem is used to calculate the probability of a cause having produced the observed event. The formula we have all studied in probability courses is the following:

P(A | B) = P(B | A) · P(A) / P(B)

So this theorem answers the question: *‘What is the probability that event A will occur, given that event B has occurred?’* The interesting thing is that this formula turns the question around: we can calculate this probability by counting how many times B actually occurred whenever event A had occurred. That is, **we can answer the original question by looking at the past (the data)**.

## Naive Bayes Classifier

But how do we apply this theorem to create a Machine Learning classifier? Suppose we have a dataset consisting of *n features* and a *target*.

Therefore, our question now is: *‘What is the probability of getting a certain label y, given that these features occurred?’*

For example, if *y = spam/not-spam*, *x1 = len(email)*, and *x2 = number_of_attachments*, we would ask:

*‘What is the probability that y is spam, given that x1 = 100 chars and x2 = 2 attachments?’*

To answer this question we need only apply Bayes’ theorem directly, where A = {y} and B = {x1, x2, …, xn}.
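Plugging these into the theorem gives:

P(y | x1, x2, …, xn) = P(x1, x2, …, xn | y) · P(y) / P(x1, x2, …, xn)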

But the classifier is not called the Bayes Classifier; it is called the Naive Bayes Classifier. This is because a **naive assumption** is made to simplify the calculations: **the features are assumed to be independent of one another**. This allows us to simplify the formula.
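Under this assumption, the likelihood factorizes into a product of per-feature terms:

P(y | x1, …, xn) = P(y) · P(x1 | y) · P(x2 | y) · … · P(xn | y) / (P(x1) · P(x2) · … · P(xn))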

In this way, we can calculate the probability that *y = spam*. Next, we calculate the probability that *y = not_spam* and see which one is more likely. But if you think about it, between the two labels, the one with the higher probability will be the one with the larger numerator, since the denominator is always the same: *P(x1) · P(x2) · …*

We can therefore drop the denominator for simplicity, since it does not matter for the purpose of comparison.
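What remains is a proportionality, which is all we need to compare the two labels:

P(y | x1, …, xn) ∝ P(y) · P(x1 | y) · … · P(xn | y)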

Now we are going to **choose the class that maximizes this probability**; to do so, we only need to use **argmax**.
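In symbols, the prediction is ŷ = argmax over y of P(y) · P(x1 | y) · … · P(xn | y). Here is a minimal from-scratch sketch of this decision rule for the spam example above; the priors and likelihoods are made-up numbers standing in for values that would be estimated from training data:

```python
import math

# Toy, hand-picked probabilities for the spam example above
# (these numbers are made up purely for illustration).
priors = {"spam": 0.4, "not_spam": 0.6}                # P(y)
likelihoods = {                                        # P(x_i | y)
    "spam":     {"x1=100_chars": 0.20, "x2=2_attachments": 0.30},
    "not_spam": {"x1=100_chars": 0.05, "x2=2_attachments": 0.10},
}

def predict(observed_features):
    """Return the label maximizing P(y) * prod_i P(x_i | y)."""
    scores = {}
    for label, prior in priors.items():
        # Sum log-probabilities instead of multiplying raw ones,
        # which avoids numerical underflow with many features.
        score = math.log(prior)
        for feature in observed_features:
            score += math.log(likelihoods[label][feature])
        scores[label] = score
    return max(scores, key=scores.get)  # the argmax over labels

print(predict(["x1=100_chars", "x2=2_attachments"]))  # -> spam
```

Working in log-space is the standard trick here: the product of many small probabilities would quickly underflow to zero, while the corresponding sum of logs stays well-behaved.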

## Naive Bayes Classifier for Text Data

This **algorithm is often used in the field of NLP for text data**. This is because we can treat the individual words that appear in the text as features, and the naive assumption is that these **words are independent** (which of course is not actually true).

Suppose we have a dataset in which each row contains a single sentence, and **each column tells us whether or not a given word appears in that sentence**. We have eliminated unnecessary words such as articles, etc.
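As a concrete illustration, here is a minimal scikit-learn sketch on a made-up four-sentence corpus: `CountVectorizer(binary=True)` builds exactly the word-presence columns described above, `stop_words="english"` drops articles and other filler words, and `BernoulliNB` is the Naive Bayes variant designed for binary presence/absence features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# A made-up toy corpus: one sentence per row, with made-up labels.
sentences = [
    "win a free prize now",
    "meeting rescheduled to monday",
    "free money claim your prize",
    "see you at the office monday",
]
labels = ["spam", "not_spam", "spam", "not_spam"]

# Build the binary "does this word appear?" matrix, dropping stop words.
vectorizer = CountVectorizer(binary=True, stop_words="english")
X = vectorizer.fit_transform(sentences)

# BernoulliNB models each word's presence/absence independently per class.
model = BernoulliNB()
model.fit(X, labels)

print(model.predict(vectorizer.transform(["claim your free prize"])))
# -> ['spam']
```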