9.9 Naive Bayes Classifier

A good explanation of the theory is given in the scikit-learn documentation:

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable. Given the class variable \(y\) and the dependent feature vector \(x_1\) through \(x_n\), Bayes’ theorem states the following relationship:

\[P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)} {P(x_1, \dots, x_n)}\]

Using the naive conditional independence assumption that

\[P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)\]

for all \(i\), this relationship is simplified to

\[P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)} {P(x_1, \dots, x_n)}\]

Since \(P(x_1, \dots, x_n)\) is constant given the input, we can use the following classification rule:

\[\begin{align}\begin{aligned}P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)\\\Downarrow\\\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y),\end{aligned}\end{align}\]

and we can use Maximum A Posteriori (MAP) estimation to estimate \(P(y)\) and \(P(x_i \mid y)\); the former is then the relative frequency of class \(y\) in the training set. The different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of \(P(x_i \mid y)\).
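To make the classification rule concrete, here is a minimal sketch that evaluates \(P(y) \prod_{i} P(x_i \mid y)\) for two classes and three binary features. The probability tables are hypothetical numbers chosen only for illustration, and the helper name `predict` is not part of any library. The scores are computed in log space, which leaves the arg max unchanged but avoids underflow when many small probabilities are multiplied.

```python
import numpy as np

# Hypothetical toy numbers, chosen only for illustration.
priors = np.array([0.6, 0.4])             # P(y) for classes 0 and 1
# likelihoods[c, i] = P(x_i = 1 | y = c) for three binary features
likelihoods = np.array([[0.8, 0.1, 0.5],
                        [0.3, 0.7, 0.5]])

def predict(x):
    """Return (argmax_y P(y) * prod_i P(x_i | y), log scores) for a binary vector x."""
    x = np.asarray(x)
    # P(x_i | y) is the table entry when x_i = 1 and its complement when x_i = 0.
    per_feature = np.where(x == 1, likelihoods, 1.0 - likelihoods)
    # log P(y) + sum_i log P(x_i | y), one score per class
    log_scores = np.log(priors) + np.log(per_feature).sum(axis=1)
    return int(np.argmax(log_scores)), log_scores

label, scores = predict([1, 0, 1])
print(label, np.exp(scores))  # unnormalized posteriors P(y) * prod_i P(x_i | y)
```

For the input `[1, 0, 1]` the unnormalized scores are 0.6 · 0.8 · 0.9 · 0.5 = 0.216 for class 0 and 0.4 · 0.3 · 0.3 · 0.5 = 0.018 for class 1, so the rule selects class 0.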

9.9.1 Gaussian Naive Bayes

For Gaussian Naive Bayes, the likelihood of each feature is assumed to be Gaussian:

\[P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)\]

The parameters \(\sigma_y\) and \(\mu_y\) are estimated using maximum likelihood.

The Gaussian Naive Bayes is implemented in scikit-learn in GaussianNB.
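As a rough illustration of what the estimator computes, the following sketch fits the Gaussian parameters by hand and applies the classification rule in log space. It assumes that the mean and variance are estimated separately for every class and every feature, and it omits the small variance-smoothing term that `GaussianNB` adds by default, so the fitted variances (and in principle a few borderline predictions) may differ slightly from the library's. The function name `log_joint` is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Maximum-likelihood estimates, one row per class and one column per feature.
priors = np.array([np.mean(y == c) for c in classes])             # P(y)
means = np.array([X[y == c].mean(axis=0) for c in classes])       # mu
variances = np.array([X[y == c].var(axis=0) for c in classes])    # sigma^2 (ddof=0)

def log_joint(X):
    """log P(y) + sum_i log P(x_i | y) for every sample and class."""
    diff = X[:, None, :] - means[None, :, :]          # (n_samples, n_classes, n_features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None])
    return np.log(priors)[None, :] + log_lik.sum(axis=2)

manual_pred = classes[np.argmax(log_joint(X), axis=1)]
sklearn_pred = GaussianNB().fit(X, y).predict(X)
agreement = np.mean(manual_pred == sklearn_pred)
print("Fraction of predictions that agree with GaussianNB:", agreement)
```

The per-class means computed here can be compared with the fitted attribute `theta_` of `GaussianNB`.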

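The following example fits a Gaussian Naive Bayes classifier to one half of the iris dataset and counts the misclassified points in the other half:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the iris data and hold out half of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit on the training half, then predict labels for the test half.
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)

# Count the test points whose predicted label differs from the true label.
print("Number of mislabeled points out of a total %d points : %d"
      % (X_test.shape[0], (y_test != y_pred).sum()))

This gives the following output: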

Number of mislabeled points out of a total 75 points : 4
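Beyond hard labels, the fitted model can also return the normalized posterior \(P(y \mid x_1, \dots, x_n)\), that is, the classification score divided by the constant \(P(x_1, \dots, x_n)\). The sketch below repeats the same split as above and uses `predict_proba` and `score`; the variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
gnb = GaussianNB().fit(X_train, y_train)

# One row per test point, one column per class; each row sums to 1.
proba = gnb.predict_proba(X_test)

# Mean accuracy on the held-out half (fraction of correctly labeled points).
accuracy = gnb.score(X_test, y_test)

print(proba[:3].round(3))
print("Accuracy:", accuracy)
```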

scikit-learn provides several other Naive Bayes algorithms:

List of Naive Bayes algorithms

  • Multinomial Naive Bayes
  • Complement Naive Bayes
  • Bernoulli Naive Bayes
  • Categorical Naive Bayes
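These variants share the `fit`/`predict` interface of `GaussianNB` and differ in the distribution assumed for \(P(x_i \mid y)\). As a small sketch, the following uses `MultinomialNB` on count data; the count matrix and labels are made-up values for illustration, not taken from a real dataset.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts: rows are documents, columns are counts of three words.
X_counts = np.array([[3, 0, 1],
                     [2, 0, 0],
                     [4, 1, 0],
                     [0, 3, 2],
                     [0, 2, 3],
                     [1, 4, 2]])
y_labels = np.array([0, 0, 0, 1, 1, 1])

# alpha=1.0 (the default) applies Laplace smoothing to the per-class feature counts.
mnb = MultinomialNB(alpha=1.0)
mnb.fit(X_counts, y_labels)

# A document dominated by the first word, and one dominated by the other two.
pred = mnb.predict(np.array([[3, 0, 0], [0, 2, 2]]))
print(pred)
```

With these counts, the first word is far more frequent in class 0 and the other two in class 1, so the two test rows are assigned to class 0 and class 1 respectively.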