Info & Theory
A naive Bayes classifier picks the class with the highest posterior probability given a point's features, using Bayes' rule and one simplifying assumption.
Bayes' rule
P(c | x) ∝ P(c) · P(x | c). The class prior P(c) is the fraction of training points in class c; the likelihood P(x | c) says how well the point fits that class.
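As a small worked example of the rule, the sketch below estimates priors from label counts and normalises prior times likelihood into posteriors. The function names, the labels and the likelihood values are all made up for illustration; they are not taken from the demo.

```python
from collections import Counter


def class_priors(labels):
    """Estimate P(c) as the fraction of training points in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def posteriors(priors, likelihoods):
    """Normalise P(c) * P(x | c) over all classes to obtain P(c | x)."""
    unnormalised = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(unnormalised.values())
    return {c: v / evidence for c, v in unnormalised.items()}


# Hypothetical training labels: three points in class "a", one in class "b".
priors = class_priors(["a", "a", "a", "b"])

# Hypothetical likelihoods P(x | c) for a single query point x.
post = posteriors(priors, {"a": 0.1, "b": 0.6})

# The classifier picks the class with the highest posterior.
predicted = max(post, key=post.get)
print(priors, post, predicted)
```

Here class "b" wins despite its smaller prior, because the point fits it six times better than it fits class "a".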
The "naive" assumption
Features are assumed conditionally independent given the class, so the joint likelihood factorises: P(x | c) = P(x₁ | c) · P(x₂ | c). The assumption is rarely exactly true, yet the classifier is fast and works surprisingly well.
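The factorisation can be sketched in a few lines. The per-feature values below are invented, and the log-space variant is a common implementation choice rather than something the demo is known to use.

```python
import math


def joint_likelihood(per_feature):
    """P(x | c) under the naive assumption: the product of per-feature terms."""
    return math.prod(per_feature)


def joint_log_likelihood(per_feature):
    """Same quantity in log space: a sum of logs instead of a product."""
    return sum(math.log(p) for p in per_feature)


# Hypothetical values of P(x1 | c) and P(x2 | c) for one point and one class.
per_feature = [0.5, 0.2]

p = joint_likelihood(per_feature)
log_p = joint_log_likelihood(per_feature)
print(p, math.exp(log_p))
```

Summing logs gives the same ranking of classes as multiplying probabilities, and it avoids numerical underflow when there are many features.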
Gaussian model
For continuous features we fit a Gaussian per class per axis: P(xⱼ | c) = 𝒩(xⱼ; μ_{cj}, σ²_{cj}). Each class therefore has a mean and variance on each axis, estimated from its points. The shaded ellipse marks one standard deviation.
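A minimal sketch of the fitting step follows, assuming the maximum-likelihood variance (dividing by n rather than n - 1) and at least two distinct values per class on each axis so that no variance is zero. The names `fit_gaussians` and `gaussian_pdf` and the sample points are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussians(points, labels):
    """Return {class: [(mean, variance), ...]} with one pair per axis."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)

    params = {}
    for label, pts in grouped.items():
        n = len(pts)
        per_axis = []
        # zip(*pts) regroups the points into one tuple of values per axis.
        for values in zip(*pts):
            mean = sum(values) / n
            # Assumption: maximum-likelihood estimate, dividing by n.
            var = sum((v - mean) ** 2 for v in values) / n
            per_axis.append((mean, var))
        params[label] = per_axis
    return params


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical 2-D training points, two per class.
points = [(0.0, 0.0), (2.0, 4.0), (5.0, 5.0), (7.0, 9.0)]
labels = ["a", "a", "b", "b"]

params = fit_gaussians(points, labels)
print(params)
```

The square roots of the two variances for a class give the half-widths of an axis-aligned one-standard-deviation ellipse centred on that class's mean.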
Decision boundary
The background colours every pixel by the class with the highest posterior; the more confident the model, the stronger the shade. When the variances differ between classes, the boundary is a quadratic curve rather than a straight line, which is the signature of Gaussian naive Bayes.
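To see the quadratic boundary in the simplest setting, the sketch below classifies three points on a single axis. The two classes share a mean and a prior and differ only in variance; these parameter values are chosen for illustration and are not the demo's.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(x, priors, params):
    """Argmax over classes of log P(c) + sum over axes of log P(x_j | c)."""
    scores = {}
    for c in priors:
        log_likelihood = sum(
            log_gaussian(xj, mean, var) for xj, (mean, var) in zip(x, params[c])
        )
        scores[c] = math.log(priors[c]) + log_likelihood
    return max(scores, key=scores.get)


# Hypothetical model: equal priors, equal means, different variances.
priors = {"a": 0.5, "b": 0.5}
params = {"a": [(0.0, 1.0)], "b": [(0.0, 9.0)]}

# Classify three points along the single axis.
row = [predict((x,), priors, params) for x in (-3.0, 0.0, 3.0)]
print(row)
```

The narrow class "a" wins near the shared mean and the wide class "b" wins on both sides, so the prediction changes twice along the axis. A linear boundary could only change it once; two crossings are what a quadratic in x produces.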