Multinomial Naive Bayes
Text Classification and Document Analysis
Handling Discrete Count Data
In the previous articles, we explored the foundations of Naive Bayes, its extension to continuous data with Gaussian Naive Bayes, and binary features with Bernoulli Naive Bayes. In this article, we'll examine Multinomial Naive Bayes, a variant designed for discrete count data and particularly well suited to text classification.
Why Multinomial Naive Bayes?
While Bernoulli Naive Bayes considers only feature presence/absence and Gaussian Naive Bayes handles continuous values, Multinomial Naive Bayes works with discrete count data, as the short sketch after this list illustrates. This makes it ideal for:
Text classification (word frequencies)
Document categorization
Sentiment analysis
Email spam detection
Topic modeling
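To make the contrast concrete, here is a minimal sketch (with two made-up documents) of how the same text looks as the count features Multinomial Naive Bayes consumes versus the binary presence features Bernoulli Naive Bayes would use:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["free money free prizes now", "meeting notes for the team"]

# Count features (Multinomial NB input): "free" counts twice in the first doc
print(CountVectorizer().fit_transform(docs).toarray())

# Binary features (Bernoulli NB input): repetition information is discarded
print(CountVectorizer(binary=True).fit_transform(docs).toarray())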
The Mathematical Framework
The Multinomial Distribution
The core of this variant relies on the multinomial distribution, which models the probability of observing counts across multiple categories:
P(X|class) = (n! / (x₁! ⋯ xₖ!)) · ∏ pᵢ^xᵢ

Where:
X is a feature vector of counts
xᵢ is the count for feature i
pᵢ is the probability of feature i in the given class
n is the total count (∑xᵢ)
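For intuition, here is a small worked example with invented numbers: suppose a class assigns word probabilities p = (0.5, 0.3, 0.2) to a three-word vocabulary, and we observe the count vector x = (3, 1, 1) in a five-word document:

from scipy.stats import multinomial

# 5!/(3!·1!·1!) · 0.5³ · 0.3¹ · 0.2¹ = 20 · 0.0075 = 0.15
print(multinomial.pmf([3, 1, 1], n=5, p=[0.5, 0.3, 0.2]))  # 0.15

Note that the factorial term depends only on the counts, not on the class, so it cancels when comparing classes; implementations (including the one below) simply drop it.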
Text Classification Model
For document classification:
P(class|document) ∝ P(class) · ∏ P(word|class)^count(word)

Multiplying many small probabilities underflows floating-point arithmetic, so in practice we score each class in log space: log P(class) + Σ count(word) · log P(word|class).
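As a quick sanity check of the scoring rule, here is a tiny hypothetical example (the vocabulary, priors, and word probabilities are all invented):

import numpy as np

# Two-word vocabulary ["free", "meeting"] with invented class parameters
log_p_spam = np.log([0.8, 0.2])   # P(word | spam)
log_p_ham = np.log([0.1, 0.9])    # P(word | ham)
log_prior = np.log(0.5)           # equal priors for both classes

counts = np.array([3, 0])         # document: "free free free"

score_spam = log_prior + counts @ log_p_spam
score_ham = log_prior + counts @ log_p_ham
print("spam" if score_spam > score_ham else "ham")  # -> spam

With the scoring rule in hand, let's build a sophisticated Multinomial Naive Bayes classifier: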
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.utils.validation import check_X_y, check_array
from scipy.sparse import issparse

class AdvancedMultinomialNB(BaseEstimator, ClassifierMixin):
    def __init__(self, alpha=1.0):
        """
        Initialize Multinomial Naive Bayes classifier

        Args:
            alpha (float): Smoothing parameter (Laplace smoothing)
        """
        self.alpha = alpha
        self.feature_log_prob_ = {}
        self.class_log_prior_ = {}

    def fit(self, X, y):
        """
        Fit the Multinomial Naive Bayes model

        Args:
            X: Feature matrix (document-term matrix)
            y: Target labels
        """
        X, y = check_X_y(X, y, accept_sparse='csr')
        self.classes_ = np.unique(y)
        n_samples, n_features = X.shape

        # Calculate class priors and feature probabilities
        for c in self.classes_:
            # Get samples for this class
            X_c = X[y == c]

            # Calculate class prior
            n_c = X_c.shape[0]
            self.class_log_prior_[c] = np.log((n_c + self.alpha) /
                                              (n_samples + len(self.classes_) * self.alpha))

            # Calculate feature probabilities
            if issparse(X_c):
                feature_counts = np.array(X_c.sum(axis=0))[0] + self.alpha
                total_counts = feature_counts.sum()
            else:
                feature_counts = np.sum(X_c, axis=0) + self.alpha
                total_counts = feature_counts.sum()

            self.feature_log_prob_[c] = np.log(feature_counts / total_counts)

        return self

    def predict_proba(self, X):
        """Calculate class probabilities for X"""
        X = check_array(X, accept_sparse='csr')

        # Calculate log probabilities
        log_probs = np.zeros((X.shape[0], len(self.classes_)))
        for i, c in enumerate(self.classes_):
            log_prob = self.class_log_prior_[c]
            if issparse(X):
                log_prob += X.dot(self.feature_log_prob_[c])
            else:
                log_prob += np.dot(X, self.feature_log_prob_[c])
            log_probs[:, i] = log_prob

        # Convert to probabilities
        probs = np.exp(log_probs - np.max(log_probs, axis=1)[:, np.newaxis])
        probs /= np.sum(probs, axis=1)[:, np.newaxis]
        return probs

    def predict(self, X):
        """Predict class labels for X"""
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

Real-World Application: News Classification
Let's implement a news article classifier:
def preprocess_text(text):
    """Basic text preprocessing"""
    import re
    # Convert to lowercase
    text = text.lower()
    # Remove special characters
    text = re.sub(r'[^\w\s]', '', text)
    # Remove extra whitespace
    text = ' '.join(text.split())
    return text
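A quick check of the preprocessing on an arbitrary sample string:

print(preprocess_text("Breaking: Stocks SURGE 5%!"))  # -> "breaking stocks surge 5"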
class NewsClassifier:
    def __init__(self, alpha=1.0):
        self.vectorizer = CountVectorizer(
            max_features=5000,
            stop_words='english',
            preprocessor=preprocess_text
        )
        self.classifier = AdvancedMultinomialNB(alpha=alpha)

    def fit(self, texts, labels):
        """Train the classifier"""
        # Transform texts to document-term matrix
        X = self.vectorizer.fit_transform(texts)
        # Train classifier
        self.classifier.fit(X, labels)
        return self

    def predict(self, texts):
        """Predict categories for new texts"""
        X = self.vectorizer.transform(texts)
        return self.classifier.predict(X)

    def predict_proba(self, texts):
        """Get probability estimates"""
        X = self.vectorizer.transform(texts)
        return self.classifier.predict_proba(X)

    def explain_prediction(self, text):
        """Explain why a prediction was made"""
        # Get feature names
        features = self.vectorizer.get_feature_names_out()
        # Transform text
        X = self.vectorizer.transform([text])
        # Get prediction
        pred_class = self.predict([text])[0]
        # Get feature importances
        feature_probs = np.exp(self.classifier.feature_log_prob_[pred_class])
        # Get non-zero features in the text
        if issparse(X):
            present_features = X.tocoo()
            feature_indices = present_features.col
            feature_counts = present_features.data
        else:
            feature_indices = np.where(X[0] > 0)[0]
            feature_counts = X[0][feature_indices]
        # Sort by importance
        importance = feature_probs[feature_indices] * feature_counts
        sorted_idx = np.argsort(importance)[::-1]

        print(f"Predicted class: {pred_class}")
        print("\nTop contributing words:")
        for idx in sorted_idx[:10]:
            word = features[feature_indices[idx]]
            prob = feature_probs[feature_indices[idx]]
            count = feature_counts[idx]
            print(f"{word}: probability={prob:.3f}, count={count}")
def demonstrate_news_classification():
    # Sample news articles
    articles = [
        "The stock market reached record highs today as tech companies reported strong earnings",
        "Scientists discover new species of deep-sea creatures in Pacific Ocean exploration",
        "Local team wins championship in dramatic overtime victory",
        "New cryptocurrency regulations proposed by government agencies",
        "Breakthrough in renewable energy storage announced by researchers"
    ]
    categories = [
        "business",
        "science",
        "sports",
        "business",
        "science"
    ]

    # Create and train classifier
    classifier = NewsClassifier(alpha=1.0)
    classifier.fit(articles, categories)

    # Test with new article
    new_article = """
    Tech giants announced revolutionary AI models that could transform
    the industry, causing their stocks to surge in after-hours trading.
    """

    # Get and explain prediction
    classifier.explain_prediction(new_article)

Multinomial Naive Bayes proves exceptionally well-suited for text classification tasks. By effectively handling discrete count data, it leverages word frequencies to make accurate predictions. Our implementation demonstrates its efficiency in processing sparse document-term matrices, its intuitive probability calculations, and its ability to provide explainable predictions through feature importance analysis.
The news classification example highlights the practical application of Multinomial Naive Bayes in real-world scenarios. Its ability to not only classify but also explain the reasoning behind its decisions makes it a valuable tool, especially in domains where understanding the decision-making process is critical.

