43 – Real-World Python Projects – Email Classifier

This project builds a machine-learning email classifier, similar in spirit to Gmail’s spam filtering and category sorting.

The model learns whatever labels appear in dataset.csv, so the same code can sort incoming emails by:

✔ Spam / Not Spam

✔ Work / Personal

✔ Clients / Team / Promotions

✔ Urgent / Low Priority

You can expand the project to support multi-label classification and even auto-replies.
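As a rough idea of what the multi-label extension could look like, here is a minimal sketch that lets one email carry several tags at once. It assumes scikit-learn's OneVsRestClassifier and MultiLabelBinarizer. The toy emails, the tags, and the helper name predict_tags are all made up for illustration and are not part of the project files.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each email has a list of tags instead of one label
emails = [
    "You won a free iPhone. Click here!",
    "Urgent: client contract must be signed today",
    "Team lunch on Friday, no rush to reply",
    "Limited offer, claim your free prize now",
    "Urgent: production server is down, team please respond",
    "Client asked for the invoice, low priority",
]
tags = [
    ["spam"],
    ["work", "urgent"],
    ["work"],
    ["spam"],
    ["work", "urgent"],
    ["work"],
]

# Turn the tag lists into a 0/1 matrix with one column per tag
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(tags)

# One binary classifier per tag, all sharing the same TF-IDF features
multi_label_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("model", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
multi_label_pipeline.fit(emails, Y)

def predict_tags(text):
    """Return the list of predicted tags (possibly empty) for one email."""
    row = multi_label_pipeline.predict([text])
    return list(binarizer.inverse_transform(row)[0])
```

With only a handful of training emails the predictions are not meaningful; the point is the shape of the data and the pipeline.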


🧠 What This Project Teaches You

✔ Data preprocessing
✔ NLP text cleaning
✔ Vectorization (TF-IDF or embeddings)
✔ Machine learning models (SVM, Naïve Bayes, Logistic Regression)
✔ Building an email pipeline
✔ Loading live Gmail inbox via IMAP (optional)
✔ Training and testing the classifier
✔ Saving the trained model as a .pkl file
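The text-cleaning step from the list above could be sketched roughly as follows. This is one possible approach using only the standard re module. The function name clean_email_text and the placeholder tokens urltoken and emailtoken are assumptions chosen for this example.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                 # HTML tags such as <b> or </p>
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")          # email addresses
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")        # punctuation and symbols

def clean_email_text(text):
    """Lowercase the text, replace links and addresses with tokens, drop punctuation."""
    text = text.lower()
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" urltoken ", text)
    text = EMAIL_RE.sub(" emailtoken ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Collapse runs of whitespace into single spaces
    return " ".join(text.split())

print(clean_email_text("<b>WIN</b> a FREE iPhone!!! Visit http://spam.example.com now"))
# -> win a free iphone visit urltoken now
```

One way to plug this in is TfidfVectorizer(preprocessor=clean_email_text), which replaces the vectorizer's built-in lowercasing step with this function.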


📁 Folder Structure

EmailClassifier/
│── train.py
│── classify.py
│── dataset.csv
│── model.pkl          (created by train.py; holds the TF-IDF vectorizer and the model)
│── requirements.txt

📦 requirements.txt

pandas
scikit-learn
nltk
joblib

Install:

pip install -r requirements.txt

📊 Example Dataset (dataset.csv)

text,label
"You won a free iPhone. Click here!",spam
"Meeting scheduled at 3 PM.",work
"Your Amazon order has been shipped.",promotions
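Email text often contains commas and quotes, which can break a hand-written CSV. A small sketch of generating the file content with Python's csv module, which handles the quoting for you, might look like this. The fourth row and the helper names rows_to_csv and csv_to_rows are hypothetical additions for illustration.

```python
import csv
import io

rows = [
    ("You won a free iPhone. Click here!", "spam"),
    ("Meeting scheduled at 3 PM.", "work"),
    ("Your Amazon order has been shipped.", "promotions"),
    # Hypothetical extra row: the commas here require quoting in the CSV
    ("Hi team, please review the draft, then reply.", "work"),
]

def rows_to_csv(rows):
    """Serialize (text, label) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["text", "label"])
    writer.writerows(rows)
    return buffer.getvalue()

def csv_to_rows(csv_text):
    """Parse CSV text back into (text, label) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(record["text"], record["label"]) for record in reader]

csv_text = rows_to_csv(rows)
```

To create the actual file, write csv_text to dataset.csv, for example with open("dataset.csv", "w", newline="").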

🧩 1. Training Script (train.py)

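import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
import joblib

# Load dataset (expects "text" and "label" columns) and drop incomplete rows
df = pd.read_csv("dataset.csv").dropna(subset=["text", "label"])

# Hold out 20% of the rows so the model is tested on emails it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

# Build model pipeline: TF-IDF features followed by a Naive Bayes classifier
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("model", MultinomialNB())
])

# Train model
pipeline.fit(X_train, y_train)

# Report accuracy on the held-out emails
print("Test accuracy:", pipeline.score(X_test, y_test))

# Save the whole pipeline (vectorizer + model) in one file
joblib.dump(pipeline, "model.pkl")
print("Model saved as model.pkl")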
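The training script uses Naive Bayes, but the other models mentioned earlier (SVM, Logistic Regression) can be dropped into the same pipeline. The sketch below compares the three with cross-validation. The toy texts, the labels, and the helper name compare_models are assumptions for illustration; with a real dataset you would pass df["text"] and df["label"] and use more folds.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data, four emails per class
texts = [
    "You won a free iPhone. Click here!",
    "Claim your free prize now",
    "Free gift card, click the link",
    "Winner! Click to claim your reward",
    "Meeting scheduled at 3 PM.",
    "Please review the project report",
    "Team meeting moved to Monday",
    "Project deadline is next week",
]
labels = ["spam"] * 4 + ["work"] * 4

candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
}

def compare_models(texts, labels, folds=2):
    """Return the mean cross-validated accuracy for each candidate model."""
    results = {}
    for name, model in candidates.items():
        pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("model", model)])
        scores = cross_val_score(pipeline, texts, labels, cv=folds)
        results[name] = scores.mean()
    return results

results = compare_models(texts, labels)
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}")
```

On eight emails the numbers mean very little; the same loop becomes useful once dataset.csv has a few hundred rows per label.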

🧩 2. Email Classification Script (classify.py)

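import joblib

# Load the pipeline saved by train.py (TF-IDF vectorizer + classifier)
pipeline = joblib.load("model.pkl")

def classify_email(text):
    """Return the predicted label for a single email's text."""
    # predict() expects a list of documents, so wrap the text and take the first result
    return pipeline.predict([text])[0]

if __name__ == "__main__":
    # input() reads a single line, so paste the email as one line of text
    msg = input("Paste email text: ")
    print("\nPredicted Category:", classify_email(msg))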
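For the optional live-inbox step, a rough sketch using only the standard library (imaplib and email) could look like the following. The function names extract_text and fetch_unread are made up for this example, and host, user, and password are placeholders you would supply yourself. For Gmail the host is imap.gmail.com, and you will typically need an app password rather than your normal account password.

```python
import email
import imaplib
from email import policy

def extract_text(raw_bytes):
    """Return the subject plus the plain-text body of a raw email message."""
    message = email.message_from_bytes(raw_bytes, policy=policy.default)
    subject = message.get("Subject", "")
    # Prefer the text/plain part; fall back to an empty body if there is none
    body_part = message.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return f"{subject}\n{body}".strip()

def fetch_unread(host, user, password, limit=10):
    """Fetch up to `limit` unread messages from INBOX and return their text."""
    texts = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True avoids marking the messages as read
        conn.select("INBOX", readonly=True)
        status, data = conn.search(None, "UNSEEN")
        for num in data[0].split()[:limit]:
            status, parts = conn.fetch(num, "(RFC822)")
            texts.append(extract_text(parts[0][1]))
    return texts
```

Each string returned by fetch_unread can then be passed to classify_email. Only extract_text can be tried offline; fetch_unread needs a reachable IMAP server and valid credentials.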
