This project builds a machine-learning email classifier, similar in spirit to Gmail’s automatic spam filtering and category sorting.
It sorts incoming emails into categories such as:
✔ Spam / Not Spam
✔ Work / Personal
✔ Clients / Team / Promotions
✔ Urgent / Low Priority
The scripts below assign one label per email. You can extend the project to multi-label classification (for example, an email that is both Work and Urgent) and even auto-replies.
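As a rough idea of what the multi-label extension could look like, here is a minimal sketch using scikit-learn's `MultiLabelBinarizer` and `OneVsRestClassifier`. The example emails, the tag sets, and the choice of Logistic Regression are assumptions made for illustration; they are not part of the project's dataset or scripts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy examples invented for illustration: each email can carry several tags.
texts = [
    "You won a free iPhone, click here now",
    "Urgent: client contract needs signature today",
    "Team lunch on Friday, no rush",
    "Urgent: free prize expires today, click now",
    "Client meeting notes attached",
    "Weekend photos from the family trip",
]
tags = [
    {"spam"}, {"work", "urgent"}, {"work"},
    {"spam", "urgent"}, {"work"}, {"personal"},
]

# Turn the tag sets into a 0/1 matrix with one column per tag.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(tags)

# One binary classifier is trained per tag on top of shared TF-IDF features.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("model", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
pipeline.fit(texts, y)

pred = pipeline.predict(["Urgent: client needs the report today"])
print(binarizer.inverse_transform(pred))  # tuple of predicted tags (may be empty)
```

With only six training emails the predictions are not meaningful; the point is the shape of the pipeline, which stays the same as in the single-label version.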
🧠 What This Project Teaches You
✔ Data preprocessing
✔ NLP text cleaning
✔ Vectorization (TF-IDF or embeddings)
✔ Machine learning models (SVM, Naïve Bayes, Logistic Regression)
✔ Building an email pipeline
✔ Loading a live Gmail inbox via IMAP (optional)
✔ Training and testing the classifier
✔ Saving the trained model as a .pkl file
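The text-cleaning step in the list above does not appear in the scripts that follow, so here is a minimal sketch of what it might look like. The function name `clean_text` and the specific rules (lowercasing, replacing URLs and email addresses with placeholder tokens, dropping punctuation) are assumptions for illustration, using only the standard library.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")

def clean_text(text: str) -> str:
    """Hypothetical helper: normalize raw email text before vectorization."""
    text = text.lower()
    # Replace links and addresses with tokens so the model sees "a link was here".
    text = URL_RE.sub(" urltoken ", text)
    text = EMAIL_RE.sub(" emailtoken ", text)
    # Drop punctuation and symbols, then collapse repeated whitespace.
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())

print(clean_text("You WON a free iPhone!!! Click here: http://spam.example/win"))
# -> you won a free iphone click here urltoken
```

A function like this could be applied to the `text` column before training, or passed to `TfidfVectorizer` through its `preprocessor` argument.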
📁 Folder Structure
EmailClassifier/
│── train.py
│── classify.py
│── dataset.csv
│── model.pkl
│── requirements.txt
📦 requirements.txt
pandas
scikit-learn
nltk
joblib
Install:
pip install -r requirements.txt
📊 Example Dataset (dataset.csv)
| text | label |
|---|---|
| “You won a free iPhone. Click here!” | spam |
| “Meeting scheduled at 3 PM.” | work |
| “Your Amazon order has been shipped.” | promotions |
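Before training, it can help to sanity-check the CSV. The sketch below is one possible way to do that with pandas; the helper name `load_dataset` and the inline sample data are assumptions, and the only thing taken from the project is the `text` / `label` column layout shown in the table.

```python
import io
import pandas as pd

# Stand-in for dataset.csv so the sketch runs on its own;
# in the project you would pass "dataset.csv" instead.
SAMPLE_CSV = io.StringIO(
    'text,label\n'
    '"You won a free iPhone. Click here!",spam\n'
    '"Meeting scheduled at 3 PM.",work\n'
    '"Your Amazon order has been shipped.",promotions\n'
    '"",work\n'
)

def load_dataset(source) -> pd.DataFrame:
    """Hypothetical helper: load the CSV and drop rows that cannot be used."""
    df = pd.read_csv(source)
    missing = {"text", "label"} - set(df.columns)
    if missing:
        raise ValueError(f"dataset is missing columns: {sorted(missing)}")
    # Remove rows with no text or no label, then blank and duplicate texts.
    df = df.dropna(subset=["text", "label"]).copy()
    df["text"] = df["text"].astype(str).str.strip()
    df = df[df["text"] != ""].drop_duplicates(subset="text")
    return df.reset_index(drop=True)

df = load_dataset(SAMPLE_CSV)
print(df["label"].value_counts())  # how many examples each category has
```

Checking the label counts early is useful because a category with only a handful of examples will be predicted poorly no matter which model is used.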
🧩 1. Training Script (train.py)
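import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
import joblib

# Load dataset (expects "text" and "label" columns)
df = pd.read_csv("dataset.csv")

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

# Build model pipeline: TF-IDF features feeding a Naive Bayes classifier
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("model", MultinomialNB()),
])

# Train model
pipeline.fit(X_train, y_train)

# Report accuracy on the held-out rows
print("Test accuracy:", pipeline.score(X_test, y_test))

# Save the whole pipeline (vectorizer + classifier) in one file
joblib.dump(pipeline, "model.pkl")
print("Model saved as model.pkl")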
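The training script above uses Naive Bayes only, while the list of topics also mentions SVM and Logistic Regression. One way to compare them is to swap the final pipeline step and cross-validate each candidate. The sketch below does this on a tiny in-memory dataset invented for illustration, so the scores it prints say nothing about real performance; on the project's data you would pass `df["text"]` and `df["label"]` and use more folds.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data invented for illustration; replace with the real dataset.
texts = [
    "You won a free iPhone, click here", "Claim your free prize now",
    "Free lottery winner, click to claim", "Cheap pills, limited offer",
    "Meeting scheduled at 3 PM", "Please review the project report",
    "Team standup moved to Monday", "Budget review meeting tomorrow",
]
labels = ["spam"] * 4 + ["work"] * 4

candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
}

results = {}
for name, model in candidates.items():
    # Same TF-IDF step as train.py; only the classifier changes.
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("model", model),
    ])
    # 2-fold stratified cross-validation; use more folds on a real dataset.
    scores = cross_val_score(pipeline, texts, labels, cv=2)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {results[name]:.2f}")
```

Putting the vectorizer inside the pipeline matters here: it is refit on each training fold, so vocabulary from the test fold never leaks into training.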
🧩 2. Email Classification Script (classify.py)
import joblib

# Load the trained pipeline (vectorizer + classifier)
pipeline = joblib.load("model.pkl")

def classify_email(text):
    prediction = pipeline.predict([text])[0]
    return prediction

if __name__ == "__main__":
    msg = input("Paste email text: ")
    print("\nPredicted Category:", classify_email(msg))
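For the optional Gmail step, here is a minimal sketch of fetching unread messages over IMAP with the standard library and turning each one into plain text that `classify_email` could consume. The helper names `extract_text` and `fetch_unseen` are hypothetical, and the sketch assumes IMAP access is enabled on the account and that you log in with an app password rather than your normal password. Only the parsing step is exercised at the bottom, so it runs without a network connection.

```python
import email
import imaplib
from email import policy

def extract_text(raw_bytes: bytes) -> str:
    """Hypothetical helper: return subject + plain-text body of a raw message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    subject = msg["subject"] or ""
    return f"{subject}\n{body}".strip()

def fetch_unseen(user: str, password: str,
                 host: str = "imap.gmail.com", limit: int = 10):
    """Hypothetical helper: yield the text of up to `limit` unread inbox messages."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)  # read-only: messages stay unread
        status, data = conn.search(None, "UNSEEN")
        if status != "OK":
            return
        for num in data[0].split()[:limit]:
            status, parts = conn.fetch(num, "(RFC822)")
            if status == "OK" and parts and isinstance(parts[0], tuple):
                yield extract_text(parts[0][1])

# Offline demo of the parsing step (no network needed).
sample = (
    b"From: alice@example.com\r\n"
    b"Subject: Meeting scheduled at 3 PM\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"See you in room 4.\r\n"
)
print(extract_text(sample))
```

In classify.py you could then loop over `fetch_unseen(user, password)` and pass each returned string to `classify_email`. HTML-only messages yield an empty body in this sketch, so only the subject would be classified.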
