🎯 Project Objective
Build a movie recommendation system in Python that suggests movies based on genres, descriptions, or user ratings.
You’ll learn:
- Data analysis with pandas
- Text similarity using TF-IDF and cosine similarity
- Building a content-based recommender
- Optional: Collaborative filtering using user ratings
🧩 1. What Is a Recommendation System?
A recommendation system suggests items (like movies, books, or songs) to users based on patterns in their behavior or preferences.
🧠 Types:
| Type | Description | Example |
|---|---|---|
| Content-Based | Recommends similar items to what the user liked | Similar movies to Inception |
| Collaborative Filtering | Recommends based on similar users’ ratings | “People who liked this also liked…” |
| Hybrid | Combination of both | Netflix or Spotify suggestions |
⚙️ 2. Libraries Used
We’ll include auto-installation for convenience 👇
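import importlib, subprocess, sys

def install(pkg):
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])

# Map pip package names to import names (scikit-learn is imported as sklearn)
for pkg, module in {"pandas": "pandas", "scikit-learn": "sklearn"}.items():
    try:
        importlib.import_module(module)
    except ImportError:
        install(pkg)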
🎬 3. Sample Dataset
You can use any open dataset such as TMDB, IMDb, or MovieLens, but let’s use a small example dataset for demonstration:
import pandas as pd
# Sample movie data
data = {
    "movie_id": [1, 2, 3, 4, 5],
    "title": ["Inception", "Interstellar", "The Dark Knight", "The Matrix", "Tenet"],
    "genre": ["Sci-Fi Action", "Sci-Fi Drama", "Action Crime", "Sci-Fi Action", "Sci-Fi Thriller"],
    "description": [
        "A thief who steals secrets through dreams.",
        "Explorers travel through a wormhole in space.",
        "Batman battles the Joker in Gotham.",
        "A hacker learns about the nature of reality.",
        "A secret agent manipulates time to prevent war."
    ]
}
movies = pd.DataFrame(data)
print(movies)
🧮 4. Building a Content-Based Recommender
We’ll use TF-IDF (Term Frequency – Inverse Document Frequency) and Cosine Similarity to find similar movies.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Combine genre and description into a single text column
movies["features"] = movies["genre"] + " " + movies["description"]
# Convert text data into TF-IDF matrix
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(movies["features"])
# Compute cosine similarity between movies
similarity = cosine_similarity(tfidf_matrix, tfidf_matrix)
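To make the similarity step less of a black box, here is a minimal sketch of what cosine similarity computes for a single pair of vectors. It uses plain Python so the arithmetic stays visible; the `cosine` helper and the three-term vocabulary are made up for illustration and are not part of scikit-learn.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention used here: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)

# Hypothetical term-weight vectors over the vocabulary [sci, action, crime]
movie_a = [1.0, 1.0, 0.0]
movie_b = [1.0, 0.0, 1.0]

sim_ab = cosine(movie_a, movie_b)  # the two movies share one of their two terms
sim_aa = cosine(movie_a, movie_a)  # a vector compared with itself
print(round(sim_ab, 3), round(sim_aa, 3))
```

A vector compared with itself scores 1, which is why each movie ends up as its own best match in the similarity matrix.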
🎯 5. Recommendation Function
def recommend_movie(title):
    if title not in movies["title"].values:
        print("Movie not found.")
        return
    index = movies[movies["title"] == title].index[0]
    scores = list(enumerate(similarity[index]))
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)
    print(f"\n🎬 Movies similar to '{title}':")
    for i, score in sorted_scores[1:6]:  # Skip the movie itself (top score)
        print(f"• {movies.iloc[i]['title']} ({round(score*100,2)}% match)")
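The title check in the function is an exact, case-sensitive match, so a query like "inception" or "Incepton" would report "Movie not found." One possible way to soften that is to resolve the user's text to a known title first. The sketch below uses the standard library's `difflib.get_close_matches`; the `find_title` helper and its `cutoff` default are assumptions for illustration.

```python
import difflib

def find_title(query, titles, cutoff=0.6):
    """Return the best-matching known title for a user query, or None."""
    # Try a case-insensitive exact match first
    lowered = {t.lower(): t for t in titles}
    key = query.strip().lower()
    if key in lowered:
        return lowered[key]
    # Otherwise fall back to the closest fuzzy match above the cutoff
    matches = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

titles = ["Inception", "Interstellar", "The Dark Knight", "The Matrix", "Tenet"]
print(find_title("incepton", titles))
```

The resolved title can then be passed to `recommend_movie`, with a `None` result handled as "not found".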
🧩 6. Example Output
recommend_movie("Inception")
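✅ Illustrative output (the scores actually printed for this five-movie sample will be considerably lower, since the movies share only a few genre terms):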
🎬 Movies similar to 'Inception':
• The Matrix (87.45% match)
• Tenet (81.12% match)
• Interstellar (73.90% match)
• The Dark Knight (42.55% match)
💡 7. Optional: Item-Based Collaborative Filtering
For large datasets (like MovieLens), you can recommend movies using user ratings. The example here compares movies by the ratings they received from the same users:
# Example structure
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "movie_id": [1, 2, 2, 3, 4],
    "rating": [5, 4, 5, 4, 5]
})
# Pivot to create a user-movie matrix (unrated movies become 0, a simplification)
pivot_table = ratings.pivot_table(index="user_id", columns="movie_id", values="rating").fillna(0)
# Transpose so each row is a movie, then compute similarity between movies
movie_similarity = cosine_similarity(pivot_table.T)
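The matrix above is computed but never queried. As a rough sketch of how it might be used, the snippet below rebuilds the same toy ratings, labels the similarity matrix with movie ids, and looks up the movies rated most like a given one. The `similar_by_ratings` helper and its `top_n` parameter are hypothetical names introduced for this example.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "movie_id": [1, 2, 2, 3, 4],
    "rating": [5, 4, 5, 4, 5],
})
pivot_table = ratings.pivot_table(
    index="user_id", columns="movie_id", values="rating"
).fillna(0)

# Label the item-item matrix with movie ids so rows can be looked up directly
movie_similarity = pd.DataFrame(
    cosine_similarity(pivot_table.T),
    index=pivot_table.columns,
    columns=pivot_table.columns,
)

def similar_by_ratings(movie_id, top_n=3):
    """Hypothetical helper: movie ids rated most similarly to `movie_id`."""
    scores = movie_similarity[movie_id].drop(movie_id)  # exclude the movie itself
    return scores.sort_values(ascending=False).head(top_n)

top = similar_by_ratings(2)
print(top)
```

With only five ratings the scores mean little; on a real dataset you would likely also want to ignore movies with very few ratings, since a single shared user can produce a misleadingly high similarity.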
📈 8. Improving the System
| Improvement | Description |
|---|---|
| 🔍 Use full TMDB or MovieLens dataset | Thousands of real movie records |
| 💬 Add user input interface | Let users search dynamically |
| 🧠 Include actors, directors, or keywords | Improve similarity accuracy |
| 🌐 Create Flask web app | Build an online recommender |
| 🧮 Hybrid filtering | Mix content + collaborative models |
| 📊 Visualization | Show recommended movie posters using matplotlib |
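As one possible reading of the "hybrid filtering" row, the sketch below blends a content-based score and a collaborative score with a simple weighted average. The `hybrid_scores` function, the `alpha` weight, and the scores themselves are made up for illustration; real systems often use more elaborate blending.

```python
def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Blend two {title: score} dicts; alpha weights the content-based side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    titles = set(content_scores) | set(collab_scores)
    blended = {
        t: alpha * content_scores.get(t, 0.0)
        + (1 - alpha) * collab_scores.get(t, 0.0)  # missing scores count as 0
        for t in titles
    }
    # Highest blended score first
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

# Made-up scores, purely for illustration
content = {"The Matrix": 0.30, "Tenet": 0.20, "Interstellar": 0.10}
collab = {"Interstellar": 0.90, "Tenet": 0.40}

ranked = hybrid_scores(content, collab, alpha=0.5)
for title, score in ranked:
    print(f"{title}: {score:.2f}")
```

Setting `alpha` closer to 1 leans on descriptions and genres, which helps for new movies with few ratings; setting it closer to 0 leans on rating behavior.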
📦 9. Exporting Model
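You can save the movies table and the precomputed similarity matrix and reuse them in other scripts or web apps:
import pickle
with open("movie_recommender.pkl", "wb") as f:
    pickle.dump((movies, similarity), f)
Later, load and use them (recommend_movie must also be defined in the loading script, since it reads the global movies and similarity):
with open("movie_recommender.pkl", "rb") as f:
    movies, similarity = pickle.load(f)
recommend_movie("Tenet")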
✅ Summary
| Feature | Description |
|---|---|
| 🎥 Recommends similar movies | Based on genres + descriptions |
| ⚙️ Uses Machine Learning | TF-IDF + Cosine Similarity |
| 🧠 Easy to expand | Add user data, genres, or ratings |
| 💾 Data persistence | Save the movie table and similarity matrix with pickle |
| 🖥️ Extendable | Convert to Flask web app or dashboard |
