28 – Real-World Python Projects – Movie Recommendation System

🎯 Project Objective

Build a movie recommendation system in Python that suggests movies based on genres, descriptions, or user ratings.

You’ll learn:

Data analysis with pandas
Text similarity using TF-IDF and cosine similarity
Building a content-based recommender
Optional: Collaborative filtering using user ratings

🧩 1. What Is a Recommendation System?

A recommendation system suggests items (like movies, books, or songs) to users based on patterns in their behavior or preferences.

🧠 Types:

Type	Description	Example
Content-Based	Recommends similar items to what the user liked	Similar movies to Inception
Collaborative Filtering	Recommends based on similar users’ ratings	“People who liked this also liked…”
Hybrid	Combination of both	Netflix or Spotify suggestions

⚙️ 2. Libraries Used

We’ll include auto-installation for convenience 👇

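import importlib
import subprocess
import sys

def install(pkg):
    """Install a package with pip into the current interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])

# Map each pip package name to the module name used in import statements.
# They differ for scikit-learn, which is imported as "sklearn".
packages = {"pandas": "pandas", "scikit-learn": "sklearn"}

for pip_name, module_name in packages.items():
    try:
        importlib.import_module(module_name)
    except ImportError:
        install(pip_name)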

🎬 3. Sample Dataset

You can use any open dataset like TMDB, IMDB, or MovieLens, but let’s use a small example dataset for demonstration:

import pandas as pd

# Sample movie data
data = {
    "movie_id": [1, 2, 3, 4, 5],
    "title": ["Inception", "Interstellar", "The Dark Knight", "The Matrix", "Tenet"],
    "genre": ["Sci-Fi Action", "Sci-Fi Drama", "Action Crime", "Sci-Fi Action", "Sci-Fi Thriller"],
    "description": [
        "A thief who steals secrets through dreams.",
        "Explorers travel through a wormhole in space.",
        "Batman battles the Joker in Gotham.",
        "A hacker learns about the nature of reality.",
        "A secret agent manipulates time to prevent war."
    ]
}

movies = pd.DataFrame(data)
print(movies)

🧮 4. Building a Content-Based Recommender

We’ll use TF-IDF (Term Frequency – Inverse Document Frequency) and Cosine Similarity to find similar movies.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Combine genre and description into a single text column
movies["features"] = movies["genre"] + " " + movies["description"]

# Convert text data into TF-IDF matrix
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(movies["features"])

# Compute cosine similarity between movies
similarity = cosine_similarity(tfidf_matrix, tfidf_matrix)
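
To make the two steps above less of a black box, here is a minimal, dependency-free sketch of TF-IDF weighting followed by cosine similarity. It uses the smoothed IDF formula that scikit-learn documents for its default settings, but tokenization is done by hand and the three token lists are made up for illustration, so treat it as an approximation of what `TfidfVectorizer` does rather than a replacement. The names `tfidf_vectors` and `cosine` are hypothetical helpers, not part of any library.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocab, L2-normalized TF-IDF vectors)."""
    n = len(docs)
    vocab = sorted({token for doc in docs for token in doc})
    # Document frequency: in how many documents each token appears
    df = Counter(token for doc in docs for token in set(doc))
    # Smoothed IDF: tokens that appear in fewer documents get a larger weight
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        vectors.append([w / norm for w in raw])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity; the vectors are unit length, so a dot product suffices."""
    return sum(a * b for a, b in zip(u, v))

# Made-up token lists (assumption: already lowercased, stop words removed)
docs = [
    ["sci", "fi", "action", "thief"],
    ["sci", "fi", "action", "hacker"],
    ["action", "crime", "batman"],
]
vocab, vectors = tfidf_vectors(docs)
print(round(cosine(vectors[0], vectors[1]), 3))  # share three tokens
print(round(cosine(vectors[0], vectors[2]), 3))  # share only "action"
```

The first pair scores higher than the second because it shares more tokens. A token such as "action" that appears in every document receives the lowest IDF weight, so it contributes less to the similarity than a rarer shared token would.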

🎯 5. Recommendation Function

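def recommend_movie(title, top_n=5):
    """Print the top_n movies most similar to the given title."""
    if title not in movies["title"].values:
        print("Movie not found.")
        return

    # Row position of the movie (the DataFrame uses the default 0..n-1 index)
    index = movies[movies["title"] == title].index[0]
    # Pair each other movie's position with its score, excluding the movie itself
    scores = [(i, s) for i, s in enumerate(similarity[index]) if i != index]
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)

    print(f"\n🎬 Movies similar to '{title}':")
    for i, score in sorted_scores[:top_n]:
        print(f"• {movies.iloc[i]['title']} ({score * 100:.2f}% match)")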
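
If you want to reuse the results (for example in a web app) rather than print them, one option is a variant that returns a list. The sketch below is an assumption about how that could look: `top_similar`, `demo_titles`, and `demo_similarity` are hypothetical names, and the similarity scores are hand-written for illustration, not computed from the sample data.

```python
def top_similar(title, titles, similarity, top_n=3):
    """Return up to top_n (title, score) pairs most similar to `title`."""
    if title not in titles:
        return []
    index = titles.index(title)
    # Score every other title against the query, skipping the query itself
    scored = [
        (titles[i], float(score))
        for i, score in enumerate(similarity[index])
        if i != index
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hand-written scores, for illustration only
demo_titles = ["Inception", "The Matrix", "The Dark Knight"]
demo_similarity = [
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(top_similar("Inception", demo_titles, demo_similarity, top_n=2))
```

With the objects built in this tutorial, the call would be `top_similar("Inception", movies["title"].tolist(), similarity)`.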

🧩 6. Example Output

recommend_movie("Inception")

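✅ Output (illustrative format):

🎬 Movies similar to 'Inception':
• The Matrix (87.45% match)
• Tenet (81.12% match)
• Interstellar (73.90% match)
• The Dark Knight (42.55% match)

The percentages above only illustrate the output format. On this five-movie sample the real scores are much lower, and the order can differ, because the descriptions share almost no words and the only overlap comes from genre terms such as "sci", "fi", and "action".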

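💡 7. Optional: Item-Based Collaborative Filtering

For large datasets (like MovieLens), you can recommend movies from user ratings instead of text. The example below is item-based: it compares movies by the ratings they received from the same users.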

# Example structure
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "movie_id": [1, 2, 2, 3, 4],
    "rating": [5, 4, 5, 4, 5]
})

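# Pivot to create a user-movie matrix (rows: users, columns: movies).
# Unrated movies become 0, which the similarity step treats as "no signal".
pivot_table = ratings.pivot_table(index="user_id", columns="movie_id", values="rating").fillna(0)

# Transpose so each row is a movie, then compute movie-to-movie similarity
movie_similarity = cosine_similarity(pivot_table.T)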
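
To show what the pivot and similarity steps compute, and how to turn the result into recommendations, here is a dependency-free sketch on the same toy ratings. The helper names (`by_movie`, `item_cosine`, `similar_movies`) are hypothetical. Missing ratings are treated as 0, which matches the `fillna(0)` choice above.

```python
import math

# Same toy ratings as above, as (user_id, movie_id, rating) tuples
ratings_rows = [(1, 1, 5), (1, 2, 4), (2, 2, 5), (2, 3, 4), (3, 4, 5)]

# movie_id -> {user_id: rating}; users missing from a movie's dict count as 0
by_movie = {}
for user_id, movie_id, rating in ratings_rows:
    by_movie.setdefault(movie_id, {})[user_id] = rating

def item_cosine(a, b):
    """Cosine similarity between the rating vectors of movies a and b."""
    ratings_a = by_movie.get(a, {})
    ratings_b = by_movie.get(b, {})
    shared_users = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[u] * ratings_b[u] for u in shared_users)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_movies(movie_id, top_n=2):
    """Return the top_n (movie_id, score) pairs most similar to movie_id."""
    scored = [(m, item_cosine(movie_id, m)) for m in by_movie if m != movie_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

print(similar_movies(2))
```

Movie 2 comes out closest to movie 3 (both rated by user 2) and then movie 1 (both rated by user 1). Movie 4 scores 0 against everything because its only rater rated nothing else, which shows why sparse data limits collaborative filtering.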

📈 8. Improving the System

Improvement	Description
🔍 Use full TMDB or MovieLens dataset	Thousands of real movie records
💬 Add user input interface	Let users search dynamically
🧠 Include actors, directors, or keywords	Improve similarity accuracy
🌐 Create Flask web app	Build an online recommender
🧮 Hybrid filtering	Mix content + collaborative models
📊 Visualization	Show recommended movie posters using `matplotlib`
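
As one possible starting point for the user input improvement, the sketch below matches a typed query to the closest stored title using the standard library's `difflib`, so that typos and different capitalization do not end in "Movie not found." The function name `find_title`, the `catalog` list, and the `cutoff` value are assumptions for illustration.

```python
import difflib

def find_title(query, titles, cutoff=0.6):
    """Return the stored title closest to `query`, or None if nothing is close."""
    # Compare in lowercase, but return the title with its original capitalization
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(
        query.lower().strip(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None

catalog = ["Inception", "Interstellar", "The Dark Knight", "The Matrix", "Tenet"]
print(find_title("inceptoin", catalog))   # typo in the query
print(find_title("the matrix", catalog))  # different capitalization
```

The returned title can then be passed to `recommend_movie`. A higher `cutoff` makes matching stricter.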

📦 9. Exporting Model

You can save the movies DataFrame and the precomputed similarity matrix, then reuse them in other scripts or web apps:

import pickle
with open("movie_recommender.pkl", "wb") as f:
    pickle.dump((movies, similarity), f)

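Later, load and use them. The loading script must also define recommend_movie, and you should only unpickle files you trust:

import pickle
with open("movie_recommender.pkl", "rb") as f:
    movies, similarity = pickle.load(f)
recommend_movie("Tenet")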

✅ Summary

Feature	Description
🎥 Recommends similar movies	Based on genres + descriptions
⚙️ Uses Machine Learning	TF-IDF + Cosine Similarity
🧠 Easy to expand	Add user data, genres, or ratings
💾 Data persistence	Save the DataFrame and similarity matrix with pickle
🖥️ Extendable	Convert to Flask web app or dashboard