28 – Real-World Python Projects – Movie Recommendation System

🎯 Project Objective

Build a Movie Recommendation System in Python that suggests movies to users based on genres, plot descriptions, or user ratings.

You’ll learn:

  • Data analysis with pandas
  • Text similarity using TF-IDF and cosine similarity
  • Building a content-based recommender
  • Optional: Collaborative filtering using user ratings

🧩 1. What Is a Recommendation System?

A recommendation system suggests items (like movies, books, or songs) to users based on patterns in their behavior or preferences.

🧠 Types:

| Type | Description | Example |
|---|---|---|
| Content-Based | Recommends similar items to what the user liked | Similar movies to Inception |
| Collaborative Filtering | Recommends based on similar users’ ratings | “People who liked this also liked…” |
| Hybrid | Combination of both | Netflix or Spotify suggestions |

⚙️ 2. Libraries Used

We’ll include auto-installation for convenience 👇

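import importlib
import subprocess
import sys

def install(pkg):
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])

# Map each pip package name to the module name used in import statements
# (scikit-learn is installed as "scikit-learn" but imported as "sklearn").
packages = {"pandas": "pandas", "scikit-learn": "sklearn"}

for pkg, module in packages.items():
    try:
        importlib.import_module(module)
    except ImportError:
        install(pkg)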

🎬 3. Sample Dataset

You can use any open dataset like TMDB, IMDB, or MovieLens, but let’s use a small example dataset for demonstration:

import pandas as pd

# Sample movie data
data = {
    "movie_id": [1, 2, 3, 4, 5],
    "title": ["Inception", "Interstellar", "The Dark Knight", "The Matrix", "Tenet"],
    "genre": ["Sci-Fi Action", "Sci-Fi Drama", "Action Crime", "Sci-Fi Action", "Sci-Fi Thriller"],
    "description": [
        "A thief who steals secrets through dreams.",
        "Explorers travel through a wormhole in space.",
        "Batman battles the Joker in Gotham.",
        "A hacker learns about the nature of reality.",
        "A secret agent manipulates time to prevent war."
    ]
}

movies = pd.DataFrame(data)
print(movies)
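
If you switch to a real dataset later, the loading step might look like the sketch below. It assumes a MovieLens-style movies.csv with the columns movieId, title, and genres (genres separated by "|"). The in-memory sample_csv and the name ml_movies are stand-ins invented for this example; MovieLens does not ship plot descriptions, so only the genre text is prepared here.

```python
import io

import pandas as pd

# Hypothetical stand-in for a MovieLens-style movies.csv file
# (columns: movieId, title, genres; genres separated by "|").
sample_csv = io.StringIO(
    "movieId,title,genres\n"
    "1,Inception,Action|Sci-Fi\n"
    "2,Interstellar,Drama|Sci-Fi\n"
    "3,The Dark Knight,Action|Crime\n"
)

# For a real file, pass its path to read_csv instead of the StringIO object.
ml_movies = pd.read_csv(sample_csv)

# Rename columns to match this tutorial and turn "Action|Sci-Fi"
# into "Action Sci-Fi" so the genres can be vectorized as plain text.
ml_movies = ml_movies.rename(columns={"movieId": "movie_id", "genres": "genre"})
ml_movies["genre"] = ml_movies["genre"].str.replace("|", " ", regex=False)

print(ml_movies)
```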

🧮 4. Building a Content-Based Recommender

We’ll use TF-IDF (Term Frequency – Inverse Document Frequency) and Cosine Similarity to find similar movies.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Combine genre and description into a single text column
movies["features"] = movies["genre"] + " " + movies["description"]

# Convert text data into TF-IDF matrix
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(movies["features"])

# Compute cosine similarity between movies
similarity = cosine_similarity(tfidf_matrix, tfidf_matrix)
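
To see what the similarity matrix contains, here is a small self-contained sketch on three of the sample movies. It recomputes one score by hand with the cosine formula, dot(a, b) / (|a| · |b|), and checks that it matches scikit-learn's result. The names docs, titles, sim, and sim_table are made up for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

titles = ["Inception", "The Matrix", "The Dark Knight"]
docs = [
    "Sci-Fi Action A thief who steals secrets through dreams.",
    "Sci-Fi Action A hacker learns about the nature of reality.",
    "Action Crime Batman battles the Joker in Gotham.",
]

# Each row of the TF-IDF matrix is one movie, each column one word.
matrix = TfidfVectorizer(stop_words="english").fit_transform(docs)
sim = cosine_similarity(matrix)

# Cosine similarity by hand for Inception vs. The Matrix.
dense = matrix.toarray()
a, b = dense[0], dense[1]
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# A labeled table is easier to read than the raw array.
sim_table = pd.DataFrame(sim, index=titles, columns=titles).round(3)
print(sim_table)
print("Manual score matches sklearn:", np.isclose(manual, sim[0, 1]))
```

Every movie scores 1.0 against itself (the diagonal), which is why a recommendation function has to skip the queried movie.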

🎯 5. Recommendation Function

def recommend_movie(title):
    if title not in movies["title"].values:
        print("Movie not found.")
        return

    index = movies[movies["title"] == title].index[0]
    scores = list(enumerate(similarity[index]))
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)

    print(f"\n🎬 Movies similar to '{title}':")
    for i, score in sorted_scores[1:6]:  # Skip itself
        print(f"• {movies.iloc[i]['title']} ({round(score*100,2)}% match)")
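
If you want to reuse the results (for example in a web app) rather than print them, one possible variant returns a list instead. This is only a sketch: get_recommendations, build_similarity, and the top_n parameter are names invented here, and the case-insensitive title lookup is an added convenience, not part of the function above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_similarity(frame):
    """Build a cosine-similarity matrix from genre + description text."""
    text = frame["genre"] + " " + frame["description"]
    tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(text)
    return cosine_similarity(tfidf_matrix)


def get_recommendations(title, frame, sim_matrix, top_n=5):
    """Return up to top_n (title, score) pairs, best match first."""
    matches = frame.index[frame["title"].str.lower() == title.strip().lower()]
    if len(matches) == 0:
        return []  # unknown title
    pos = frame.index.get_loc(matches[0])
    ranked = sorted(enumerate(sim_matrix[pos]), key=lambda pair: pair[1], reverse=True)
    # Drop the queried movie by position instead of assuming it sorts first.
    ranked = [(i, score) for i, score in ranked if i != pos]
    return [(frame.iloc[i]["title"], float(score)) for i, score in ranked[:top_n]]


demo = pd.DataFrame({
    "title": ["Inception", "The Matrix", "The Dark Knight"],
    "genre": ["Sci-Fi Action", "Sci-Fi Action", "Action Crime"],
    "description": [
        "A thief who steals secrets through dreams.",
        "A hacker learns about the nature of reality.",
        "Batman battles the Joker in Gotham.",
    ],
})
demo_sim = build_similarity(demo)

for name, score in get_recommendations("inception", demo, demo_sim, top_n=2):
    print(f"{name}: {score:.1%}")
```

Passing the DataFrame and the matrix as arguments also removes the dependency on global variables, which makes the function easier to call from another script.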

🧩 6. Example Output

recommend_movie("Inception")

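Output (the percentages are illustrative; on this five-movie sample the real TF-IDF scores are much lower because the movies share only a few genre words, and only four results appear because the dataset has just four other movies):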

🎬 Movies similar to 'Inception':
• The Matrix (87.45% match)
• Tenet (81.12% match)
• Interstellar (73.90% match)
• The Dark Knight (42.55% match)

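💡 7. Optional: Item-Based Collaborative Filtering

For large datasets (like MovieLens), you can recommend movies from user ratings instead of text. The example below compares movies by how users rated them (item-based filtering), so two movies count as similar when the same users give them similar ratings: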

# Example structure
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "movie_id": [1, 2, 2, 3, 4],
    "rating": [5, 4, 5, 4, 5]
})

# Pivot to create a user-movie matrix
pivot_table = ratings.pivot_table(index="user_id", columns="movie_id", values="rating").fillna(0)

# Compute similarity between movies
movie_similarity = cosine_similarity(pivot_table.T)
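
To turn such a similarity matrix into recommendations, one option is to label it with movie IDs and sort a single column. The sketch below repeats the small ratings table so it runs on its own; ratings_demo, item_sim, and similar_by_ratings are hypothetical names. Filling missing ratings with 0 is a simplification that treats "not rated" like a very low rating.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings_demo = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "movie_id": [1, 2, 2, 3, 4],
    "rating": [5, 4, 5, 4, 5],
})

# Rows are users, columns are movies; unrated movies become 0.
user_movie = ratings_demo.pivot_table(
    index="user_id", columns="movie_id", values="rating"
).fillna(0)

# Transpose so each movie is a row, then label the result with movie IDs.
item_sim = pd.DataFrame(
    cosine_similarity(user_movie.T),
    index=user_movie.columns,
    columns=user_movie.columns,
)


def similar_by_ratings(movie_id, top_n=3):
    """Return the movies whose rating patterns are closest to movie_id."""
    scores = item_sim[movie_id].drop(movie_id)  # exclude the movie itself
    return scores.sort_values(ascending=False).head(top_n)


print(similar_by_ratings(2))
```

For movie 2, movie 3 ranks first and movie 1 second (each shares one rater with it), while movie 4 scores 0 because no user rated both.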

📈 8. Improving the System

| Improvement | Description |
|---|---|
| 🔍 Use full TMDB or MovieLens dataset | Thousands of real movie records |
| 💬 Add user input interface | Let users search dynamically |
| 🧠 Include actors, directors, or keywords | Improve similarity accuracy |
| 🌐 Create Flask web app | Build an online recommender |
| 🧮 Hybrid filtering | Mix content + collaborative models |
| 📊 Visualization | Show recommended movie posters using matplotlib |
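
As a rough idea of what hybrid filtering could look like, one simple approach is a weighted average of a content-based similarity matrix and a rating-based one. Everything in this sketch is an assumption for illustration: the function hybrid_similarity, the weight alpha, and the hand-written similarity values are not taken from a real dataset.

```python
import pandas as pd

movie_ids = [1, 2, 3]

# Hypothetical, hand-written similarity values for illustration only.
content_sim = pd.DataFrame(
    [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]],
    index=movie_ids, columns=movie_ids,
)
rating_sim = pd.DataFrame(
    [[1.0, 0.6, 0.0], [0.6, 1.0, 0.8], [0.0, 0.8, 1.0]],
    index=movie_ids, columns=movie_ids,
)


def hybrid_similarity(content, collaborative, alpha=0.5):
    """Blend two similarity tables; alpha=1 is content only, alpha=0 is ratings only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Align on movie IDs; movies without rating data contribute 0.
    collaborative = collaborative.reindex(
        index=content.index, columns=content.columns
    ).fillna(0.0)
    return alpha * content + (1 - alpha) * collaborative


hybrid = hybrid_similarity(content_sim, rating_sim, alpha=0.5)
print(hybrid)
```

A higher alpha leans on genres and descriptions, which helps for new movies with few ratings; a lower alpha leans on user behavior.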

📦 9. Exporting Model

You can save the movies DataFrame and the precomputed similarity matrix, then reuse them in other scripts or web apps:

import pickle
with open("movie_recommender.pkl", "wb") as f:
    pickle.dump((movies, similarity), f)

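Later, load and use. The recommend_movie function must be defined in the loading script too, because it reads the global movies and similarity variables. Only unpickle files you created or trust.

import pickle
with open("movie_recommender.pkl", "rb") as f:
    movies, similarity = pickle.load(f)
recommend_movie("Tenet")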

Summary

| Feature | Description |
|---|---|
| 🎥 Recommends similar movies | Based on genres + descriptions |
| ⚙️ Uses Machine Learning | TF-IDF + Cosine Similarity |
| 🧠 Easy to expand | Add user data, genres, or ratings |
| 💾 Data persistence | Save trained model with pickle |
| 🖥️ Extendable | Convert to Flask web app or dashboard |
