17 – Real-World Python Projects – Image Downloader / Scraper

🎯 Project Objective

To build an automated Python image downloader that fetches and saves images from a website or from image search results, useful for data collection, content management, and building AI datasets.

Skills You'll Learn:

  • Web scraping with requests & BeautifulSoup
  • Working with URLs and file systems
  • File I/O for saving images
  • Error handling and rate limiting
  • Automation and progress tracking

🧠 Project Overview

The Image Downloader App:

  • Takes a keyword or URL from the user
  • Finds and downloads all image files (.jpg, .png, .gif, etc.)
  • Saves them in a structured local folder
  • (Optional) Displays progress and handles duplicates

Real-Life Applications:

  • Collecting product or art images
  • Creating ML/AI image datasets
  • Automating wallpaper downloads
  • Archiving online photo galleries

โš™๏ธ Technology Stack

LibraryPurpose
requestsFetch HTML and image data
BeautifulSoupParse website content
osFile and directory handling
reRegular expressions for filtering URLs
tqdmProgress bar (optional, auto-installed)

💻 Version 1 – Console-Based Image Downloader

This script automatically installs missing dependencies, scrapes a URL, and saves the images it finds.

import os
import re
import sys
import subprocess

# ✅ Auto-install missing packages (the pip name may differ from the import name)
def install(import_name, pip_name=None):
    try:
        __import__(import_name)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or import_name])

install("requests")
install("bs4", "beautifulsoup4")  # the package installs as beautifulsoup4 but imports as bs4
install("tqdm")

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm
from urllib.parse import urljoin

def download_images(url, folder="downloaded_images"):
    # Create the folder if it does not exist
    os.makedirs(folder, exist_ok=True)
    headers = {"User-Agent": "Mozilla/5.0"}  # some sites block the default requests UA
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    img_tags = soup.find_all("img")

    if not img_tags:
        print("❌ No images found.")
        return

    print(f"🖼️ Found {len(img_tags)} images. Downloading...")

    for img in tqdm(img_tags, desc="Downloading"):
        img_url = img.get("src")
        if not img_url:
            continue

        # Make absolute URL
        img_url = urljoin(url, img_url)
        img_name = os.path.basename(img_url.split("?")[0])

        # Only save image files
        if not re.search(r"\.(jpg|jpeg|png|gif)$", img_name, re.IGNORECASE):
            continue

        try:
            img_response = requests.get(img_url, timeout=10)
            img_response.raise_for_status()  # skip images that fail to download
            with open(os.path.join(folder, img_name), "wb") as f:
                f.write(img_response.content)
        except Exception as e:
            print(f"⚠️ Skipped {img_url}: {e}")

    print(f"\n✅ Download complete! Images saved in '{folder}' folder.")

# Example Usage
if __name__ == "__main__":
    target_url = input("Enter the website URL to scrape images from: ")
    download_images(target_url)

🧾 Example Output

Enter the website URL to scrape images from: https://books.toscrape.com
🖼️ Found 60 images. Downloading...
Downloading: 100%|██████████████████████| 60/60 [00:09<00:00, 6.52it/s]
✅ Download complete! Images saved in 'downloaded_images' folder.
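The extension filter in the script above skips images whose URLs lack a recognizable extension (for example, images served through query parameters). An optional refinement is to fall back on the response's Content-Type header; guess_extension below is a hypothetical helper sketched for that purpose, not part of the script above:

```python
import mimetypes

def guess_extension(content_type):
    """Map a Content-Type header to an image extension, or None if not an image."""
    ctype = content_type.split(";")[0].strip().lower()
    if not ctype.startswith("image/"):
        return None
    ext = mimetypes.guess_extension(ctype)
    # mimetypes can return variants such as ".jpe"; normalize JPEG names
    return ".jpg" if ext in (None, ".jpe", ".jpeg") else ext
```

Inside the download loop, when img_name has no matching extension you could fetch the image anyway, read response.headers.get("Content-Type", ""), and append the guessed extension before saving.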

🧩 Version 2 – Search-Based Image Downloader

This version searches by keyword using the Bing Image Search API (you can adapt it to other providers).

import os

import requests

API_KEY = "your_bing_api_key"
SEARCH_URL = "https://api.bing.microsoft.com/v7.0/images/search"

def search_images(keyword, count=10):
    headers = {"Ocp-Apim-Subscription-Key": API_KEY}
    params = {"q": keyword, "count": count}
    response = requests.get(SEARCH_URL, headers=headers, params=params)
    response.raise_for_status()
    data = response.json()

    folder = f"images_{keyword.replace(' ', '_')}"
    os.makedirs(folder, exist_ok=True)

    downloaded = 0
    for i, img in enumerate(data.get("value", [])):
        img_url = img.get("contentUrl")
        if not img_url:
            continue
        try:
            img_data = requests.get(img_url, timeout=10).content
            with open(os.path.join(folder, f"{keyword}_{i+1}.jpg"), "wb") as f:
                f.write(img_data)
            downloaded += 1
        except Exception as e:
            print(f"⚠️ Error downloading {img_url}: {e}")

    print(f"✅ Downloaded {downloaded} images for '{keyword}'")

search_images("sunsets", 15)

🧰 Optional Add-Ons

  • ✅ Duplicate filtering: compare hashes to skip identical images
  • 🕒 Delay/throttle: add time.sleep() between requests
  • 📂 Auto categorization: sort images by keyword/topic
  • 🧮 Progress bar: use tqdm for download visualization
  • 🧠 AI integration: use OpenAI or CLIP models to caption or tag images
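The first two add-ons (duplicate filtering and throttling) can be sketched in a few lines. This is a minimal example assuming you already have the raw image bytes; save_unique and seen_hashes are illustrative names, not part of the scripts above:

```python
import hashlib
import time

def save_unique(img_data, path, seen_hashes, delay=0.5):
    """Save img_data to path unless identical bytes were already saved.

    Returns True when the file is written. The sleep adds a polite
    delay so the target server is not flooded with requests.
    """
    digest = hashlib.sha256(img_data).hexdigest()
    if digest in seen_hashes:
        return False              # exact duplicate: skip it
    seen_hashes.add(digest)
    with open(path, "wb") as f:
        f.write(img_data)
    time.sleep(delay)             # throttle between downloads
    return True
```

In the Version 1 loop you would create seen_hashes = set() before the loop and call save_unique(img_data, os.path.join(folder, img_name), seen_hashes) in place of the plain file write.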

๐ŸŒ Real-Life Automation Use-Cases

  • Building datasets for AI training (e.g., dogs, cars, food images)
  • Downloading product photos from e-commerce platforms
  • Backing up gallery or blog images
  • Generating visual datasets for research

🧠 Learning Outcomes

After completing this project, you'll:

  • Master website data extraction
  • Automate repetitive download tasks
  • Safely manage and structure large image datasets
  • Learn the ethics & legality of scraping (robots.txt compliance)

โš ๏ธ Ethical Scraping Tips

  • Always check a siteโ€™s robots.txt or terms of use before scraping.
  • Use headers to mimic browsers: headers = {"User-Agent": "Mozilla/5.0"} response = requests.get(url, headers=headers)
  • Avoid sending too many requests quickly โ€” respect server limits.
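The robots.txt check can be automated with the standard library's urllib.robotparser. A minimal sketch, where allowed_to_scrape and the "MyImageBot" agent string are illustrative names:

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def allowed_to_scrape(url, user_agent="MyImageBot", robots_text=None):
    """Return True if robots.txt permits user_agent to fetch url.

    robots_text lets you pass the file's contents directly (useful for
    testing); otherwise it is fetched from the site root.
    """
    parser = RobotFileParser()
    if robots_text is not None:
        parser.parse(robots_text.splitlines())
    else:
        parts = urlparse(url)
        parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        try:
            parser.read()          # downloads and parses robots.txt
        except Exception:
            return True            # robots.txt unreachable: no rules to obey
    return parser.can_fetch(user_agent, url)
```

Calling allowed_to_scrape(target_url) before download_images(target_url) makes the Version 1 script respect a site's crawl rules automatically.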
