17 – Real-World Python Projects – Image Downloader / Scraper

🎯 Project Objective

To build an automated Python image downloader that fetches and saves images from a website or from image search results, useful for data collection, content management, and building AI datasets.

Skills You'll Learn:

  • Web scraping with requests & BeautifulSoup
  • Working with URLs and file systems
  • File I/O for saving images
  • Error handling and rate limiting
  • Automation and progress tracking

🧠 Project Overview

The Image Downloader App:

  • Takes a keyword or URL from the user
  • Finds and downloads all image files (.jpg, .png, .gif, etc.)
  • Saves them in a structured local folder
  • (Optional) Displays progress and handles duplicates

Real-Life Applications:

  • Collecting product or art images
  • Creating ML/AI image datasets
  • Automating wallpaper downloads
  • Archiving online photo galleries

โš™๏ธ Technology Stack

LibraryPurpose
requestsFetch HTML and image data
BeautifulSoupParse website content
osFile and directory handling
reRegular expressions for filtering URLs
tqdmProgress bar (optional, auto-installed)

💻 Version 1 – Console-Based Image Downloader

This script automatically installs missing dependencies, scrapes a URL, and saves the images it finds.

import os
import re
import sys
import subprocess

# ✅ Auto-install missing packages (the pip name may differ from the import name)
def install(import_name, pip_name=None):
    try:
        __import__(import_name)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or import_name])

install("requests")
install("bs4", "beautifulsoup4")  # the package installs as beautifulsoup4 but imports as bs4
install("tqdm")

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm
from urllib.parse import urljoin

def download_images(url, folder="downloaded_images"):
    # Create the folder if it does not exist
    os.makedirs(folder, exist_ok=True)
    headers = {"User-Agent": "Mozilla/5.0"}  # some sites block the default requests UA
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    img_tags = soup.find_all("img")

    if not img_tags:
        print("❌ No images found.")
        return

    print(f"🖼️ Found {len(img_tags)} images. Downloading...")

    for img in tqdm(img_tags, desc="Downloading"):
        img_url = img.get("src")
        if not img_url:
            continue

        # Make absolute URL
        img_url = urljoin(url, img_url)
        img_name = os.path.basename(img_url.split("?")[0])

        # Only save image files
        if not re.search(r"\.(jpg|jpeg|png|gif)$", img_name, re.IGNORECASE):
            continue

        try:
            img_response = requests.get(img_url, timeout=10)
            img_response.raise_for_status()  # skip images that fail to download
            with open(os.path.join(folder, img_name), "wb") as f:
                f.write(img_response.content)
        except Exception as e:
            print(f"⚠️ Skipped {img_url}: {e}")

    print(f"\n✅ Download complete! Images saved in '{folder}' folder.")

# Example Usage
if __name__ == "__main__":
    target_url = input("Enter the website URL to scrape images from: ")
    download_images(target_url)

🧾 Example Output

Enter the website URL to scrape images from: https://books.toscrape.com
🖼️ Found 60 images. Downloading...
Downloading: 100%|██████████████████████| 60/60 [00:09<00:00, 6.52it/s]
✅ Download complete! Images saved in 'downloaded_images' folder.
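The extension filter in the script above skips images whose URLs lack a recognizable extension (for example, images served through query parameters). An optional refinement is to fall back on the response's Content-Type header; guess_extension below is a hypothetical helper sketched for that purpose, not part of the script above:

```python
import mimetypes

def guess_extension(content_type):
    """Map a Content-Type header to an image extension, or None if not an image."""
    ctype = content_type.split(";")[0].strip().lower()
    if not ctype.startswith("image/"):
        return None
    ext = mimetypes.guess_extension(ctype)
    # mimetypes can return variants such as ".jpe"; normalize JPEG names
    return ".jpg" if ext in (None, ".jpe", ".jpeg") else ext
```

Inside the download loop, when img_name has no matching extension you could fetch the image anyway, read response.headers.get("Content-Type", ""), and append the guessed extension before saving.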

🧩 Version 2 – Search-Based Image Downloader

This version searches by keyword using the Bing Image Search API (you can adapt it to other providers).

import os

import requests

API_KEY = "your_bing_api_key"
SEARCH_URL = "https://api.bing.microsoft.com/v7.0/images/search"

def search_images(keyword, count=10):
    headers = {"Ocp-Apim-Subscription-Key": API_KEY}
    params = {"q": keyword, "count": count}
    response = requests.get(SEARCH_URL, headers=headers, params=params)
    response.raise_for_status()
    data = response.json()

    folder = f"images_{keyword.replace(' ', '_')}"
    os.makedirs(folder, exist_ok=True)

    downloaded = 0
    for i, img in enumerate(data.get("value", [])):
        img_url = img.get("contentUrl")
        if not img_url:
            continue
        try:
            img_data = requests.get(img_url, timeout=10).content
            with open(os.path.join(folder, f"{keyword}_{i+1}.jpg"), "wb") as f:
                f.write(img_data)
            downloaded += 1
        except Exception as e:
            print(f"⚠️ Error downloading {img_url}: {e}")

    print(f"✅ Downloaded {downloaded} images for '{keyword}'")

search_images("sunsets", 15)

🧰 Optional Add-Ons

  • ✅ Duplicate filtering: compare hashes to skip identical images
  • 🕒 Delay/throttle: add time.sleep() between requests
  • 📂 Auto categorization: sort images by keyword/topic
  • 🧮 Progress bar: use tqdm for download visualization
  • 🧠 AI integration: use OpenAI or CLIP models to caption or tag images
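The first two add-ons (duplicate filtering and throttling) can be sketched in a few lines. This is a minimal example assuming you already have the raw image bytes; save_unique and seen_hashes are illustrative names, not part of the scripts above:

```python
import hashlib
import time

def save_unique(img_data, path, seen_hashes, delay=0.5):
    """Save img_data to path unless identical bytes were already saved.

    Returns True when the file is written. The sleep adds a polite
    delay so the target server is not flooded with requests.
    """
    digest = hashlib.sha256(img_data).hexdigest()
    if digest in seen_hashes:
        return False              # exact duplicate: skip it
    seen_hashes.add(digest)
    with open(path, "wb") as f:
        f.write(img_data)
    time.sleep(delay)             # throttle between downloads
    return True
```

In the Version 1 loop you would create seen_hashes = set() before the loop and call save_unique(img_data, os.path.join(folder, img_name), seen_hashes) in place of the plain file write.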

๐ŸŒ Real-Life Automation Use-Cases

  • Building datasets for AI training (e.g., dogs, cars, food images)
  • Downloading product photos from e-commerce platforms
  • Backing up gallery or blog images
  • Generating visual datasets for research

🧠 Learning Outcomes

After completing this project, you'll:

  • Master website data extraction
  • Automate repetitive download tasks
  • Safely manage and structure large image datasets
  • Learn the ethics & legality of scraping (robots.txt compliance)

โš ๏ธ Ethical Scraping Tips

  • Always check a siteโ€™s robots.txt or terms of use before scraping.
  • Use headers to mimic browsers: headers = {"User-Agent": "Mozilla/5.0"} response = requests.get(url, headers=headers)
  • Avoid sending too many requests quickly โ€” respect server limits.
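The robots.txt check can be automated with the standard library's urllib.robotparser. A minimal sketch, where allowed_to_scrape and the "MyImageBot" agent string are illustrative names:

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def allowed_to_scrape(url, user_agent="MyImageBot", robots_text=None):
    """Return True if robots.txt permits user_agent to fetch url.

    robots_text lets you pass the file's contents directly (useful for
    testing); otherwise it is fetched from the site root.
    """
    parser = RobotFileParser()
    if robots_text is not None:
        parser.parse(robots_text.splitlines())
    else:
        parts = urlparse(url)
        parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        try:
            parser.read()          # downloads and parses robots.txt
        except Exception:
            return True            # robots.txt unreachable: no rules to obey
    return parser.can_fetch(user_agent, url)
```

Calling allowed_to_scrape(target_url) before download_images(target_url) makes the Version 1 script respect a site's crawl rules automatically.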
