{"id":150,"date":"2025-10-27T16:12:03","date_gmt":"2025-10-27T16:12:03","guid":{"rendered":"https:\/\/codetypingpro.com\/?p=150"},"modified":"2025-10-27T16:12:03","modified_gmt":"2025-10-27T16:12:03","slug":"17-real-world-python-projects-image-downloader-scraper","status":"publish","type":"post","link":"https:\/\/codetypingpro.com\/?p=150","title":{"rendered":"17 &#8211; Real-World Python Projects &#8211; Image Downloader \/ Scraper"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">\ud83c\udfaf <strong>Project Objective<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Build an <strong>automated Python image downloader<\/strong> that fetches and saves images from a web page or from keyword-based image search results. It is useful for data collection, content management, and building AI datasets.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Skills You\u2019ll Learn:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Web scraping with <code>requests<\/code> &amp; <code>BeautifulSoup<\/code><\/li>\n\n\n\n<li>Working with URLs and file systems<\/li>\n\n\n\n<li>File I\/O for saving images<\/li>\n\n\n\n<li>Error handling and rate limiting<\/li>\n\n\n\n<li>Automation and progress tracking<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83e\udde0 <strong>Project Overview<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The <strong>Image Downloader App<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Takes a <strong>keyword<\/strong> or <strong>URL<\/strong> from the user<\/li>\n\n\n\n<li>Finds and downloads all image files (<code>.jpg<\/code>, <code>.png<\/code>, <code>.gif<\/code>, etc.)<\/li>\n\n\n\n<li>Saves them in a structured local folder<\/li>\n\n\n\n<li>(Optional) Displays progress and handles duplicates<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Real-Life Applications:<\/strong><\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Collecting product or art images<\/li>\n\n\n\n<li>Creating ML\/AI image datasets<\/li>\n\n\n\n<li>Automating wallpaper downloads<\/li>\n\n\n\n<li>Archiving online photo galleries<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\u2699\ufe0f <strong>Technology Stack<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Library<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td><code>requests<\/code><\/td><td>Fetch HTML and image data<\/td><\/tr><tr><td><code>BeautifulSoup<\/code><\/td><td>Parse website content<\/td><\/tr><tr><td><code>os<\/code><\/td><td>File and directory handling<\/td><\/tr><tr><td><code>re<\/code><\/td><td>Regular expressions for filtering URLs<\/td><\/tr><tr><td><code>tqdm<\/code><\/td><td>Progress bar (optional, auto-installed)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udcbb <strong>Version 1 \u2014 Console-Based Image Downloader<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This script automatically installs missing dependencies, scrapes a URL, and saves images.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import os\nimport re\nimport sys\nimport subprocess\n\n# \u2705 Auto-install missing packages\n# The pip name can differ from the import name (beautifulsoup4 is imported as bs4)\ndef install(package, import_name=None):\n    try:\n        __import__(import_name or package)\n    except ImportError:\n        subprocess.check_call(&#91;sys.executable, \"-m\", \"pip\", \"install\", package])\n\ninstall(\"requests\")\ninstall(\"beautifulsoup4\", \"bs4\")\ninstall(\"tqdm\")\n\nimport requests\nfrom bs4 import BeautifulSoup\nfrom tqdm import tqdm\nfrom urllib.parse import urljoin\n\ndef download_images(url, folder=\"downloaded_images\"):\n    # Create the folder if it does not exist\n    os.makedirs(folder, exist_ok=True)\n    # Fetch the page; a timeout keeps the script from hanging on a slow server\n    response = requests.get(url, timeout=10)\n    response.raise_for_status()  # stop early on HTTP errors such as 404\n    soup = BeautifulSoup(response.text, 
\"html.parser\")\n    img_tags = soup.find_all(\"img\")\n\n    if not img_tags:\n        print(\"\u274c No images found.\")\n        return\n\n    print(f\"\ud83d\uddbc\ufe0f Found {len(img_tags)} images. Downloading...\")\n\n    for img in tqdm(img_tags, desc=\"Downloading\"):\n        img_url = img.get(\"src\")\n        if not img_url:\n            continue\n\n        # Make absolute URL\n        img_url = urljoin(url, img_url)\n        img_name = os.path.basename(img_url.split(\"?\")&#91;0])\n\n        # Only save image files\n        if not re.search(r\"\\.(jpg|jpeg|png|gif)$\", img_name, re.IGNORECASE):\n            continue\n\n        try:\n            img_data = requests.get(img_url, timeout=10).content\n            with open(os.path.join(folder, img_name), \"wb\") as f:\n                f.write(img_data)\n        except Exception as e:\n            print(f\"\u26a0\ufe0f Skipped {img_url}: {e}\")\n\n    print(f\"\\n\u2705 Download complete! Images saved in '{folder}' folder.\")\n\n# Example Usage\nif __name__ == \"__main__\":\n    target_url = input(\"Enter the website URL to scrape images from: \")\n    download_images(target_url)\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83e\uddfe <strong>Example Output<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>Enter the website URL to scrape images from: https:\/\/books.toscrape.com\n\ud83d\uddbc\ufe0f Found 60 images. Downloading...\nDownloading: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 60\/60 &#91;00:09&lt;00:00, 6.52it\/s]\n\u2705 Download complete! 
Images saved in 'downloaded_images' folder.\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83e\udde9 <strong>Version 2 \u2014 Search-Based Image Downloader<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This version searches by keyword using the Bing Image Search API; the same approach can be adapted to Google. Replace <code>your_bing_api_key<\/code> with your own key before running it.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import os\n\nimport requests\n\nAPI_KEY = \"your_bing_api_key\"\nSEARCH_URL = \"https:\/\/api.bing.microsoft.com\/v7.0\/images\/search\"\n\ndef search_images(keyword, count=10):\n    headers = {\"Ocp-Apim-Subscription-Key\": API_KEY}\n    params = {\"q\": keyword, \"count\": count}\n    response = requests.get(SEARCH_URL, headers=headers, params=params, timeout=10)\n    response.raise_for_status()  # fail clearly on a bad key or quota error\n    data = response.json()\n\n    folder = f\"images_{keyword.replace(' ', '_')}\"\n    os.makedirs(folder, exist_ok=True)\n\n    downloaded = 0\n    for i, img in enumerate(data.get(\"value\", &#91;])):\n        img_url = img.get(\"contentUrl\")\n        if not img_url:\n            continue\n        try:\n            img_data = requests.get(img_url, timeout=10).content\n            # Files are saved as .jpg regardless of the actual image format\n            with open(os.path.join(folder, f\"{keyword}_{i+1}.jpg\"), \"wb\") as f:\n                f.write(img_data)\n            downloaded += 1\n        except Exception as e:\n            print(f\"\u26a0\ufe0f Error downloading {img_url}: {e}\")\n\n    # Report how many images were actually saved, not how many were requested\n    print(f\"\u2705 Downloaded {downloaded} images for '{keyword}'\")\n\nsearch_images(\"sunsets\", 15)\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83e\uddf0 <strong>Optional Add-Ons<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Feature<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td>\u2705 <strong>Duplicate Filtering<\/strong><\/td><td>Compare hashes to skip identical images<\/td><\/tr><tr><td>\ud83d\udd52 <strong>Delay\/Throttle<\/strong><\/td><td>Add <code>time.sleep()<\/code> between requests<\/td><\/tr><tr><td>\ud83d\udcc2 <strong>Auto 
Categorization<\/strong><\/td><td>Sort images by keyword\/topic<\/td><\/tr><tr><td>\ud83e\uddee <strong>Progress Bar<\/strong><\/td><td>Use <code>tqdm<\/code> for download visualization<\/td><\/tr><tr><td>\ud83e\udde0 <strong>AI Integration<\/strong><\/td><td>Use <code>OpenAI<\/code> or <code>CLIP<\/code> models to caption or tag images<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83c\udf10 <strong>Real-Life Automation Use-Cases<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Building datasets for AI training (e.g., dogs, cars, food images)<\/li>\n\n\n\n<li>Downloading product photos from e-commerce platforms<\/li>\n\n\n\n<li>Backing up gallery or blog images<\/li>\n\n\n\n<li>Generating visual datasets for research<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83e\udde0 <strong>Learning Outcomes<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">After completing this project, you\u2019ll:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Master website data extraction<\/li>\n\n\n\n<li>Automate repetitive download tasks<\/li>\n\n\n\n<li>Safely manage and structure large image datasets<\/li>\n\n\n\n<li>Learn the ethics &amp; legality of scraping (robots.txt compliance)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\u26a0\ufe0f <strong>Ethical Scraping Tips<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Always check a site\u2019s <strong>robots.txt<\/strong> or <strong>terms of use<\/strong> before scraping.<\/li>\n\n\n\n<li>Use <strong>headers<\/strong> to mimic browsers: first define <code>headers = {\"User-Agent\": \"Mozilla\/5.0\"}<\/code>, then call <code>response = requests.get(url, headers=headers)<\/code><\/li>\n\n\n\n<li>Avoid sending too many requests quickly \u2014 respect server 
limits.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>\ud83c\udfaf Project Objective To build an automated Python Image Downloader that fetches and saves images from a website or Google Image search results \u2014 useful for data collection, content management, and AI datasets. Skills You\u2019ll Learn: \ud83e\udde0 Project Overview The Image Downloader App: Real-Life Applications: \u2699\ufe0f Technology Stack Library Purpose requests Fetch HTML and image [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-150","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/150","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=150"}],"version-history":[{"count":1,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/150\/revisions"}],"predecessor-version":[{"id":151,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/150\/revisions\/151"}],"wp:attachment":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=150"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=150"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=150"}],"curies":[
{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}