{"id":118,"date":"2025-10-25T07:31:44","date_gmt":"2025-10-25T07:31:44","guid":{"rendered":"https:\/\/codetypingpro.com\/?p=118"},"modified":"2025-10-25T07:31:44","modified_gmt":"2025-10-25T07:31:44","slug":"04-real-world-python-projects-web-scraper","status":"publish","type":"post","link":"https:\/\/codetypingpro.com\/?p=118","title":{"rendered":"04 &#8211; Real-World Python Projects &#8211; Web Scraper"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">\ud83c\udfaf <strong>Project Objective<\/strong><\/h3>\n\n\n\n<p>To build a <strong>Web Scraper application<\/strong> that can automatically extract data from websites for analysis, monitoring, or reporting.<\/p>\n\n\n\n<p><strong>Skills Demonstrated:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sending HTTP requests<\/li>\n\n\n\n<li>Parsing HTML and XML with BeautifulSoup<\/li>\n\n\n\n<li>Handling dynamic content with Selenium<\/li>\n\n\n\n<li>Storing scraped data in CSV or Excel<\/li>\n\n\n\n<li>Automating repetitive data collection tasks<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Project: Web Scraper App<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Project Description<\/strong><\/h3>\n\n\n\n<p>The Web Scraper app allows users to <strong>collect information<\/strong> from websites, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Product prices from e-commerce sites<\/li>\n\n\n\n<li>News headlines or articles<\/li>\n\n\n\n<li>Job postings<\/li>\n\n\n\n<li>Stock prices or cryptocurrency rates<\/li>\n<\/ul>\n\n\n\n<p><strong>Real-Life Example:<\/strong> Scrape books from <a href=\"https:\/\/books.toscrape.com\/\">Books to Scrape<\/a> including <strong>title, price, and availability<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Python Example Code \u2013 Basic Scraper<\/strong><\/h3>\n\n\n\n<pre 
class=\"wp-block-code\"><code>import requests\nfrom bs4 import BeautifulSoup\nimport pandas as pd\n\n# URL to scrape\nurl = \"https:\/\/books.toscrape.com\/\"\nresponse = requests.get(url, timeout=10)\nresponse.raise_for_status()  # Stop early on HTTP errors (4xx or 5xx)\n\n# Parse the raw bytes so BeautifulSoup uses the UTF-8 encoding declared in the page;\n# response.text can garble the pound sign in prices\nsoup = BeautifulSoup(response.content, \"html.parser\")\n\n# Extract book titles, prices, and availability\nbooks = soup.find_all(\"h3\")\nprices = soup.find_all(\"p\", class_=\"price_color\")\navailability = soup.find_all(\"p\", class_=\"instock availability\")\n\ndata = &#91;]\nfor book, price, avail in zip(books, prices, availability):\n    data.append({\n        \"Title\": book.a&#91;\"title\"],\n        \"Price\": price.text,\n        \"Availability\": avail.text.strip()\n    })\n\n# Save data to CSV\ndf = pd.DataFrame(data)\ndf.to_csv(\"books.csv\", index=False)\nprint(\"Scraping completed. Data saved to books.csv\")\n<\/code><\/pre>\n\n\n\n<p>\u2705 <strong>Output:<\/strong> A CSV file (books.csv) with each book title, price, and availability.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Advanced Scraping \u2013 Pagination<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import requests\nfrom bs4 import BeautifulSoup\nimport pandas as pd\n\nbase_url = \"https:\/\/books.toscrape.com\/catalogue\/page-{}.html\"\nall_books = &#91;]\n\nfor page in range(1, 6):  # First 5 pages\n    url = base_url.format(page)\n    response = requests.get(url, timeout=10)\n    response.raise_for_status()  # Stop early on HTTP errors\n    # Parse raw bytes so the pound sign in prices is decoded correctly\n    soup = BeautifulSoup(response.content, \"html.parser\")\n\n    books = soup.find_all(\"h3\")\n    prices = soup.find_all(\"p\", class_=\"price_color\")\n\n    for book, price in zip(books, prices):\n        all_books.append({\"Title\": book.a&#91;\"title\"], \"Price\": price.text})\n\ndf = pd.DataFrame(all_books)\ndf.to_csv(\"books_paginated.csv\", index=False)\nprint(\"Paginated scraping completed.\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Scraping Dynamic Websites \u2013 Selenium Example<\/strong><\/h3>\n\n\n\n<pre 
class=\"wp-block-code\"><code>import pandas as pd\nfrom selenium import webdriver\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.support import expected_conditions as EC\nfrom selenium.webdriver.support.ui import WebDriverWait\n\n# Requires Chrome; Selenium 4.6+ can download a matching ChromeDriver automatically\ndriver = webdriver.Chrome()\ndata = &#91;]\ntry:\n    driver.get(\"https:\/\/quotes.toscrape.com\/js\/\")\n\n    # Wait up to 10 seconds for JavaScript to render the quotes\n    quotes = WebDriverWait(driver, 10).until(\n        EC.presence_of_all_elements_located((By.CLASS_NAME, \"quote\"))\n    )\n    for quote in quotes:\n        text = quote.find_element(By.CLASS_NAME, \"text\").text\n        author = quote.find_element(By.CLASS_NAME, \"author\").text\n        data.append({\"Quote\": text, \"Author\": author})\nfinally:\n    driver.quit()  # Always close the browser, even if scraping fails\n\ndf = pd.DataFrame(data)\ndf.to_csv(\"quotes_dynamic.csv\", index=False)\nprint(\"Dynamic scraping completed.\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>\u2705 Key Features<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Extract data from <strong>static and dynamic websites<\/strong><\/li>\n\n\n\n<li>Handle <strong>pagination<\/strong><\/li>\n\n\n\n<li>Store data in <strong>CSV or Excel<\/strong><\/li>\n\n\n\n<li>Automate repetitive scraping tasks<\/li>\n\n\n\n<li>Optional: Integrate with APIs for JSON scraping<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>\ud83c\udfaf Project Objective To build a Web Scraper application that can automatically extract data from websites for analysis, monitoring, or reporting. Skills Demonstrated: Project: Web Scraper App Project Description The Web Scraper app allows users to collect information from websites, such as: Real-Life Example: Scrape books from Books to Scrape including title, price, and availability. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-118","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/118","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=118"}],"version-history":[{"count":1,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/118\/revisions"}],"predecessor-version":[{"id":119,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/118\/revisions\/119"}],"wp:attachment":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=118"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=118"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=118"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}