🎯 Project Objective
To build a Web Scraper application that can automatically extract data from websites for analysis, monitoring, or reporting.
Skills Demonstrated:
- Sending HTTP requests
- Parsing HTML and XML with BeautifulSoup
- Handling dynamic content with Selenium
- Storing scraped data in CSV or Excel
- Automating repetitive data collection tasks
Project: Web Scraper App
Project Description
The Web Scraper app allows users to collect information from websites, such as:
- Product prices from e-commerce sites
- News headlines or articles
- Job postings
- Stock prices or cryptocurrency rates
Real-Life Example: Scrape books from Books to Scrape including title, price, and availability.
Python Example Code – Basic Scraper
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# URL to scrape
url = "https://books.toscrape.com/"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors

# Parse HTML
soup = BeautifulSoup(response.text, "html.parser")

# Extract book titles, prices, and availability
books = soup.find_all("h3")
prices = soup.find_all("p", class_="price_color")
availability = soup.find_all("p", class_="instock availability")

data = []
for book, price, avail in zip(books, prices, availability):
    data.append({
        "Title": book.a["title"],
        "Price": price.text,
        "Availability": avail.text.strip(),
    })

# Save data to CSV
df = pd.DataFrame(data)
df.to_csv("books.csv", index=False)
print("Scraping completed. Data saved to books.csv")
```
✅ Outputs: CSV file with book title, price, and availability.
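The selectors above can be sanity-checked offline against a small HTML fragment that mirrors the site's product markup (the fragment below is made up for illustration, not fetched from the site):

```python
from bs4 import BeautifulSoup

# A made-up fragment imitating the Books to Scrape product markup
html = """
<article class="product_pod">
  <h3><a title="A Light in the Attic" href="#">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="instock availability"> In stock </p>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h3").a["title"]                             # full title lives in the link's title attribute
price = soup.find("p", class_="price_color").text
avail = soup.find("p", class_="instock availability").text.strip()
print(title, price, avail)
```

Testing selectors this way keeps feedback fast and avoids hitting the live site while you iterate.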
Advanced Scraping – Pagination
```python
base_url = "https://books.toscrape.com/catalogue/page-{}.html"

all_books = []
for page in range(1, 6):  # first 5 pages
    url = base_url.format(page)
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    books = soup.find_all("h3")
    prices = soup.find_all("p", class_="price_color")
    for book, price in zip(books, prices):
        all_books.append({"Title": book.a["title"], "Price": price.text})

df = pd.DataFrame(all_books)
df.to_csv("books_paginated.csv", index=False)
print("Paginated scraping completed.")
```
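When looping over many pages, it is polite (and safer) to identify your client and pause between requests. A minimal sketch using the same Books to Scrape URLs; the User-Agent string and delay value here are arbitrary choices, not requirements of the site:

```python
import time
import requests

# Reuse one session so connections and headers are shared across requests
session = requests.Session()
session.headers.update({"User-Agent": "book-scraper-demo/0.1"})  # identify your client

base_url = "https://books.toscrape.com/catalogue/page-{}.html"
urls = [base_url.format(page) for page in range(1, 6)]  # first 5 pages

def fetch_politely(url, delay=1.0):
    """Fetch one page, then pause so we do not hammer the server."""
    response = session.get(url, timeout=10)
    response.raise_for_status()
    time.sleep(delay)
    return response.text

print(urls[0])
```

Before scraping any real site, also check its robots.txt and terms of service.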
Scraping Dynamic Websites – Selenium Example
```python
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # requires a matching ChromeDriver on your PATH
driver.implicitly_wait(10)   # give the JavaScript time to render the quotes
driver.get("https://quotes.toscrape.com/js/")

quotes = driver.find_elements(By.CLASS_NAME, "quote")

data = []
for quote in quotes:
    text = quote.find_element(By.CLASS_NAME, "text").text
    author = quote.find_element(By.CLASS_NAME, "author").text
    data.append({"Quote": text, "Author": author})

driver.quit()

df = pd.DataFrame(data)
df.to_csv("quotes_dynamic.csv", index=False)
print("Dynamic scraping completed.")
```
✅ Key Features
- Extract data from static and dynamic websites
- Handle pagination
- Store data in CSV or Excel
- Automate repetitive scraping tasks
- Optional: Integrate with APIs for JSON scraping
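For the optional API route, many sites expose JSON endpoints, which skip HTML parsing entirely. A minimal sketch using the standard json module; the payload and its field names are made up for illustration, and in practice the same structure would come from `requests.get(url, timeout=10).json()`:

```python
import json

# Sample JSON payload, as an API might return it (made-up data for illustration)
payload = '{"results": [{"title": "Book A", "price": 12.5}, {"title": "Book B", "price": 8.0}]}'

records = json.loads(payload)["results"]   # parse the JSON string into Python objects
titles = [r["title"] for r in records]     # pick out one field per record
print(titles)
```

Because JSON maps directly onto Python dicts and lists, there are no selectors to maintain, which makes API scraping far more robust than HTML scraping when an endpoint is available.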
