23 – Real-World Python Projects – PDF Toolkit

🎯 Project Objective

Build a multi-function PDF toolkit in Python that can merge, split, and rotate PDF files, extract their text, and add watermarks, similar to tools like SmallPDF or iLovePDF.

🧩 1. Overview

PDF is one of the most widely used document formats.
A PDF toolkit automates tasks like combining reports, splitting pages, extracting text, and adding watermarks, saving time and effort.

💼 Real-World Uses

Combine invoices or reports into one file
Extract text for analysis
Add company logos or “Confidential” watermarks
Rotate or reorder scanned pages

⚙️ 2. Required Modules

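We’ll use PyPDF2 for handling PDFs and install it automatically if it is missing.

# Auto-install required modules
try:
    from PyPDF2 import PdfReader, PdfWriter
except ModuleNotFoundError:
    import subprocess
    import sys
    # Call pip through the running interpreter so the package is installed
    # into the same environment this script is using
    subprocess.check_call([sys.executable, "-m", "pip", "install", "PyPDF2"])
    from PyPDF2 import PdfReader, PdfWriter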
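The try/except import above doubles as an availability check. As a minimal sketch, the same check can be made without importing the package, using the standard library's importlib.util.find_spec. The helper name is_installed is an assumption made for illustration and is not part of PyPDF2.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    # find_spec returns None when no module of that name is available
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("PyPDF2 available:", is_installed("PyPDF2"))
```

This can be handy when you would rather tell the user what to install than run pip on their behalf.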

🧠 3. Merge Multiple PDFs

def merge_pdfs(pdf_list, output_file):
    writer = PdfWriter()
    for pdf in pdf_list:
        reader = PdfReader(pdf)
        for page in reader.pages:
            writer.add_page(page)
    with open(output_file, "wb") as f:
        writer.write(f)
    print(f"✅ Merged {len(pdf_list)} PDFs into '{output_file}'")

🧪 Example:

merge_pdfs(["report1.pdf", "report2.pdf", "report3.pdf"], "merged_report.pdf")
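merge_pdfs opens each file in turn, so a single wrong filename stops the merge partway through the list. One possible safeguard, sketched here with the standard library only, is to check the list first. The helper name find_missing is hypothetical.

```python
from pathlib import Path


def find_missing(pdf_list):
    """Return the entries of pdf_list that are not existing files."""
    return [name for name in pdf_list if not Path(name).is_file()]


if __name__ == "__main__":
    missing = find_missing(["report1.pdf", "report2.pdf", "report3.pdf"])
    if missing:
        print("Cannot merge, missing files:", ", ".join(missing))
```

If the returned list is empty, it is safe to pass the same filenames on to merge_pdfs.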

✂️ 4. Split PDF into Individual Pages

def split_pdf(input_file):
    reader = PdfReader(input_file)
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        output_filename = f"page_{i+1}.pdf"
        with open(output_filename, "wb") as f:
            writer.write(f)
    print(f"✅ Split '{input_file}' into {len(reader.pages)} pages.")

🧪 Example:

split_pdf("merged_report.pdf")
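split_pdf always writes page_1.pdf, page_2.pdf, and so on into the current folder, so splitting a second document overwrites the pages of the first, and page_10.pdf sorts before page_2.pdf in a file listing. A small sketch of an alternative naming scheme follows; page_filenames is a hypothetical helper, and the naming pattern is just one possible choice.

```python
from pathlib import Path


def page_filenames(input_file, page_count):
    """Build one zero-padded output name per page, based on the input name."""
    stem = Path(input_file).stem
    width = len(str(page_count))  # e.g. 12 pages -> 2 digits
    return [f"{stem}_page_{n:0{width}d}.pdf" for n in range(1, page_count + 1)]


if __name__ == "__main__":
    for name in page_filenames("merged_report.pdf", 3):
        print(name)
```

Inside split_pdf, the name for page i could then be taken from this list instead of the fixed f"page_{i+1}.pdf" pattern.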

🔁 5. Rotate Pages

def rotate_pdf(input_file, output_file, rotation_angle=90):
    reader = PdfReader(input_file)
    writer = PdfWriter()

    for page in reader.pages:
        page.rotate(rotation_angle)
        writer.add_page(page)

    with open(output_file, "wb") as f:
        writer.write(f)
    print(f"✅ Rotated all pages in '{input_file}' by {rotation_angle}°")

🧪 Example:

rotate_pdf("page_1.pdf", "rotated_page.pdf", 180)
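Page rotation in a PDF is stored in steps of 90 degrees, and page.rotate expects a multiple of 90. A minimal sketch of validating the angle before calling rotate_pdf is shown here; normalize_rotation is a hypothetical helper, not a PyPDF2 function.

```python
def normalize_rotation(angle):
    """Map an angle to 0, 90, 180 or 270; reject angles that are not multiples of 90."""
    if angle % 90 != 0:
        raise ValueError(f"Rotation must be a multiple of 90, got {angle}")
    # Python's % always returns a non-negative result for a positive modulus,
    # so a counter-clockwise -90 becomes a clockwise 270
    return angle % 360


if __name__ == "__main__":
    print(normalize_rotation(450))
    print(normalize_rotation(-90))
```

Passing the normalized value to rotate_pdf gives a clear error message for input such as 45 before any file is opened.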

💧 6. Add Watermark to Each Page

def add_watermark(input_file, watermark_file, output_file):
    reader = PdfReader(input_file)
    writer = PdfWriter()
    watermark = PdfReader(watermark_file).pages[0]

    for page in reader.pages:
        page.merge_page(watermark)
        writer.add_page(page)

    with open(output_file, "wb") as f:
        writer.write(f)
    print(f"✅ Added watermark to '{input_file}' and saved as '{output_file}'")

🧪 Example:

add_watermark("merged_report.pdf", "watermark.pdf", "watermarked_output.pdf")
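add_watermark stamps every page. If only some pages should carry the watermark, one option is to let the user type a range such as "1-3,5" and convert it into page indices. The parser below is a sketch under that assumption; parse_page_ranges and its input format are made up for illustration.

```python
def parse_page_ranges(spec, total_pages):
    """Turn a string like "1-3,5" into sorted zero-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"Invalid page range: {part!r}")
        # Users count pages from 1, reader.pages is indexed from 0
        indices.update(range(start - 1, end))
    return sorted(indices)


if __name__ == "__main__":
    print(parse_page_ranges("1-3,5", total_pages=6))  # [0, 1, 2, 4]
```

In the watermark loop, enumerate(reader.pages) would give each page's index, and page.merge_page(watermark) would be called only when that index is in the returned list.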

📜 7. Extract Text from PDF

def extract_text(input_file):
    reader = PdfReader(input_file)
    all_text = ""
    for page in reader.pages:
        # extract_text() may return nothing for image-only (scanned) pages
        all_text += (page.extract_text() or "") + "\n"
    with open("extracted_text.txt", "w", encoding="utf-8") as f:
        f.write(all_text)
    print("✅ Text extracted and saved as 'extracted_text.txt'")

🧪 Example:

extract_text("report.pdf")
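Once the text is saved, it can be fed into a simple analysis. As an illustration only, this sketch counts the most frequent words in a string using the standard library; top_words is a hypothetical helper, and real documents usually need more careful tokenizing.

```python
import re
from collections import Counter


def top_words(text, n=5):
    """Return the n most common words in text, ignoring case."""
    # \w+ matches runs of letters, digits and underscores
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "The report covers the quarterly report for the board"
    print(top_words(sample, 2))  # [('the', 3), ('report', 2)]
```

To analyze a real file, read extracted_text.txt with encoding="utf-8" and pass its contents to the function.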

🧰 8. Interactive Menu System

def main():
    # Keep showing the menu until the user chooses Exit
    while True:
        print("\n=== PDF Toolkit ===")
        print("1. Merge PDFs")
        print("2. Split PDF")
        print("3. Rotate PDF")
        print("4. Add Watermark")
        print("5. Extract Text")
        print("6. Exit")

        choice = input("Enter your choice: ").strip()

        if choice == "1":
            files = input("Enter PDF filenames (comma separated): ").split(",")
            output = input("Output file name: ")
            merge_pdfs([f.strip() for f in files], output)
        elif choice == "2":
            file = input("Enter PDF filename to split: ")
            split_pdf(file)
        elif choice == "3":
            file = input("Enter PDF filename: ")
            try:
                angle = int(input("Rotation angle (90/180/270): "))
            except ValueError:
                print("Rotation angle must be a whole number.")
                continue
            output = input("Output file name: ")
            rotate_pdf(file, output, angle)
        elif choice == "4":
            file = input("Enter PDF filename: ")
            watermark = input("Enter watermark PDF filename: ")
            output = input("Output file name: ")
            add_watermark(file, watermark, output)
        elif choice == "5":
            file = input("Enter PDF filename: ")
            extract_text(file)
        elif choice == "6":
            print("Goodbye 👋")
            break
        else:
            print("Invalid choice.")

if __name__ == "__main__":
    main()
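Option 1 splits the typed filenames on commas and strips spaces, but a trailing or doubled comma still produces an empty filename, which then fails when the file is opened. A minimal sketch of a stricter parser follows; parse_filenames is a hypothetical helper name.

```python
def parse_filenames(raw):
    """Split a comma-separated string into clean, non-empty filenames."""
    return [name.strip() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    print(parse_filenames("report1.pdf, report2.pdf,, "))
    # ['report1.pdf', 'report2.pdf']
```

The menu could pass the raw input string through this function before calling merge_pdfs.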

💡 9. Enhancement Ideas

Feature	Description
🖼 GUI Toolkit	Add a Tkinter-based file selector
🔐 PDF Security	Add password protection or encryption
📑 Metadata	Display or edit PDF metadata (title, author)
📁 Batch Mode	Process entire folders automatically
☁️ Cloud Upload	Save output directly to Google Drive or Dropbox
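As a starting point for the Batch Mode idea, the sketch below gathers the PDF files in a folder so they can be handed to the functions from the earlier sections. The helper name collect_pdfs and the folder name in the usage note are assumptions for illustration.

```python
from pathlib import Path


def collect_pdfs(folder):
    """Return the PDF files directly inside folder, sorted by name."""
    return sorted(
        path for path in Path(folder).iterdir()
        # Accept .pdf and .PDF alike, skip sub-folders
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    for pdf in collect_pdfs("."):
        print(pdf.name)
```

For example, merge_pdfs([str(p) for p in collect_pdfs("invoices")], "all_invoices.pdf") would combine every PDF in a folder named invoices, assuming such a folder exists.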

✅ Summary

Feature	Function
📎 Merge	Combine multiple PDFs
✂️ Split	Separate pages into files
🔁 Rotate	Rotate PDF pages
💧 Watermark	Add watermark/logo
📜 Extract	Extract text from PDFs
🧰 Extend	GUI or encryption features possible
