23 – Real-World Python Projects – PDF Toolkit

🎯 Project Objective

To build a multi-functional PDF Toolkit in Python that can merge, split, and rotate PDF files, extract their text, and add watermarks to them, similar to tools like SmallPDF or iLovePDF.


🧩 1. Overview

PDF is one of the most widely used document formats in the world.
A PDF Toolkit automates tasks like combining reports, splitting pages, extracting text, and adding watermarks, saving time and effort.

💼 Real-World Uses

  • Combine invoices or reports into one file
  • Extract text for analysis
  • Add company logos or “Confidential” watermarks
  • Rotate or reorder scanned pages

βš™οΈ 2. Required Modules

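We’ll use the PdfReader and PdfWriter classes from PyPDF2 to read and write PDFs, and install the package automatically if it is missing. (PyPDF2 3.0.x is the final PyPDF2 release; development continues in the pypdf package, which provides the same PdfReader and PdfWriter classes.)

# Auto-install PyPDF2 if it is not already available
import subprocess
import sys

try:
    from PyPDF2 import PdfReader, PdfWriter
except ModuleNotFoundError:
    # Run pip through the current interpreter so the package lands in this environment
    subprocess.check_call([sys.executable, "-m", "pip", "install", "PyPDF2"])
    from PyPDF2 import PdfReader, PdfWriter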

🧠 3. Merge Multiple PDFs

def merge_pdfs(pdf_list, output_file):
    """Append every page of each PDF in pdf_list, in list order, to output_file."""
    writer = PdfWriter()
    for pdf in pdf_list:
        reader = PdfReader(pdf)
        for page in reader.pages:
            writer.add_page(page)
    with open(output_file, "wb") as f:
        writer.write(f)
    print(f"✅ Merged {len(pdf_list)} PDFs into '{output_file}'")

🧪 Example:

merge_pdfs(["report1.pdf", "report2.pdf", "report3.pdf"], "merged_report.pdf")
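If the file list comes from a folder (for example via `glob`), plain alphabetical sorting puts `report10.pdf` before `report2.pdf`, so pages would be merged out of order. A minimal sketch of a "natural" sort key; `natural_key` is a hypothetical helper, not part of PyPDF2:

```python
import re


def natural_key(filename):
    """Sort key that orders 'report2.pdf' before 'report10.pdf'."""
    # re.split with a capture group alternates text and digit chunks: ['report', '10', '.pdf']
    parts = re.split(r"(\d+)", filename)
    # Digit chunks compare as integers, text chunks case-insensitively
    return [int(part) if part.isdigit() else part.lower() for part in parts]
```

For example, `sorted(glob.glob("reports/*.pdf"), key=natural_key)` could be passed straight to `merge_pdfs` (the `reports/` folder is only an illustration).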

βœ‚οΈ 4. Split PDF into Individual Pages

def split_pdf(input_file):
    """Save each page as its own PDF (page_1.pdf, page_2.pdf, ...) in the current folder."""
    reader = PdfReader(input_file)
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()  # a fresh writer per page, so each file holds one page
        writer.add_page(page)
        output_filename = f"page_{i+1}.pdf"
        with open(output_filename, "wb") as f:
            writer.write(f)
    print(f"✅ Split '{input_file}' into {len(reader.pages)} pages.")

🧪 Example:

split_pdf("merged_report.pdf")
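`split_pdf` writes every page to its own file. To keep only selected pages instead, a user-facing spec such as `"1-3,5"` first has to be turned into zero-based indices into `reader.pages`. A minimal sketch; `parse_page_ranges` and the spec syntax are hypothetical choices, not a PyPDF2 feature:

```python
def parse_page_ranges(spec, total_pages):
    """Turn a 1-based spec like '1-3,5' into sorted, de-duplicated 0-based indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as '1,,3'
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject reversed ranges and pages outside the document
        if not (1 <= start <= end <= total_pages):
            raise ValueError(f"Invalid page range '{part}' for a {total_pages}-page PDF")
        indices.update(range(start - 1, end))
    return sorted(indices)
```

The indices could then feed a single `PdfWriter`, e.g. `for i in parse_page_ranges("1-3,5", len(reader.pages)): writer.add_page(reader.pages[i])`.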

πŸ” 5. Rotate Pages

def rotate_pdf(input_file, output_file, rotation_angle=90):
    """Rotate every page clockwise by rotation_angle (a multiple of 90)."""
    reader = PdfReader(input_file)
    writer = PdfWriter()

    for page in reader.pages:
        page.rotate(rotation_angle)  # PyPDF2 raises ValueError for non-multiples of 90
        writer.add_page(page)

    with open(output_file, "wb") as f:
        writer.write(f)
    print(f"✅ Rotated all pages in '{input_file}' by {rotation_angle}°")

🧪 Example:

rotate_pdf("page_1.pdf", "rotated_page.pdf", 180)
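PyPDF2 only accepts rotation angles that are multiples of 90 and raises a `ValueError` otherwise. A small validation step can catch bad input before any file is opened and map values such as -90 or 450 onto 0, 90, 180, or 270. A sketch with a hypothetical `normalize_rotation` helper:

```python
def normalize_rotation(angle):
    """Validate a clockwise rotation and map it onto 0, 90, 180, or 270."""
    if angle % 90 != 0:
        raise ValueError(f"Rotation must be a multiple of 90 degrees, got {angle}")
    # With a positive modulus, Python's % never returns a negative number
    return angle % 360
```

In the menu, `rotate_pdf(file, output, normalize_rotation(angle))` would reject an angle like 45 with a clear message.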

💧 6. Add Watermark to Each Page

def add_watermark(input_file, watermark_file, output_file):
    """Stamp the first page of watermark_file onto every page of input_file."""
    reader = PdfReader(input_file)
    writer = PdfWriter()
    watermark = PdfReader(watermark_file).pages[0]

    for page in reader.pages:
        page.merge_page(watermark)  # draws the watermark over the existing content
        writer.add_page(page)

    with open(output_file, "wb") as f:
        writer.write(f)
    print(f"✅ Added watermark to '{input_file}' and saved as '{output_file}'")

🧪 Example:

add_watermark("merged_report.pdf", "watermark.pdf", "watermarked_output.pdf")
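`add_watermark` needs an existing one-page `watermark.pdf`. Such a file is usually made in a design tool or with a library like ReportLab; as a dependency-free alternative, the sketch below writes a minimal single-page PDF by hand, with light-gray diagonal text in the built-in Helvetica font. The function name `build_text_watermark_pdf`, the Letter page size (612 × 792 points), the font size, and the text position are all assumptions to adjust, and the text must be ASCII/Latin-1.

```python
def build_text_watermark_pdf(text="CONFIDENTIAL", width=612, height=792):
    """Return the bytes of a one-page PDF showing `text` diagonally in light gray."""
    # Escape the characters that are special inside PDF string literals
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: save state, gray fill, 60 pt Helvetica, 45-degree text matrix
    content = (
        "q 0.8 g BT /F1 60 Tf 0.7071 0.7071 -0.7071 0.7071 "
        f"{width // 4} {height // 4} Tm ({safe}) Tj ET Q"
    ).encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>").encode("ascii"),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    pdf = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(pdf))  # byte offset of each object, needed by the xref table
        pdf += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(pdf)
    pdf += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        pdf += b"%010d 00000 n \n" % offset  # each xref entry is exactly 20 bytes
    pdf += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_start)
    return bytes(pdf)
```

Writing the bytes with `open("watermark.pdf", "wb")` produces a file `add_watermark` can use. Because `merge_page` draws the watermark over the page, the gray text will sit on top of the original content rather than behind it.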

📜 7. Extract Text from PDF

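def extract_text(input_file, output_file="extracted_text.txt"):
    """Extract the text of every page and save it to a UTF-8 text file."""
    reader = PdfReader(input_file)
    # Scanned (image-only) pages have no text layer and yield an empty string
    page_texts = [page.extract_text() or "" for page in reader.pages]
    with open(output_file, "w", encoding="utf-8") as f:
        f.write("\n".join(page_texts))
    print(f"✅ Text extracted and saved as '{output_file}'")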

🧪 Example:

extract_text("report.pdf")
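Once text is extracted, a common next step in analysis is finding which pages mention a term. A minimal sketch, assuming the per-page strings come from `page.extract_text()`; `find_keyword_pages` is a hypothetical helper name:

```python
def find_keyword_pages(page_texts, keyword):
    """Return the 1-based numbers of pages whose text contains keyword, ignoring case."""
    needle = keyword.casefold()
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in (text or "").casefold()  # None or empty pages never match
    ]
```

For example, `find_keyword_pages([p.extract_text() for p in PdfReader("report.pdf").pages], "total")` would list the pages that mention "total".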

🧰 8. Interactive Menu System

def main():
    # Keep showing the menu until the user chooses Exit
    while True:
        print("\n=== PDF Toolkit ===")
        print("1. Merge PDFs")
        print("2. Split PDF")
        print("3. Rotate PDF")
        print("4. Add Watermark")
        print("5. Extract Text")
        print("6. Exit")

        choice = input("Enter your choice: ").strip()

        try:
            if choice == "1":
                files = input("Enter PDF filenames (comma separated): ").split(",")
                output = input("Output file name: ")
                merge_pdfs([f.strip() for f in files], output)
            elif choice == "2":
                file = input("Enter PDF filename to split: ")
                split_pdf(file)
            elif choice == "3":
                file = input("Enter PDF filename: ")
                angle = int(input("Rotation angle (90/180/270): "))
                output = input("Output file name: ")
                rotate_pdf(file, output, angle)
            elif choice == "4":
                file = input("Enter PDF filename: ")
                watermark = input("Enter watermark PDF filename: ")
                output = input("Output file name: ")
                add_watermark(file, watermark, output)
            elif choice == "5":
                file = input("Enter PDF filename: ")
                extract_text(file)
            elif choice == "6":
                print("Goodbye 👋")
                break
            else:
                print("Invalid choice.")
        except Exception as error:
            # e.g. a missing file, a non-numeric angle, or an unreadable PDF
            print(f"❌ Error: {error}")


if __name__ == "__main__":
    main()
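The menu suits interactive use, but scripts and scheduled jobs usually prefer command-line arguments. A sketch of an equivalent interface using the standard library's `argparse`; the program name `pdf_toolkit` and all subcommand and option names are assumptions, and only the argument parser is shown:

```python
import argparse


def build_parser():
    """Hypothetical CLI mirroring the menu: one subcommand per toolkit feature."""
    parser = argparse.ArgumentParser(prog="pdf_toolkit", description="Simple PDF Toolkit")
    sub = parser.add_subparsers(dest="command", required=True)

    merge = sub.add_parser("merge", help="Merge PDFs in the given order")
    merge.add_argument("inputs", nargs="+")
    merge.add_argument("-o", "--output", required=True)

    split = sub.add_parser("split", help="Save each page as its own PDF")
    split.add_argument("input")

    rotate = sub.add_parser("rotate", help="Rotate all pages clockwise")
    rotate.add_argument("input")
    rotate.add_argument("-o", "--output", required=True)
    rotate.add_argument("--angle", type=int, default=90, choices=[90, 180, 270])

    watermark = sub.add_parser("watermark", help="Stamp a watermark PDF on every page")
    watermark.add_argument("input")
    watermark.add_argument("watermark")
    watermark.add_argument("-o", "--output", required=True)

    extract = sub.add_parser("extract", help="Extract text to extracted_text.txt")
    extract.add_argument("input")
    return parser
```

After `args = build_parser().parse_args()`, a chain such as `if args.command == "merge": merge_pdfs(args.inputs, args.output)` (or a dict of handlers) would call the functions defined above. Using `choices` lets `argparse` reject an invalid angle before any PDF is opened.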

💡 9. Enhancement Ideas

| Feature | Description |
|---|---|
| 🖼 GUI Toolkit | Add a Tkinter-based file selector |
| 🔐 PDF Security | Add password protection or encryption |
| 📑 Metadata | Display or edit PDF metadata (title, author) |
| 📁 Batch Mode | Process entire folders automatically |
| ☁️ Cloud Upload | Save output directly to Google Drive or Dropbox |
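For the Batch Mode idea, one approach is to first plan which output file each input PDF should produce, then run a tool such as `rotate_pdf` on each pair. A sketch using `pathlib`; `plan_batch_outputs` and the `<name><suffix>.pdf` naming scheme are hypothetical:

```python
from pathlib import Path


def plan_batch_outputs(input_paths, output_dir, suffix):
    """Pair each PDF with an output path like output_dir/<stem><suffix>.pdf; skip non-PDFs."""
    plan = []
    for path in map(Path, input_paths):
        if path.suffix.lower() != ".pdf":
            continue  # ignore .txt, images, etc. that share the folder
        plan.append((path, Path(output_dir) / f"{path.stem}{suffix}.pdf"))
    return plan
```

For example, after `Path("rotated").mkdir(exist_ok=True)`, the loop `for src, dst in plan_batch_outputs(Path("scans").iterdir(), "rotated", "_rotated"): rotate_pdf(str(src), str(dst), 90)` would rotate every PDF in a `scans` folder. Separating planning from processing makes it easy to print the plan as a dry run first.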

✅ Summary

| Feature | Function |
|---|---|
| 📎 Merge | Combine multiple PDFs into one |
| ✂️ Split | Save each page as a separate file |
| 🔁 Rotate | Rotate PDF pages |
| 💧 Watermark | Add a watermark or logo to every page |
| 📜 Extract | Extract text from PDFs |
| 🧰 Extend | Add a GUI, encryption, and more |

