34 – Real-World Python Projects – AI Text Summarizer

Purpose

Automatically summarize long documents, news articles, research papers, emails, or reports into short, meaningful summaries using NLP and transformer models.

Used In Real Life By

  • Content creators
  • Students & researchers
  • News agencies
  • HR (summarizing resumes & job descriptions)
  • Corporate teams (summarizing long reports)

🧠 What This Project Will Do

✔ Accept text from:

  • PDF
  • DOCX
  • URL
  • Plain text

✔ Clean and preprocess the content
✔ Summarize using AI models
✔ Output multiple styles of summaries:

  • Short summary
  • Detailed summary
  • Bullet-point summary
  • Title generation

✔ Save results to a text file or JSON
✔ Optional GUI or REST API
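
The "clean and preprocess" step is not spelled out in this post, so here is a minimal sketch of what it could look like. The helper name `clean_text` and the exact rules are assumptions for illustration; adjust them to the kind of documents you feed in.

```python
import re

def clean_text(raw: str) -> str:
    """Normalize extracted text before summarization (illustrative helper)."""
    # Re-join words hyphenated across line breaks, e.g. "docu-\nment".
    # Trade-off: a real compound split at a line end ("well-\nknown") is merged too.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Drop non-printable control characters that PDF extraction can leave behind
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Collapse newlines, tabs and repeated spaces into single spaces
    return re.sub(r"\s+", " ", text).strip()
```

Calling `clean_text` on the raw text from any of the readers, before it reaches the model, keeps stray line breaks and control characters out of the summary.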


🧰 Tech Stack

  • transformers (HuggingFace models)
  • PyPDF2 (PDF)
  • python-docx (DOCX)
  • BeautifulSoup + requests (web pages)
  • pandas / json (output)
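
The feature list mentions JSON output, and the standard-library `json` module is enough for that. The sketch below assumes a result dictionary with the keys `title`, `summary` and `bullet_points`; the function name `save_result_json` is hypothetical.

```python
import json
from pathlib import Path

def save_result_json(result: dict, path: str) -> None:
    """Write a summary result dict to a UTF-8 JSON file (illustrative helper)."""
    out_path = Path(path)
    # Create the output folder if it does not exist yet
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps bullet characters and accents readable
        json.dump(result, f, ensure_ascii=False, indent=2)
```

For example, `save_result_json(result, "output/summary.json")` would sit next to the text file in the output folder.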

📁 Folder Structure

AI_Text_Summarizer/
│── summarizer.py
│── input/
│     ├── sample.pdf
│     ├── sample.docx
│── output/
│
└── models/   (optional)

🔥 HuggingFace Summarization Model

We will use:

facebook/bart-large-cnn

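This is a BART model fine-tuned on the CNN/DailyMail news dataset. It produces good-quality summaries of English text, but it is a large model, so expect it to run much faster on a GPU than on a CPU.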
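
The short and detailed summary styles can come from the same model by changing the generation length limits. The sketch below is one way to do that. The preset numbers and the names `STYLE_PRESETS` and `summarize_with_style` are assumptions, not tuned or official values. The limits are counted in tokens, not words.

```python
# Hypothetical length presets; the numbers are illustrative, not tuned values.
STYLE_PRESETS = {
    "short": {"max_length": 60, "min_length": 20},
    "detailed": {"max_length": 250, "min_length": 100},
}

def summarize_with_style(text, style, summarizer):
    """Summarize `text` using the length limits of the chosen style.

    `summarizer` is any callable that behaves like the HuggingFace
    summarization pipeline, e.g. pipeline("summarization", model=...).
    """
    if style not in STYLE_PRESETS:
        raise ValueError(f"Unknown style: {style!r}")
    limits = STYLE_PRESETS[style]
    # truncation=True asks the pipeline to cut inputs that exceed the model limit
    result = summarizer(text, do_sample=False, truncation=True, **limits)
    return result[0]["summary_text"]
```

Passing the pipeline in as an argument keeps the function easy to try out with a stand-in callable before loading the real model.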


🧠 Full Python Code: summarizer.py

import PyPDF2
import docx
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Load AI summarization model
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# --------- FILE READERS ---------

def read_pdf(path):
    text = ""
    with open(path, "rb") as f:
        reader = PyPDF2.PdfReader(f)
        for page in reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + " "
    return text

def read_docx(path):
    doc = docx.Document(path)
    return " ".join([p.text for p in doc.paragraphs])

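def read_url(url):
    # Time out instead of hanging, and fail loudly on HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script and style blocks so only page text remains
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ")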

# --------- AI SUMMARIZER ---------

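def make_summary(text):
    # BART accepts at most 1024 tokens per input, so summarize in chunks.
    # Chunks are cut on whitespace (never mid-word); 400 words is a rough
    # budget that normally stays under the token limit.
    words = text.split()
    if not words:
        return ""

    chunk_words = 400
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]

    outputs = []
    for chunk in chunks:
        # Scale the length limits down for short chunks so the model is
        # not forced to write more than the input contains
        max_len = max(10, min(150, len(chunk.split())))
        min_len = min(60, max_len // 2)
        summary = summarizer(chunk, max_length=max_len, min_length=min_len,
                             do_sample=False, truncation=True)
        outputs.append(summary[0]["summary_text"])

    return " ".join(outputs)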

# --------- MAIN ---------

def summarize_any(path_or_url):
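    if path_or_url.startswith(("http://", "https://")):
        text = read_url(path_or_url)
    elif path_or_url.lower().endswith(".pdf"):
        text = read_pdf(path_or_url)
    elif path_or_url.lower().endswith(".docx"):
        text = read_docx(path_or_url)
    else:
        # Treat anything else as a plain-text file
        with open(path_or_url, "r", encoding="utf-8") as f:
            text = f.read()

    # Collapse newlines and repeated spaces into single spaces
    text = " ".join(text.split())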

    summary = make_summary(text)

    # Bullet points
    bullets = "\n".join([f"• {line.strip()}" for line in summary.split(".") if line.strip()])

    # Title generation
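    # truncation=True guards against a combined summary longer than the model's input limit
    title = summarizer(summary, max_length=20, min_length=5, do_sample=False, truncation=True)[0]["summary_text"]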

    return {
        "title": title,
        "summary": summary,
        "bullet_points": bullets
    }


# --------- RUN EXAMPLE ---------

if __name__ == "__main__":
    result = summarize_any("input/sample.pdf")

    with open("output/summary.txt", "w", encoding="utf-8") as f:
        f.write("TITLE:\n")
        f.write(result["title"] + "\n\n")
        f.write("SUMMARY:\n")
        f.write(result["summary"] + "\n\n")
        f.write("BULLET POINTS:\n")
        f.write(result["bullet_points"])

    print("Summary saved to output/summary.txt")
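
The chunking in `make_summary` cuts the text at fixed positions, which can split a sentence between two chunks. A possible refinement is to group whole sentences instead. The sketch below uses a naive regex sentence splitter; the function name `chunk_by_sentences` and the 400-word budget are assumptions. A tokenizer-based count would be more exact.

```python
import re

def chunk_by_sentences(text, max_words=400):
    """Group whole sentences into chunks of at most about `max_words` words."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        # A single sentence longer than max_words still becomes its own chunk
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be passed to the summarizer in place of the fixed-size slices.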

📌 Example Outputs

Title Generated:

“Impact of AI on Modern Businesses”

Short Summary:

AI technologies significantly improve business efficiency by automating repetitive tasks, optimizing decision-making, and enhancing customer experience…

Bullet-Point Summary:

• AI increases operational efficiency
• Automates repetitive tasks
• Enhances customer experience
• Enables better data-driven decisions
• Popular in finance, healthcare & retail

🚀 Advanced Enhancements

🔹 1. Add GUI with Tkinter / PyQt

Drop file → Get summary instantly.

🔹 2. Make a REST API

Use FastAPI → /summarize endpoint.

🔹 3. Chrome Extension

Right-click → “Summarize this page”.

🔹 4. Multi-language summarization

Add multilingual models (mBART).

🔹 5. PDF export of summary

Integrate with ReportLab.

