Purpose
Automatically summarize long documents, news articles, research papers, emails, or reports into short, meaningful summaries using NLP and transformer models.
Used In Real Life By
- Content creators
- Students & researchers
- News agencies
- HR (summarizing resumes & job descriptions)
- Corporate teams (summarizing long reports)
What This Project Will Do
✅ Accept text from:
- PDF
- DOCX
- URL
- Plain text
✅ Clean and preprocess the content
✅ Summarize using AI models
✅ Output multiple styles of summaries:
- Short summary
- Detailed summary
- Bullet-point summary
- Title generation
✅ Save results to a text file or JSON
✅ Optional GUI or REST API
Tech Stack
- transformers (Hugging Face models)
- PyPDF2 (PDF)
- python-docx (DOCX)
- BeautifulSoup + requests (web pages)
- pandas / json (output)
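All of these are available on PyPI, so a single install command covers the stack (note: the transformers pipeline also needs a deep-learning backend such as torch, which is included below):

```shell
pip install transformers torch PyPDF2 python-docx beautifulsoup4 requests pandas
```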
Folder Structure

```
AI_Text_Summarizer/
├── summarizer.py
├── input/
│   ├── sample.pdf
│   └── sample.docx
├── output/
└── models/        (optional)
```
HuggingFace Summarization Model
We will use:
facebook/bart-large-cnn
It offers a strong balance of summary quality and inference speed for news-style text.
Full Python Code: summarizer.py

```python
import PyPDF2
import docx
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Load the summarization pipeline once at import time
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# --------- FILE READERS ---------

def read_pdf(path):
    text = ""
    with open(path, "rb") as f:
        reader = PyPDF2.PdfReader(f)
        for page in reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + " "
    return text

def read_docx(path):
    doc = docx.Document(path)
    return " ".join(p.text for p in doc.paragraphs)

def read_url(url):
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator=" ")

# --------- AI SUMMARIZER ---------

def make_summary(text):
    # BART accepts roughly 1024 tokens per call, so split the text
    # into ~1000-character chunks and summarize each one separately
    chunk_size = 1000
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    outputs = []
    for chunk in chunks:
        summary = summarizer(chunk, max_length=150, min_length=60, do_sample=False)
        outputs.append(summary[0]["summary_text"])
    return " ".join(outputs)

# --------- MAIN ---------

def summarize_any(path_or_url):
    if path_or_url.startswith("http"):
        text = read_url(path_or_url)
    elif path_or_url.endswith(".pdf"):
        text = read_pdf(path_or_url)
    elif path_or_url.endswith(".docx"):
        text = read_docx(path_or_url)
    else:
        with open(path_or_url, "r", encoding="utf-8") as f:
            text = f.read()

    text = text.strip().replace("\n", " ")
    summary = make_summary(text)

    # Bullet points: one bullet per sentence of the summary
    bullets = "\n".join(f"• {line.strip()}" for line in summary.split(".") if line.strip())

    # Title: re-summarize the summary into a very short string
    title = summarizer(summary, max_length=20, min_length=5, do_sample=False)[0]["summary_text"]

    return {
        "title": title,
        "summary": summary,
        "bullet_points": bullets,
    }

# --------- RUN EXAMPLE ---------

if __name__ == "__main__":
    result = summarize_any("input/sample.pdf")
    with open("output/summary.txt", "w", encoding="utf-8") as f:
        f.write("TITLE:\n" + result["title"] + "\n\n")
        f.write("SUMMARY:\n" + result["summary"] + "\n\n")
        f.write("BULLET POINTS:\n" + result["bullet_points"])
    print("Summary saved to output/summary.txt")
```
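One caveat: make_summary slices the text every 1000 characters, which can cut a sentence in half at a chunk boundary. A sentence-aware variant (a sketch in pure Python; the helper name is ours, and the model call is left out) keeps chunks under the limit without splitting mid-sentence:

```python
import re

def chunk_by_sentence(text, chunk_size=1000):
    """Group whole sentences into chunks of at most chunk_size characters."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to summarizer() exactly as in make_summary, replacing the character-slice list comprehension.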
Example Outputs
Title Generated:
“Impact of AI on Modern Businesses”
Short Summary:
AI technologies significantly improve business efficiency by automating repetitive tasks, optimizing decision-making, and enhancing customer experience…
Bullet-Point Summary:
• AI increases operational efficiency
• Automates repetitive tasks
• Enhances customer experience
• Enables better data-driven decisions
• Popular in finance, healthcare & retail
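The script above writes a plain-text file, but the feature list also promises JSON output. A minimal sketch (the helper name is ours; it assumes the same result dict returned by summarize_any):

```python
import json

def save_as_json(result, path="output/summary.json"):
    """Write the summary dict (title / summary / bullet_points) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps bullet characters readable in the file
        json.dump(result, f, ensure_ascii=False, indent=2)
```

Calling save_as_json(result) right after summarize_any gives a machine-readable companion to summary.txt.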
Advanced Enhancements
1. Add a GUI with Tkinter / PyQt
Drop in a file → get a summary instantly.
2. Build a REST API
Use FastAPI → a /summarize endpoint.
3. Chrome Extension
Right-click → “Summarize this page”.
4. Multi-language summarization
Swap in a multilingual model (mBART).
5. PDF export of the summary
Generate a PDF of the summary with ReportLab.
