{"id":241,"date":"2025-11-20T14:43:16","date_gmt":"2025-11-20T14:43:16","guid":{"rendered":"https:\/\/codetypingpro.com\/?p=241"},"modified":"2025-11-20T14:43:16","modified_gmt":"2025-11-20T14:43:16","slug":"34-real-world-python-projects-ai-text-summarizer","status":"publish","type":"post","link":"https:\/\/codetypingpro.com\/?p=241","title":{"rendered":"34 &#8211; Real-World Python Projects &#8211; AI Text Summarizer"},"content":{"rendered":"\n<h6 class=\"wp-block-heading\"><\/h6>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Purpose<\/strong><\/h3>\n\n\n\n<p>Automatically summarize long documents, news articles, research papers, emails, or reports into short, meaningful summaries using NLP and transformer models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Used In Real Life By<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Content creators<\/li>\n\n\n\n<li>Students &amp; researchers<\/li>\n\n\n\n<li>News agencies<\/li>\n\n\n\n<li>HR (summarizing resumes &amp; job descriptions)<\/li>\n\n\n\n<li>Corporate teams (summarizing long reports)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83e\udde0 <strong>What This Project Will Do<\/strong><\/h1>\n\n\n\n<p>\u2714 Accept text from:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>PDF<\/li>\n\n\n\n<li>DOCX<\/li>\n\n\n\n<li>URL<\/li>\n\n\n\n<li>Plain text<\/li>\n<\/ul>\n\n\n\n<p>\u2714 Clean and preprocess the content<br>\u2714 Summarize using AI models<br>\u2714 Output multiple styles of summaries:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Short summary<\/strong><\/li>\n\n\n\n<li><strong>Detailed summary<\/strong><\/li>\n\n\n\n<li><strong>Bullet-point summary<\/strong><\/li>\n\n\n\n<li><strong>Title generation<\/strong><\/li>\n<\/ul>\n\n\n\n<p>\u2714 Save results to a <strong>text file<\/strong> or <strong>JSON<\/strong><br>\u2714 Optional GUI or REST API<\/p>\n\n\n\n<hr class=\"wp-block-separator 
has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83e\uddf0 <strong>Tech Stack<\/strong><\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>transformers<\/code> (HuggingFace models)<\/li>\n\n\n\n<li><code>PyPDF2<\/code> (PDF)<\/li>\n\n\n\n<li><code>python-docx<\/code> (DOCX)<\/li>\n\n\n\n<li><code>BeautifulSoup<\/code> + <code>requests<\/code> (web pages)<\/li>\n\n\n\n<li><code>pandas \/ json<\/code> (output)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83d\udcc1 Folder Structure<\/h1>\n\n\n\n<pre class=\"wp-block-code\"><code>AI_Text_Summarizer\/\n\u2502\u2500\u2500 summarizer.py\n\u2502\u2500\u2500 input\/\n\u2502     \u251c\u2500\u2500 sample.pdf\n\u2502     \u251c\u2500\u2500 sample.docx\n\u2502\u2500\u2500 output\/\n\u2502\n\u2514\u2500\u2500 models\/   (optional)\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83d\udd25 <strong>HuggingFace Summarization Model<\/strong><\/h1>\n\n\n\n<p>We will use:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>facebook\/bart-large-cnn\n<\/code><\/pre>\n\n\n\n<p>A widely used English summarization model, fine-tuned on CNN\/DailyMail news articles. It produces good summaries but is large, so expect slow inference on a CPU.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83e\udde0 <strong>Full Python Code: <code>summarizer.py<\/code><\/strong><\/h1>\n\n\n\n<pre class=\"wp-block-code\"><code>import PyPDF2\nimport docx\nimport requests\nfrom bs4 import BeautifulSoup\nfrom transformers import pipeline\n\n# Load AI summarization model\nsummarizer = pipeline(\"summarization\", model=\"facebook\/bart-large-cnn\")\n\n# --------- FILE READERS ---------\n\ndef read_pdf(path):\n    text = \"\"\n    with open(path, \"rb\") as f:\n        reader = PyPDF2.PdfReader(f)\n        for page in reader.pages:\n            page_text = page.extract_text()\n            if page_text:\n                text += page_text + \" \"\n    return 
text\n\ndef read_docx(path):\n    doc = docx.Document(path)\n    return \" \".join(&#91;p.text for p in doc.paragraphs])\n\ndef read_url(url):\n    # Use a timeout and fail fast on HTTP error responses\n    response = requests.get(url, timeout=30)\n    response.raise_for_status()\n    soup = BeautifulSoup(response.text, \"html.parser\")\n    return soup.get_text(separator=\" \")\n\n# --------- AI SUMMARIZER ---------\n\ndef make_summary(text):\n    # BART accepts at most 1024 tokens per input, so summarize in chunks.\n    # Chunks are cut by characters; for English text, 1000 characters is usually\n    # well under that limit, and truncation=True guards against the rest.\n    chunk_size = 1000\n    chunks = &#91;text&#91;i:i+chunk_size] for i in range(0, len(text), chunk_size)]\n\n    outputs = &#91;]\n    for chunk in chunks:\n        summary = summarizer(chunk, max_length=150, min_length=60, do_sample=False, truncation=True)\n        outputs.append(summary&#91;0]&#91;\"summary_text\"])\n\n    final_summary = \" \".join(outputs)\n    return final_summary\n\n# --------- MAIN ---------\n\ndef summarize_any(path_or_url):\n    if path_or_url.startswith(\"http\"):\n        text = read_url(path_or_url)\n    elif path_or_url.endswith(\".pdf\"):\n        text = read_pdf(path_or_url)\n    elif path_or_url.endswith(\".docx\"):\n        text = read_docx(path_or_url)\n    else:\n        # Treat anything else as a plain-text file (assumed to be UTF-8)\n        with open(path_or_url, \"r\", encoding=\"utf-8\") as f:\n            text = f.read()\n\n    text = text.strip().replace(\"\\n\", \" \")\n    if not text:\n        raise ValueError(\"No text could be extracted from the input\")\n\n    summary = make_summary(text)\n\n    # Bullet points: one bullet per sentence (naive split on periods)\n    bullets = \"\\n\".join(&#91;f\"\u2022 {line.strip()}\" for line in summary.split(\".\") if line.strip()])\n\n    # Title generation: re-summarize the summary into one very short line\n    title = summarizer(summary, max_length=20, min_length=5, do_sample=False, truncation=True)&#91;0]&#91;\"summary_text\"]\n\n    return {\n        \"title\": title,\n        \"summary\": summary,\n        \"bullet_points\": bullets\n    }\n\n\n# --------- RUN EXAMPLE ---------\n\nif __name__ == \"__main__\":\n    result = summarize_any(\"input\/sample.pdf\")\n\n    with open(\"output\/summary.txt\", \"w\", encoding=\"utf-8\") as f:\n        f.write(\"TITLE:\\n\")\n        f.write(result&#91;\"title\"] + \"\\n\\n\")\n        f.write(\"SUMMARY:\\n\")\n        f.write(result&#91;\"summary\"] + \"\\n\\n\")\n        f.write(\"BULLET 
POINTS:\\n\")\n        f.write(result&#91;\"bullet_points\"])\n\n    print(\"Summary saved to output\/summary.txt\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83d\udccc <strong>Example Outputs<\/strong><\/h1>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Title Generated:<\/strong><\/h3>\n\n\n\n<p><strong>&#8220;Impact of AI on Modern Businesses&#8221;<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Short Summary:<\/strong><\/h3>\n\n\n\n<p>AI technologies significantly improve business efficiency by automating repetitive tasks, optimizing decision-making, and enhancing customer experience\u2026<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Bullet-Point Summary:<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>\u2022 AI increases operational efficiency  \n\u2022 Automates repetitive tasks  \n\u2022 Enhances customer experience  \n\u2022 Enables better data-driven decisions  \n\u2022 Popular in finance, healthcare &amp; retail  \n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83d\ude80 <strong>Advanced Enhancements<\/strong><\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd39 1. Add GUI with Tkinter \/ PyQt<\/h3>\n\n\n\n<p>Drop file \u2192 Get summary instantly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd39 2. Make a REST API<\/h3>\n\n\n\n<p>Use FastAPI \u2192 <code>\/summarize<\/code> endpoint.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd39 3. Chrome Extension<\/h3>\n\n\n\n<p>Right-click \u2192 \u201cSummarize this page\u201d.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd39 4. Multi-language summarization<\/h3>\n\n\n\n<p>Add multilingual models (MBART).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd39 5. 
PDF export of summary<\/h3>\n\n\n\n<p>Integrate with ReportLab.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Purpose Automatically summarize long documents, news articles, research papers, emails, or reports into short, meaningful summaries using NLP and transformer models. Used In Real Life By \ud83e\udde0 What This Project Will Do \u2714 Accept text from: \u2714 Clean and preprocess the content\u2714 Summarize using AI models\u2714 Output multiple styles of summaries: \u2714 Save results to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-241","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/241","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=241"}],"version-history":[{"count":1,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/241\/revisions"}],"predecessor-version":[{"id":242,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=\/wp\/v2\/posts\/241\/revisions\/242"}],"wp:attachment":[{"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=241"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=241"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/codetypingpro.com\/index.php?rest_route=%2Fwp%2Fv2%
2Ftags&post=241"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}