Focus Areas: NLP, Machine Learning, Automation
Difficulty: Intermediate–Advanced
Real-World Use: HR teams receive hundreds of resumes. An AI Resume Screener automatically parses resumes, extracts skills, ranks candidates, and generates a shortlist.
🌟 What You Will Build
A Python program that:
- Reads multiple resumes (PDF/DOCX/TXT)
- Extracts candidate information
- Extracts skills using NLP
- Matches resume skills with job description skills
- Calculates a match score (0–100%)
- Sorts and displays best candidates
- Exports results to CSV
🧠 Tech Stack
- `python-docx` (DOCX reading)
- `PyPDF2` (PDF reading)
- `spaCy` (NLP for skill/entity extraction)
- `pandas` (results table and CSV export)
- `re` (regex for text cleaning)
- `os` (walking the resumes folder)
📁 Folder Structure
```
AI_Resume_Screener/
├── resumes/
│   ├── resume1.pdf
│   └── resume2.docx
├── job_description.txt
├── screener.py
└── skills_library.txt
```
📘 skills_library.txt (example)
These skills will be matched:
```
python
machine learning
excel
power bi
communication
sql
customer service
java
react
salesforce
data analysis
```
You can add more.
📄 job_description.txt (example)
```
We are hiring a Senior Customer Service Representative.
Strong communication, client handling, CRM knowledge, and problem-solving required.
Experience with Excel and email support preferred.
```
🧠 Core Python Program: screener.py
```python
import os
import re

import docx                # python-docx
import pandas as pd
import PyPDF2
import spacy

# spaCy model (not needed by the basic keyword matcher; useful for the NER improvements below)
nlp = spacy.load("en_core_web_sm")

# Load skills library (one skill per line, blank lines ignored)
with open("skills_library.txt", "r") as f:
    SKILL_LIST = [line.strip().lower() for line in f if line.strip()]

# Normalize text: lowercase, strip punctuation, collapse whitespace
def clean_text(text):
    text = text.lower()
    text = re.sub(r'[^a-z0-9\s]', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

# Extract text from a PDF (extract_text() may return None on image-only pages)
def read_pdf(path):
    text = ""
    with open(path, "rb") as pdf:
        reader = PyPDF2.PdfReader(pdf)
        for page in reader.pages:
            text += (page.extract_text() or "") + " "
    return text

# Extract text from a DOCX
def read_docx(path):
    doc = docx.Document(path)
    return " ".join(para.text for para in doc.paragraphs)

# Read any supported resume format
def extract_text(path):
    if path.endswith(".pdf"):
        return read_pdf(path)
    elif path.endswith(".docx"):
        return read_docx(path)
    elif path.endswith(".txt"):
        with open(path, "r") as f:
            return f.read()
    return ""

# Find library skills in the text; \b word boundaries let multi-word skills
# like "machine learning" match, and stop "java" matching inside "javascript"
def extract_skills(text):
    cleaned = clean_text(text)
    return [skill for skill in SKILL_LIST
            if re.search(r'\b' + re.escape(skill) + r'\b', cleaned)]

# Percentage of job-description skills the resume covers
def calculate_match(resume_skills, jd_skills):
    if not jd_skills:
        return 0.0
    score = len(set(resume_skills) & set(jd_skills)) / len(jd_skills) * 100
    return round(score, 2)

# Pull the required skills out of the job description
def get_jd_skills():
    with open("job_description.txt", "r") as f:
        return extract_skills(f.read())

# Main screening function
def screen_resumes():
    jd_skills = get_jd_skills()
    results = []
    for file in os.listdir("resumes"):
        path = os.path.join("resumes", file)
        text = extract_text(path)
        skills = extract_skills(text)
        results.append({
            "Resume Name": file,
            "Skills Found": ", ".join(skills),
            "Match Score": calculate_match(skills, jd_skills),
        })
    df = pd.DataFrame(results)
    df = df.sort_values(by="Match Score", ascending=False)
    df.to_csv("screening_results.csv", index=False)
    print(df)
    print("\nResults saved to screening_results.csv")

if __name__ == "__main__":
    screen_resumes()
```
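The match score is just skill overlap: the share of job-description skills that appear in the resume, with a guard against an empty job description. A standalone worked example of the same formula (repeated here so the snippet runs on its own):

```python
# Same overlap formula as calculate_match in screener.py
def calculate_match(resume_skills, jd_skills):
    if not jd_skills:
        return 0.0
    return round(len(set(resume_skills) & set(jd_skills)) / len(jd_skills) * 100, 2)

jd = ["communication", "excel", "customer service"]
resume = ["customer service", "excel", "python"]
print(calculate_match(resume, jd))  # 2 of 3 JD skills found -> 66.67
```

Note that extra resume skills (here, `python`) neither help nor hurt; only coverage of the job description counts.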
🎯 Output Example
| Resume Name | Skills Found | Match Score |
|---|---|---|
| resume3.pdf | communication, excel, customer service | 100.0 |
| resume1.docx | customer service, excel | 66.67 |
| resume2.pdf | python | 0.0 |
🏆 Real-World Improvements
Add these once the basic version works:
✔ Resume Ranking Model
Use TF-IDF + cosine similarity.
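A minimal sketch of that idea, assuming scikit-learn is installed; `rank_resumes` is a hypothetical helper, not part of screener.py:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_resumes(jd_text, resume_texts):
    """Rank resumes by TF-IDF cosine similarity to the job description."""
    docs = [jd_text] + resume_texts
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    # Similarity of each resume (rows 1..n) to the JD (row 0)
    sims = cosine_similarity(tfidf[0:1], tfidf[1:]).flatten()
    order = sims.argsort()[::-1]
    return [(i, round(float(sims[i]), 3)) for i in order]

jd = "customer service communication excel crm"
resumes = ["excel and customer service experience",
           "python machine learning engineer"]
print(rank_resumes(jd, resumes))  # resume 0 ranks first
```

Unlike the keyword matcher, this rewards overall wording overlap, so resumes are ranked even when they use terms missing from skills_library.txt.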
✔ Named Entity Recognition
Extract:
- Name
- Phone
- Experience years
- Education
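spaCy's `PERSON` entities are a natural fit for names; phone numbers and experience years are usually easier with plain regex. A rough sketch (patterns simplified and helper names hypothetical):

```python
import re

# Simplified, non-exhaustive patterns for illustration
def extract_phone(text):
    m = re.search(r'(\+?\d[\d\s().-]{8,}\d)', text)
    return m.group(1).strip() if m else None

def extract_experience_years(text):
    m = re.search(r'(\d+)\+?\s*years?', text, re.IGNORECASE)
    return int(m.group(1)) if m else None

sample = "Jane Doe, +1 415 555 0134, 7 years of customer service experience"
print(extract_phone(sample))             # +1 415 555 0134
print(extract_experience_years(sample))  # 7
```

Real resumes format phone numbers and dates in many ways, so treat these patterns as a starting point and widen them against your own data.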
✔ Streamlit Web App
Upload resume → Get score instantly.
✔ HR Dashboard
Graphs for:
- skill distribution
- top candidates
- missing skill analysis
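The skill-distribution graph starts from a simple count over the `Skills Found` column of screening_results.csv; a sketch with sample rows inlined for illustration (plotting line left commented):

```python
import pandas as pd

# Sample rows in the screening_results.csv layout (inlined for illustration)
df = pd.DataFrame({
    "Resume Name": ["resume1.docx", "resume2.pdf", "resume3.pdf"],
    "Skills Found": ["customer service, excel", "python",
                     "communication, excel, customer service"],
})

# One row per skill, then count occurrences across all resumes
counts = df["Skills Found"].str.split(", ").explode().value_counts()
print(counts)
# counts.plot(kind="bar")  # feed straight into a matplotlib bar chart
```

The same `counts` series also drives the missing-skill analysis: any job-description skill absent from it was found in no resume.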
✔ Multi-Job Screening
Screen resumes for 50 different roles.
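One way to sketch this: score a single resume's skill list against a dictionary of roles, reusing the overlap formula (`screen_for_jobs` and the jobs dict are hypothetical illustrations):

```python
# Score one resume's skills against many jobs at once
def screen_for_jobs(resume_skills, jobs):
    scores = {}
    for job, jd_skills in jobs.items():
        overlap = set(resume_skills) & set(jd_skills)
        scores[job] = round(len(overlap) / len(jd_skills) * 100, 2) if jd_skills else 0.0
    # Best-matching role first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

jobs = {
    "Customer Service Rep": ["communication", "excel", "customer service"],
    "Data Analyst": ["python", "sql", "data analysis"],
}
print(screen_for_jobs(["excel", "python", "sql"], jobs))
# Data Analyst scores 66.67, Customer Service Rep 33.33
```

In practice the jobs dict would be built by running `get_jd_skills`-style extraction over one job description file per role.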