PhD Advisor Application Assistant - Automate CS professor information collection, intelligent matching analysis, and cold email generation
Full documentation: https://nameistzzhang.github.io/phd_hunter/
- CSRankings Crawler - Automatically fetch CS professor rankings and lists
- OpenAlex Paper Fetching - Fetch papers via institution + author matching (primary source)
- arXiv Abstract Enrichment - Supplement OpenAlex abstracts with accurate arXiv data
- Homepage Scraping - Scrape professor homepages and generate AI summaries
- SQLite Storage - All data persisted locally
- Professor Matching Scoring - LLM-based direction match (1-5) and admission difficulty (1-5)
- Intelligent Chat Analysis - One-click professor analysis report + cold email draft
- Personalized Cold Emails - Customized emails based on your Profile (CV/PS/papers)
- Modern SPA Interface - Flask-based interactive single-page application
- Priority Management - Reach / Match / Target / Safety / Not Considered
- Multi-dimensional Filtering - By priority, research area, university, score
- Profile Management - Upload CV/PS, manage arXiv papers, set research preferences
- LLM Configuration - Configure API Key, model, temperature, iterations
- Python 3.10+
- uv (recommended) or pip
- Chrome/Chromium browser (for Selenium homepage scraping)
# 1. Clone the repository
git clone <repository-url>
cd phd-hunter
# 2. Install dependencies
# Using uv (recommended):
uv sync
# Or using pip:
pip install -e .
# Or using uv pip:
uv pip install -e .
# 3. Install api_infra (REQUIRED for LLM features)
cd src/phd_hunter/api_infra
pip install -e .
cd ../../..
You must create the config files before running the application.
# 1. Configure LLM parameters (REQUIRED for AI features)
cp src/phd_hunter/frontend/hound_config.example.json src/phd_hunter/frontend/hound_config.json
# Edit hound_config.json and fill in your API key and model settings
# 2. Configure crawl parameters (optional)
cp src/phd_hunter/frontend/hunt_config.example.json src/phd_hunter/frontend/hunt_config.json
hound_config.json example:
{
"api_key": "your-api-key-here",
"model": "deepseek-v3.2",
"provider": "yunwu",
"url": "https://yunwu.ai/v1",
"temperature": 0.6,
"max_tokens": 800,
"scoring_iterations": 3,
"nickname": "YourName"
}
Note: Without hound_config.json, the Analyzer (chat), Scorer (matching score), and Homepage Crawler will not work. You can still browse professor data and manage priorities without it.
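As a rough illustration of the behavior described in the note, the sketch below loads a config file and returns None when it is absent, so that LLM features can be switched off while browsing keeps working. The function name `load_hound_config`, the choice of required keys, and the fallback values (copied from the example file) are assumptions for illustration, not the project's actual loader.

```python
import json
import tempfile
from pathlib import Path

# Assumption: these four keys are treated as mandatory for any LLM call.
REQUIRED_KEYS = {"api_key", "model", "provider", "url"}


def load_hound_config(path):
    """Return the parsed config, or None when the file does not exist."""
    path = Path(path)
    if not path.exists():
        return None  # caller can disable Analyzer / Scorer / Homepage Crawler
    config = json.loads(path.read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"hound_config.json is missing keys: {sorted(missing)}")
    # Fallbacks mirror the example file above; they are not documented defaults.
    config.setdefault("temperature", 0.6)
    config.setdefault("max_tokens", 800)
    config.setdefault("scoring_iterations", 3)
    return config


# Demo against a temporary directory so nothing in the repository is touched.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "hound_config.json"
    cfg_path.write_text(
        json.dumps(
            {
                "api_key": "your-api-key-here",
                "model": "deepseek-v3.2",
                "provider": "yunwu",
                "url": "https://yunwu.ai/v1",
            }
        ),
        encoding="utf-8",
    )
    config = load_hound_config(cfg_path)
    missing_config = load_hound_config(Path(tmp) / "absent.json")

print(config["scoring_iterations"], missing_config)
```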
# 1. Crawl professor data
python main.py crawl --area ai --region world --max-professors 5
# 2. Fetch papers
python main.py fetch-papers --max-papers 10
# 3. Scrape professor homepages (requires LLM config)
python -m phd_hunter.crawlers.homepage_crawler
# 4. Run matching score (requires LLM config)
python -m phd_hunter.hound.scorer
# 5. View statistics
python main.py stats
# Start Flask Web Server (default http://localhost:8080)
# Linux / macOS:
PYTHONPATH=src python -m phd_hunter.frontend.app
# Windows (Command Prompt):
set "PYTHONPATH=src" && python -m phd_hunter.frontend.app
# Windows (PowerShell):
$env:PYTHONPATH="src"; python -m phd_hunter.frontend.app
Then open http://localhost:8080 in your browser:
- Hunt page: Browse professor cards, filter, sort, mark priorities
- Chat page: Click a professor to start AI conversation with auto-generated analysis and cold email draft
- Profile page: Upload CV/PS, add arXiv papers, set research preferences
phd_hunter/
├── main.py                      # CLI entry
├── pyproject.toml               # Project config
├── README.md                    # This file
├── docs/                        # Sphinx documentation
├── tests/                       # Test files
└── src/phd_hunter/
    ├── __init__.py              # Package init
    ├── models.py                # Pydantic data models
    ├── database.py              # SQLite database operations
    ├── api_infra/               # LLM API infrastructure
    │   ├── __init__.py
    │   └── core/
    │       └── client.py        # Unified LLM client
    ├── crawlers/
    │   ├── __init__.py          # Export crawlers
    │   ├── base.py              # Crawler base class (with caching)
    │   ├── csrankings.py        # CSRankings crawler (Selenium)
    │   ├── openalex_crawler.py  # OpenAlex crawler (primary paper source)
    │   ├── arxiv_crawler.py     # arXiv crawler (abstract enrichment + manual add)
    │   └── homepage_crawler.py  # Homepage scraper + AI summary
    ├── hound/
    │   ├── __init__.py
    │   ├── scorer.py            # Professor matching scorer
    │   └── scorer_daemon.py     # Background auto-scoring daemon
    ├── analyzer/
    │   ├── __init__.py          # Export analyze_professor, chat_with_professor
    │   ├── analyzer.py          # Professor analysis + cold email core
    │   └── prompts.py           # Analyzer prompt templates
    ├── utils/
    │   ├── logger.py            # Logging config
    │   ├── helpers.py           # Utility functions
    │   └── pdf_extract.py       # PDF text extraction + Profile builder
    └── frontend/                # Web frontend
        ├── app.py               # Flask API server
        ├── index.html           # Main page
        ├── hound_config.json    # LLM config (create from example!)
        ├── hunt_config.json     # Crawl config (create from example!)
        ├── static/
        │   ├── styles.css       # Stylesheet
        │   ├── app.js           # Frontend logic
        │   └── windsurf.svg     # AI avatar icon
        └── templates/           # HTML templates
SQLite database with core tables:
- Basic info: name, university, rank, department, email, homepage
- Research interests, priority (integer from -1 to 3)
- AI analysis: homepage_summary, direction_match_score, admission_difficulty_score
- Chat history: messages (JSON)
- Paper metadata (title, authors, abstract, year, venue)
- arXiv ID, PDF link, citation count
- Linked to professor record
- User Profile: CV text, PS text
- Research preferences, arXiv paper list
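The fields listed above fall into three groups: professor records, paper records, and the user profile. A minimal sketch of how they might map onto SQLite tables follows. The table names (`professors`, `papers`, `profile`), column types, the default priority of 0, and the foreign-key layout are assumptions for illustration; the actual schema in database.py may differ.

```python
import sqlite3

# Hypothetical schema derived from the field list above.
SCHEMA = """
CREATE TABLE professors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    university TEXT,
    "rank" INTEGER,
    department TEXT,
    email TEXT,
    homepage TEXT,
    research_interests TEXT,
    priority INTEGER DEFAULT 0 CHECK (priority BETWEEN -1 AND 3),
    homepage_summary TEXT,
    direction_match_score REAL,
    admission_difficulty_score REAL,
    messages TEXT DEFAULT '[]'          -- chat history stored as JSON text
);
CREATE TABLE papers (
    id INTEGER PRIMARY KEY,
    professor_id INTEGER NOT NULL REFERENCES professors(id),
    title TEXT NOT NULL,
    authors TEXT,
    abstract TEXT,
    year INTEGER,
    venue TEXT,
    arxiv_id TEXT,
    pdf_url TEXT,
    citation_count INTEGER
);
CREATE TABLE profile (
    id INTEGER PRIMARY KEY,
    cv_text TEXT,
    ps_text TEXT,
    research_preferences TEXT,
    arxiv_papers TEXT                   -- JSON list of arXiv links
);
"""

conn = sqlite3.connect(":memory:")  # in-memory database for the demo
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO professors (name, university, priority) VALUES (?, ?, ?)",
    ("Ada Example", "Example University", 2),
)
conn.execute(
    "INSERT INTO papers (professor_id, title, year) VALUES (?, ?, ?)",
    (1, "A Sample Paper", 2024),
)

# Count papers per professor through the professor_id link.
rows = conn.execute(
    """
    SELECT p.name, COUNT(pa.id)
    FROM professors AS p
    LEFT JOIN papers AS pa ON pa.professor_id = p.id
    GROUP BY p.id
    """
).fetchall()
print(rows)
```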
Based on your Profile and professor data, auto-generates:
- Professor research direction analysis
- Matching points between you and the professor
- Cold email writing guidelines
- Complete cold email draft
Supports multi-round conversation to refine the draft.
Uses LLM to score each professor:
- Direction Match (1-5): Research direction matching degree
- Admission Difficulty (1-5): Admission difficulty assessment
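The README does not say how repeated scoring runs (the `scoring_iterations` setting) are combined. One plausible approach, sketched here purely as an assumption, is to clamp each run to the 1-5 range and average the results. `score_professor`, `clamp_score`, and the `score_once` callable are hypothetical names; a canned sequence stands in for the LLM call.

```python
from statistics import mean


def clamp_score(value, low=1, high=5):
    """Force a raw score into the documented 1-5 range."""
    return max(low, min(high, value))


def score_professor(score_once, iterations=3):
    """Call score_once() repeatedly and average the clamped results.

    score_once is a stand-in for one LLM scoring call; it must return a
    (direction_match, admission_difficulty) pair.
    """
    direction, difficulty = [], []
    for _ in range(iterations):
        match, admit = score_once()
        direction.append(clamp_score(match))
        difficulty.append(clamp_score(admit))
    return {
        "direction_match_score": round(mean(direction), 1),
        "admission_difficulty_score": round(mean(difficulty), 1),
    }


# Canned replies replace the LLM; the out-of-range 9 is clamped to 5.
canned = iter([(4, 3), (5, 3), (4, 9)])
result = score_professor(lambda: next(canned), iterations=3)
print(result)
```

Averaging several runs is one way to smooth out sampling noise at a nonzero temperature, at the price of one extra API call per iteration.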
Uses Selenium to scrape professor homepages, then LLM extracts:
- Research focus
- Recruiting status
- Homepage content summary
Click the settings icon in the top-right corner to configure:
- API Key
- Provider / Model
- URL (custom API endpoint)
- Temperature / Max Tokens
- Scoring Iterations
Go to the Profile page:
- Upload CV and PS (PDF format)
- Add interesting arXiv paper links
- Set research preferences
The Hunt page displays all professor cards:
- Top bar shows statistics: universities, professors, papers, avg scores
- Use filter bar to filter by priority / area / university / score
- Click professor card to view details (papers link to arXiv)
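The filter bar combines several optional criteria. A minimal sketch of that kind of filtering is shown below; the `ProfessorCard` dataclass, its fields, and `filter_cards` are hypothetical and only illustrate the idea that an unset filter places no restriction on the result.

```python
from dataclasses import dataclass


@dataclass
class ProfessorCard:
    # Hypothetical card shape, loosely based on the fields listed in this README.
    name: str
    university: str
    area: str
    priority: int
    direction_match_score: float


def filter_cards(cards, priority=None, area=None, university=None, min_score=None):
    """Keep cards that satisfy every filter that was provided (None = no filter)."""
    kept = []
    for card in cards:
        if priority is not None and card.priority != priority:
            continue
        if area is not None and card.area != area:
            continue
        if university is not None and card.university != university:
            continue
        if min_score is not None and card.direction_match_score < min_score:
            continue
        kept.append(card)
    # Show the best direction match first.
    return sorted(kept, key=lambda c: c.direction_match_score, reverse=True)


cards = [
    ProfessorCard("Ada Example", "Example University", "ai", 2, 4.3),
    ProfessorCard("Bob Sample", "Example University", "systems", 1, 3.0),
    ProfessorCard("Cy Placeholder", "Another Institute", "ai", 2, 4.8),
]
ai_matches = filter_cards(cards, area="ai", min_score=4.0)
print([c.name for c in ai_matches])
```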
Professor Detail Modal:
- Rescore: Re-run LLM scoring after editing papers
- Add Paper: Paste an arXiv URL to manually add a paper
- Delete Paper: Remove incorrect papers with the × button
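Adding a paper from a pasted URL requires pulling the arXiv identifier out of the link. The sketch below shows one way to do that with a regular expression covering new-style IDs (such as 1706.03762) and old-style IDs (such as hep-th/9901001). `parse_arxiv_url` is a hypothetical helper, not the function used by arxiv_crawler.py.

```python
import re

# Matches /abs/ and /pdf/ links, with an optional version suffix such as v2.
_ARXIV_ID = re.compile(
    r"arxiv\.org/(?:abs|pdf)/"
    r"(?P<id>\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})"
    r"(?:v(?P<version>\d+))?",
    re.IGNORECASE,
)


def parse_arxiv_url(url):
    """Return (arxiv_id, version or None), or None if the URL is not an arXiv link."""
    match = _ARXIV_ID.search(url)
    if match is None:
        return None
    version = match.group("version")
    return match.group("id"), (int(version) if version else None)


print(parse_arxiv_url("https://arxiv.org/abs/1706.03762"))
print(parse_arxiv_url("https://arxiv.org/pdf/2301.12345v2.pdf"))
print(parse_arxiv_url("https://example.org/not-a-paper"))
```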
Click Chat to enter the conversation:
- First entry auto-analyzes professor and generates cold email draft
- Continue the conversation to modify or ask questions
- Each message can be individually deleted
python main.py crawl --area ai --region world --max-professors 5
Parameters:
- --area: Research area (default: ai)
- --region: Region filter (default: world)
- --max-universities: Max university count (default: all)
- --max-professors: Max professors per university (default: 5)
- --no-headless: Show browser window
- --timeout: Page timeout (seconds, default: 30)
python main.py fetch-papers --max-papers 10 --max-professors 50
Parameters:
- --max-papers: Max papers per professor (default: 10)
- --max-professors: Max professors to process (default: all)
- --delay: Request interval (seconds, default: 1.0)
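For readers who want to see the flags and defaults of the crawl and fetch-papers commands in one place, here is a sketch of how they could be declared with argparse. It mirrors only what the parameter lists document; it is not the contents of main.py, and representing "all" as `None` is an assumption.

```python
import argparse


def build_parser():
    """Declare the documented crawl and fetch-papers flags with their defaults."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    crawl = sub.add_parser("crawl")
    crawl.add_argument("--area", default="ai")
    crawl.add_argument("--region", default="world")
    crawl.add_argument("--max-universities", type=int, default=None)  # None = all
    crawl.add_argument("--max-professors", type=int, default=5)
    crawl.add_argument("--no-headless", action="store_true")  # show browser window
    crawl.add_argument("--timeout", type=int, default=30)  # seconds

    fetch = sub.add_parser("fetch-papers")
    fetch.add_argument("--max-papers", type=int, default=10)
    fetch.add_argument("--max-professors", type=int, default=None)  # None = all
    fetch.add_argument("--delay", type=float, default=1.0)  # seconds between requests

    return parser


args = build_parser().parse_args(
    ["fetch-papers", "--max-papers", "10", "--max-professors", "50"]
)
print(args.command, args.max_papers, args.max_professors, args.delay)
```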
python main.py stats
- arXiv vs Non-arXiv Papers: OpenAlex covers all venues, but only papers with an arXiv association get enriched with full abstracts and PDF links. Pure conference/journal papers may have limited metadata.
- OpenAlex Institution Matching: Author identification relies on OpenAlex's institution linking. Professors with ambiguous names or recent institution changes may occasionally be misidentified.
- LLM Cost: Analyzer, Scorer, and Homepage Paper Extraction all require LLM API calls. Watch your budget.
- Homepage Scraping: Some professor homepages have anti-bot mechanisms and may fail. Homepage extraction is best-effort; missing data does not block other features.
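To make the OpenAlex institution matching caveat concrete, the sketch below picks an author record only when both the normalized name and the institution agree, and gives up when more than one candidate qualifies. The record shape (`id`, `display_name`, `institutions`) is a simplified, hypothetical stand-in for OpenAlex author data, and `pick_author` is not the project's matching code.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def pick_author(candidates, professor_name, university):
    """Return the single candidate matching name and institution, else None."""
    target_name = normalize(professor_name)
    target_inst = normalize(university)
    matches = [
        c
        for c in candidates
        if normalize(c["display_name"]) == target_name
        and target_inst in {normalize(i) for i in c["institutions"]}
    ]
    # Ambiguous or empty results are rejected rather than guessed.
    return matches[0] if len(matches) == 1 else None


# Hypothetical candidate records for two authors who share a name.
candidates = [
    {"id": "A1", "display_name": "José García", "institutions": ["Example University"]},
    {"id": "A2", "display_name": "Jose Garcia", "institutions": ["Another Institute"]},
]
chosen = pick_author(candidates, "Jose Garcia", "Example University")
ambiguous = pick_author(
    candidates + [dict(candidates[0], id="A3")], "Jose Garcia", "Example University"
)
print(chosen["id"], ambiguous)
```

A professor who recently changed institutions would fail the institution check in a scheme like this, which is the kind of miss the note describes.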
- Online Docs: https://nameistzzhang.github.io/phd_hunter/
- Local Docs: See the docs/ directory
Build docs locally:
cd docs && make html
uv run pytest tests/ -v
uv run black --check src/
uv run ruff check src/
MIT License - see LICENSE file
- CSRankings - Professor data source
- OpenAlex - Primary paper and author data source
- arXiv - Paper abstract enrichment and manual addition
- Semantic Scholar - Supplementary paper data
Star this repo if it helps you!