A self-contained B2B lead generation pipeline that runs off-laptop on GitHub Actions. It scrapes recently launched startups from YC, Hacker News, Betalist, and Product Hunt; finds founder emails via Icypeas; verifies them with a second pass; and recovers missed leads via a headless site scrape.
The pipeline outputs a single CSV of safe-only verified emails ready for
manual review and upload to your sending tool of choice (e.g. Smartlead).
GitHub → Actions → Lead pipeline → Run workflow.
Inputs:

- `days`: window for YC / Hacker News / Betalist. Use `2-3` for a quick test, `60` for a full batch.
- `ph_days`: window for Product Hunt. Kept short (default `14`) because PH does a slow full chronological crawl, so a 60-day window doesn't fit the job budget.
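As a rough illustration of what a day window means for the scraped launches, the sketch below keeps only items launched within the last `days` days. The `within_window` helper and the `launched_at` field are hypothetical names chosen for this example; the scraper's real field names and filtering logic may differ.

```python
from datetime import datetime, timedelta, timezone


def within_window(launched_at: datetime, days: int, now: datetime) -> bool:
    """Return True if launched_at falls within the last `days` days before now."""
    cutoff = now - timedelta(days=days)
    return cutoff <= launched_at <= now


# Fixed "now" so the example is reproducible.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
launches = [
    {"name": "recent-co", "launched_at": datetime(2024, 6, 28, tzinfo=timezone.utc)},
    {"name": "older-co", "launched_at": datetime(2024, 5, 15, tzinfo=timezone.utc)},
]

# A 3-day quick test versus a 60-day full batch.
quick = [item["name"] for item in launches if within_window(item["launched_at"], 3, now)]
full = [item["name"] for item in launches if within_window(item["launched_at"], 60, now)]
```

With these sample dates, the quick test keeps only the most recent launch, while the full batch keeps both.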
When it finishes, download the leads artifact. The file you want is
`data/enriched/safe_leads_final.csv`.
| Step | Script | Does |
|---|---|---|
| scrape | `scraper/main.py` | YC + HN + Betalist + Product Hunt launches |
| 00 | `pipeline/00_fetch_contacted.py` | Exclusion set from prior campaigns |
| 01 / 01b | `01_consolidate.py` / `01b_enrich_betalist.py` | Dedup + founder names |
| 03i | `03i_icypeas_search.py` | Icypeas email-search (1 credit per FOUND only) |
| 04b | `04b_emit_missed.py` | List domains where 03i didn't find an email |
| 06 | `06_scrape_emails.py` | Static (or headless) site-scrape: `mailto:` + email regex |
| 07i | `07i_icypeas_recovery.py` | Verify scraped contacts via Icypeas, merge |
| 04d | `04d_merge_recovery_into_lwe.py` | Unify recovery rows into the main file |
| 02b | `02b_scrape_homepage.py` | Optional: hero text for downstream personalisation |
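The `mailto:` + email regex approach of step 06 can be sketched roughly as follows. This is a minimal illustration, not the code in `06_scrape_emails.py`: the `extract_emails` function, both regex patterns, and the sample HTML are assumptions made for this example, and the email pattern is deliberately simple rather than RFC-complete.

```python
import re
from html import unescape
from urllib.parse import unquote

# Address part of a mailto: link, stopping at quotes, "?", ">" or whitespace.
MAILTO_RE = re.compile(r'mailto:([^"\'?>\s]+)', re.IGNORECASE)
# Simple, permissive email pattern (an assumption for this sketch).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Collect unique, lowercased emails from mailto: links and plain text."""
    text = unescape(html)
    candidates = [unquote(m) for m in MAILTO_RE.findall(text)]
    candidates += EMAIL_RE.findall(text)

    seen = set()
    result = []
    for candidate in candidates:
        email = candidate.strip().lower()
        # Keep only well-formed addresses, preserving first-seen order.
        if EMAIL_RE.fullmatch(email) and email not in seen:
            seen.add(email)
            result.append(email)
    return result


sample = (
    '<a href="mailto:Jane@Example.com?subject=hi">Contact</a> '
    "or write to team@example.com."
)
emails = extract_emails(sample)
```

Addresses found this way are unverified, which is why the pipeline passes them through step 07i before merging.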
- Send only to verified `safe` results. No `catch_all`, no `unknown`.
- The pipeline produces a DRAFT upload only; final review + send is manual.
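The safe-only rule above amounts to a simple row filter. The sketch below assumes a CSV with a `status` column holding values such as `safe`, `catch_all`, or `unknown`; the column name and the `keep_safe` helper are hypothetical and may not match the pipeline's actual schema.

```python
import csv
import io


def keep_safe(rows, status_field="status"):
    """Keep only rows whose verification status is exactly 'safe'."""
    return [
        row
        for row in rows
        if (row.get(status_field) or "").strip().lower() == "safe"
    ]


# In-memory sample standing in for a verification output file.
raw = io.StringIO(
    "email,status\n"
    "a@example.com,safe\n"
    "b@example.com,catch_all\n"
    "c@example.com,unknown\n"
)
safe_rows = keep_safe(csv.DictReader(raw))
```

Rows with a missing or empty status are dropped as well, which errs on the side of not sending.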
`SMARTLEAD_API_KEY`, `REOON_API_KEY`, `ICYPEAS_API_KEY`, `PH_CLIENT_ID`,
`PH_CLIENT_SECRET`. Local runs read the same keys from `.env` (gitignored;
see `.env.example`).
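Before a local run, it can help to check that the keys are actually present in the environment. The sketch below is one way to do that; the `missing_keys` helper is hypothetical, and treating all five keys as required for every run is an assumption (some steps may need only a subset).

```python
import os

# Key names as listed above; whether every run needs all of them is an assumption.
REQUIRED_KEYS = [
    "SMARTLEAD_API_KEY",
    "REOON_API_KEY",
    "ICYPEAS_API_KEY",
    "PH_CLIENT_ID",
    "PH_CLIENT_SECRET",
]


def missing_keys(env=None):
    """Return the required keys that are unset or empty in the given mapping."""
    if env is None:
        env = os.environ
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Example with a partial environment instead of the real one.
example_env = {"SMARTLEAD_API_KEY": "x", "ICYPEAS_API_KEY": "y"}
missing = missing_keys(example_env)
```

Calling `missing_keys()` with no argument checks the real process environment, so it only sees values from `.env` if something has already loaded that file.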