Fast, local semantic search over web content for AI agents. Sifts the net for signal — uses ~90% fewer tokens than raw web_fetch.
When an AI agent researches the web, the usual flow is: search → fetch 10 pages → drown in 100k+ tokens of irrelevant prose. nesift sits between the web and the agent: it ingests pages on the fly, indexes them with hybrid BM25 + embeddings, deduplicates redundant content across sources, and returns only the chunks that fit your token budget.
- Local: runs on CPU, no API keys, no cloud calls (other than the page fetch itself).
- Zero setup: `pip install -e .`, no database, no daemon.
- Session-scoped: the index lives in `/tmp` and is per-session by default.
- Hybrid retrieval: BM25 + `potion-retrieval-32M` embeddings fused via RRF.
- Context budget mode: `--budget N` trims results to N tokens.
- Cross-page dedup: collapses near-identical chunks and notes the source count.
- SearXNG bridge: `nesift search "..."` does search + filter + fetch + index + answer in one command.
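The hybrid retrieval step fuses a BM25 ranking and an embedding ranking with Reciprocal Rank Fusion (RRF). A minimal sketch of the general technique follows; it is not nesift's actual code. The function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not nesift's setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


bm25_ranking = ["c3", "c1", "c2"]       # hypothetical BM25 order
embedding_ranking = ["c1", "c4", "c3"]  # hypothetical embedding order
fused = rrf_fuse([bm25_ranking, embedding_ranking])
print(fused)
```

Because RRF uses only ranks, the BM25 and cosine-similarity scores never need to be normalized onto a common scale. A chunk that ranks well in both lists (here `c1` and `c3`) rises above one that appears in only a single list.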
git clone git@github.com:scottgl9/nesift.git
cd nesift
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Requires Python 3.11+.
# Index a page and ask about it
nesift add https://en.wikipedia.org/wiki/Retrieval-augmented_generation
nesift query "what is RAG used for" --budget 1500
nesift answer "how does RAG reduce hallucinations"
# Pre-fetch scoring — rank snippets before downloading
nesift score "vector database" "Pinecone is a vector DB" "How to bake bread"
# One-shot SearXNG search + ingest + answer
NESIFT_SEARXNG_URL=http://127.0.0.1:8888 \
nesift search "retry logic in distributed systems" --top 5 --budget 2000
nesift list
nesift clear

See docs/cli.md for every command and flag.
URL → trafilatura extract → heading-aware chunker → triage summary
→ BM25 index + potion-retrieval-32M embeddings (CPU)
→ query: RRF fusion + dedup + budget trim → ranked chunks or synthesized answer
See docs/architecture.md.
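The last stage of the pipeline (dedup, then budget trim) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: near-duplicates are detected here with Jaccard similarity over word 3-grams, tokens are estimated at roughly 4 characters each, and every function name, the `0.8` threshold, and the example URLs are hypothetical.

```python
def shingles(text, n=3):
    """Return the set of word n-grams in text (lowercased)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dedup(chunks, threshold=0.8):
    """Collapse near-identical chunks, keeping the first and recording sources.

    chunks: list of (text, source_url) tuples in ranked order.
    """
    kept = []
    for text, source in chunks:
        sig = shingles(text)
        for entry in kept:
            if jaccard(sig, entry["sig"]) >= threshold:
                entry["sources"].add(source)
                break
        else:
            kept.append({"text": text, "sig": sig, "sources": {source}})
    return [{"text": e["text"], "sources": sorted(e["sources"])} for e in kept]


def estimate_tokens(text):
    # Crude heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def trim_to_budget(chunks, budget):
    """Keep ranked chunks in order until the next one would exceed the budget."""
    out, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk["text"])
        if used + cost > budget:
            break
        out.append(chunk)
        used += cost
    return out


ranked = [
    ("RAG grounds model output in retrieved documents.", "https://a.example/rag"),
    ("RAG grounds model output in retrieved documents.", "https://b.example/rag"),
    ("Sourdough needs a long fermentation.", "https://c.example/bread"),
]
unique = dedup(ranked)
trimmed = trim_to_budget(unique, budget=12)
print(len(unique), len(unique[0]["sources"]), len(trimmed))
```

Deduplicating before trimming means the budget is spent on distinct chunks, and the per-chunk source list preserves the signal that several pages agree on the same passage.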
pip install "nesift[mcp]"
nesift-mcp # stdio MCP server

Tools exposed: `score_snippets`, `add_page`, `add_batch`, `query`, `answer`, `list_pages`, `clear`, `search`. See docs/mcp.md.
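For orientation, an MCP client invokes one of these tools by sending a JSON-RPC 2.0 `tools/call` request over the server's stdin, one JSON message per line. The sketch below only builds such a message. The argument names `question` and `budget` are hypothetical (the real tool schemas are in docs/mcp.md), and a real session must complete the MCP `initialize` handshake before any tool call.

```python
import json


def tool_call_message(request_id, tool, arguments):
    """Serialize an MCP tools/call request as one line of JSON-RPC 2.0."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload) + "\n"


# Argument names below are assumptions for illustration only.
line = tool_call_message(
    1, "query", {"question": "what is RAG used for", "budget": 1500}
)
print(line, end="")
```

In practice an MCP client library or an agent host handles this framing; the sketch is only meant to show what travels over the stdio pipe.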
nesift add https://arxiv.org/pdf/2005.11401.pdf

Content type is auto-detected; `.pdf` URLs (or any response with the PDF signature) route through pypdf.
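The detection rule described above can be sketched like this. It is a guess at the general approach rather than nesift's actual code: the helper name `looks_like_pdf` is hypothetical, and the check assumes the PDF signature is the `%PDF-` marker at the very start of the response body.

```python
from urllib.parse import urlparse

PDF_MAGIC = b"%PDF-"  # bytes that open a PDF file


def looks_like_pdf(url, head):
    """Return True if the URL path ends in .pdf or the body starts with %PDF-.

    url:  the fetched URL.
    head: the first bytes of the response body.
    """
    path = urlparse(url).path.lower()
    return path.endswith(".pdf") or head.startswith(PDF_MAGIC)


print(looks_like_pdf("https://arxiv.org/pdf/2005.11401.pdf", b""))
print(looks_like_pdf("https://example.org/download?id=7", b"%PDF-1.7\n"))
print(looks_like_pdf("https://example.org/page.html", b"<!doctype html>"))
```

Checking the body signature as well as the extension covers download endpoints whose URLs carry no `.pdf` suffix.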
nesift add https://es.wikipedia.org/wiki/... --lang

`--lang` swaps in potion-multilingual-128M (101 languages).
GPL-2.0-only — see LICENSE.