Legacy Code Analyzer

A compact, production-minded MVP that analyzes legacy code with a "graph-first, AI-last" strategy.

This repository combines lightweight static analysis (dependency graph, regex/tree-sitter chunking), a persistent knowledge graph memory, a lexical prefilter, Redis caching and an AI pipeline that runs targeted per-chunk analyses (docstrings, security/bug detection, and a junior-friendly summary).

Features

Graph-first question answering: many questions are answered from the dependency graph at zero AI cost.
Targeted per-chunk AI analysis (docstring, bugs & vulnerabilities, junior summary).
Prefilter (BM25-like + risk rules) to reduce AI calls and cost.
Persistent knowledge graph memory to boost relevant chunks and improve future results.
Redis caching with deterministic SHA-256 keys to reuse results across requests.
Multi-provider LLM support: openai, anthropic (Claude), google (Gemini/Vertex).
Security headers and a simple rate limiter for the API surface.

Quick Architecture Overview

FastAPI backend entry: backend/app/main.py — initializes singletons: CacheService, KnowledgeGraphService, DependencyGraphService, AnalysisPipeline.
Chunking: backend/app/services/chunker.py — tree-sitter (preferred) or regex fallback; splits very large chunks into parts.
Dependency graph (session/global): backend/app/services/dependency_graph.py — imports, calls, inheritance extraction + BFS impact analysis (zero-cost answers).
Knowledge graph (persistent memory): backend/app/services/knowledge_graph.py — stores node tokens, issue patterns and supplies memory contexts + chunk boosts.
Prefilter: backend/app/services/prefilter.py — lexical scoring + risk boosts to select top-k candidates for AI.
Cache (Redis): backend/app/services/cache.py — MGET/MSET-style helpers and TTL.
AI Pipeline: backend/app/services/pipeline.py — builds a parallel runnable of 3 tasks (docstring, bugs, junior summary), does cache lookup, rate-limit coordination and retries.
Question routing (graph-first): backend/app/services/question_router.py — regex classifier for IMPACT/STRUCTURAL/SEMANTIC questions.

End-to-end Flows

POST /api/v1/analyze (JSON or file upload)
1. Detect language → chunk code into logical CodeChunks.
2. Build dependency graph (cost $0).
3. Query knowledge graph for memory contexts and boosts.
4. Optionally run PrefilterService to select top candidates.
5. AnalysisPipeline.analyze_all() performs a batch cache lookup and calls the LLM only for cache misses; each chunk runs three parallel tasks.
6. Persist results to cache and update the knowledge graph.
POST /api/v1/ask (graph-first, AI-last)
1. Classify the question with regex rules (QuestionRouter).
2. IMPACT / STRUCTURAL → answered from DependencyGraphService (zero AI cost).
3. SEMANTIC → prepare graph context; filter relevant chunks from the graph and call AI fallback (limited context) only when required.

API Endpoints (high level)

GET /health — health and LLM provider info.
POST /api/v1/analyze/ — analyze code (JSON body).
POST /api/v1/analyze/upload — analyze uploaded file (multipart/form-data).
GET /api/v1/analyze/memory/stats — knowledge graph stats.
GET /api/v1/analyze/global_graph — snapshot of accumulated global dependency graph.
POST /api/v1/ask/ — graph-first Q&A endpoint.
POST /api/v1/ask/scan — build the dependency graph from provided code (no AI).

Quickstart (local)

Backend (recommended for development):

bash cd backend python -m venv .venv source .venv/bin/activate pip install -r requirements.txt cp env.example .env

Edit .env: set API keys (OPENAI_API_KEY / ANTHROPIC_API_KEY / GOOGLE_API_KEY), REDIS_URL, etc.

uvicorn app.main:app --reload --port 8000

Using Docker / docker-compose (root compose attempts to bring services together):

bash docker-compose up --build

or to run backend-only if provided:

docker-compose -f backend/docker-compose.yml up --build

Run tests (backend):

bash cd backend pytest -q

Configuration (important env variables)

LLM_PROVIDER (or llm_provider in settings): openai, anthropic, or google.
LLM_MODEL (or llm_model): e.g. gpt-4o, claude-3-5-sonnet-20241022, gemini-2.0-flash.
OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY — set the provider key you use.
REDIS_URL — Redis connection for caching (fallback to no-cache if unavailable).
CACHE_TTL_SECONDS, PREFILTER_MAX_CHUNKS, PREFILTER_MIN_SCORE, PIPELINE_MAX_CONCURRENCY, RATE_LIMIT_* — tunables in backend/app/core/config.py.

Refer to backend/env.example for a ready list of values.

Provider & Privacy Notes

The code treats LLM providers as interchangeable backends; you can switch via settings.
For minimal data retention with OpenAI, consider disabling server-side storage and using the provider options (e.g., store=False where supported) and ensure your org has Zero Data Retention enabled.
For Google, prefer Vertex AI over AI Studio for stronger DPA guarantees.

Developer Notes & Next Steps

Enable tree-sitter (tree-sitter-languages) for more reliable chunking in production.
Move knowledge_graph to a small persistent DB if you need concurrency/scale beyond a JSON file.
Improve prefilter tuning and add metrics (observability) for LLM cost tracking.
Add more frontend integration tests and tighten the CORS policy for production.

Key files

Entry / lifecycle: backend/app/main.py
Chunking: backend/app/services/chunker.py
Pipeline (LLM orchestration): backend/app/services/pipeline.py
Dependency graph: backend/app/services/dependency_graph.py
Knowledge graph: backend/app/services/knowledge_graph.py
Prefilter: backend/app/services/prefilter.py
Cache (Redis): backend/app/services/cache.py
Question router: backend/app/services/question_router.py

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
backend		backend
frontend		frontend
.gitignore		.gitignore
Corectare.jpg		Corectare.jpg
README.md		README.md
docker-compose.yml		docker-compose.yml
stop.sh		stop.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Legacy Code Analyzer

Features

Quick Architecture Overview

End-to-end Flows

API Endpoints (high level)

Quickstart (local)

Edit .env: set API keys (OPENAI_API_KEY / ANTHROPIC_API_KEY / GOOGLE_API_KEY), REDIS_URL, etc.

or to run backend-only if provided:

Configuration (important env variables)

Provider & Privacy Notes

Developer Notes & Next Steps

Key files

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Legacy Code Analyzer

Features

Quick Architecture Overview

End-to-end Flows

API Endpoints (high level)

Quickstart (local)

Edit .env: set API keys (OPENAI_API_KEY / ANTHROPIC_API_KEY / GOOGLE_API_KEY), REDIS_URL, etc.

or to run backend-only if provided:

Configuration (important env variables)

Provider & Privacy Notes

Developer Notes & Next Steps

Key files

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages