sayan1999/interactive-draft-and-remember-agent

Story Drafter — Continuous Personalization Loop via Interactive Grounding

A self-evolving agent sidecar that intercepts LLM requests, runs iterative planning cycles grounded in personal semantic memory, and writes preference facts back to a vector store after every confirmed session — so every future generation is shaped by every past correction.


The Problem with Every LLM Today

Every LLM has the same two failure modes — and they compound each other.

1. It assumes instead of asking. When you send a coding task or a personalized writing request, the model picks the most statistically likely interpretation and commits to it immediately. It doesn't pause to ask whether you want idiomatic or explicit code, terse or expressive prose, a particular structure or tone. It guesses — and you spend the next several turns correcting a direction it should have clarified upfront.

2. It forgets everything you tell it. Every tweak you make in a session — the phrasing you corrected, the structure you rejected, the style you pushed it toward — vanishes the moment the context window closes. There is no persistence within a session beyond the visible thread, no persistence across sessions, and no global memory of who you are or how you work. You re-teach the model the same preferences every time.

Story Drafter breaks both failure modes at the service layer.


What It Does

Story Drafter sits between your frontend and the main LLM as a FastAPI sidecar. It exposes two endpoints:

  • POST /draft — runs a LangGraph pipeline that retrieves personal preference facts from ChromaDB via mem0, assembles a structured context, and generates a draft using a configurable drafter LLM.
  • POST /confirm — receives the final approved draft and the full drafting exchange, extracts new preference facts via a background LLM call, and writes them back to ChromaDB asynchronously.

The client drives an iterative loop: call /draft, collect user feedback, call /draft again with the extended message history — repeat until the draft is approved, then call /confirm. Each iteration appends the previous draft and feedback as new turns, giving the drafter full context of the refinement trajectory.
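The client-side loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's client: `run_draft_loop`, `post`, and `get_feedback` are hypothetical names, the payload fields mirror the Quickstart requests, and the roles given to appended turns (assistant for the draft, user for the feedback) are a guess.

```python
from typing import Callable, Optional

# Hypothetical transport: post(path, payload) -> decoded JSON response.
Post = Callable[[str, dict], dict]


def run_draft_loop(
    post: Post,
    get_feedback: Callable[[str], Optional[str]],
    messages: list[dict],
    user_id: str,
    model: str,
    agent_prompt: str,
    task_prompt: str,
    max_rounds: int = 5,
) -> str:
    """Call /draft until the user approves, then call /confirm once."""
    history = list(messages)        # full history, sent to /draft
    loop_exchange: list[dict] = []  # draft/feedback turns only, sent to /confirm
    feedback_history: list[str] = []

    for _ in range(max_rounds):
        draft = post("/draft", {
            "messages": list(history),
            "model": model,
            "agent_prompt": agent_prompt,
            "task_prompt": task_prompt,
            "user_id": user_id,
        })["draft"]

        feedback = get_feedback(draft)  # None means "approved"
        if feedback is None:
            post("/confirm", {
                "messages": loop_exchange,
                "final_draft": draft,
                "feedback_history": feedback_history,
                "user_id": user_id,
            })
            return draft

        # Assumed roles: the draft is an assistant turn, the feedback a user turn.
        turns = [
            {"role": "assistant", "content": draft},
            {"role": "user", "content": feedback},
        ]
        history.extend(turns)
        loop_exchange.extend(turns)
        feedback_history.append(feedback)

    raise RuntimeError("draft was not approved within max_rounds")
```

Passing the transport in as a callable keeps the loop independent of any HTTP library, and /confirm is only reached on explicit approval, so an abandoned session writes nothing to memory.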


Why This Architecture Is Different

This is not prompt engineering. This is not fine-tuning. This is a Continuous Personalization Loop via Interactive Grounding — the paradigm emerging at the frontier of agentic AI research:

  • PAHF (Personalized Agents from Human Feedback) — every correction in the drafting loop is a live preference signal. No model retraining. No dataset curation. The memory updates itself from the interaction.
  • Self-Evolving Memory — the agent treats its own vector store as mutable. After each confirmed session, a background extraction step reflects on the full drafting exchange and patches the semantic profile of the user.
  • Interactive Grounding — instead of one-shot generation, the service surfaces candidates and refines them over multiple turns. Each refinement step closes the gap between the model's prior and the user's actual intent — and that delta is what gets persisted.

Architecture

Client (any frontend)
    │
    ├─► POST /draft
    │       │
    │       ▼
    │   LangGraph pipeline
    │       ├─ retrieve_memories(user_id, query)
    │       │       └─► ChromaDB (mem0) — semantic search → ranked preference facts
    │       │
    │       └─ generate_draft(context)
    │               ├─ [system]  agent_prompt + <meta_prompt> blocks
    │               ├─ [user]    <user_preferences> (injected from mem0)
    │               ├─ [user]    <system_prompt> (content of messages[0], the system message)
    │               ├─ [user]    <conversation> (full message history, XML-tagged)
    │               └─ [user]    task_prompt (planning or drafting instruction)
    │
    │   ← { "draft": "..." }
    │
    │   [client appends draft + feedback to messages, loops]
    │
    └─► POST /confirm
            │
            ▼
        FastAPI BackgroundTask
            └─► mem0 LLM — extract discrete preference facts from drafting exchange
                    └─► ChromaDB — upsert facts under user_id
                            └─► available on next /draft call

/draft context assembly

Each call to /draft receives the full message history including all prior draft/feedback turns appended by the client. The LangGraph node assembles:

[system]    agent_prompt
            + <tag>meta_prompt</tag> for each configured meta prompt

[user]      <user_preferences>
              <item>retrieved preference fact</item> ...
            </user_preferences>
            <system_prompt>content of messages[0], the system message</system_prompt>

[user]      <conversation>
              <user>...</user>
              <assistant>...</assistant>
              ...
            </conversation>

[user]      task_prompt
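A minimal sketch of how the layout above could be assembled into a message list. `assemble_context` is a hypothetical helper, not the code in drafter/agent.py. It assumes meta prompts arrive as a tag-to-text mapping, that the system prompt is the content of the first system-role message, and that message content is XML-escaped (the README does not say whether escaping is applied).

```python
from xml.sax.saxutils import escape


def assemble_context(agent_prompt, meta_prompts, preferences, messages, task_prompt):
    """Build the [system]/[user] message list in the layout shown above."""
    # System block: agent prompt plus one tagged block per meta prompt.
    system = agent_prompt + "".join(
        f"\n<{tag}>{text}</{tag}>" for tag, text in meta_prompts.items()
    )

    # Assumption: the system prompt is the first message with role "system".
    system_msgs = [m for m in messages if m["role"] == "system"]
    system_prompt = system_msgs[0]["content"] if system_msgs else ""

    # Retrieved preference facts, one <item> each, followed by the system prompt.
    items = "".join(f"\n  <item>{escape(p)}</item>" for p in preferences)
    prefs_block = (
        f"<user_preferences>{items}\n</user_preferences>\n"
        f"<system_prompt>{escape(system_prompt)}</system_prompt>"
    )

    # Full non-system history, each turn tagged with its role.
    turns = "".join(
        f"\n  <{m['role']}>{escape(m['content'])}</{m['role']}>"
        for m in messages
        if m["role"] != "system"
    )
    conversation = f"<conversation>{turns}\n</conversation>"

    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prefs_block},
        {"role": "user", "content": conversation},
        {"role": "user", "content": task_prompt},
    ]
```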

mem0 queries ChromaDB using the last 3 non-system messages as the search query. A fallback_model is attached with LangChain's .with_fallbacks(), so a call that fails on the primary model is retried on the fallback.
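To make the retrieval input concrete, here is one way the query text could be derived from the message history. `build_memory_query` is a hypothetical name, and joining the selected messages with newlines is an assumption; the resulting string would then be handed to something like search_memories() in drafter/memory.py.

```python
def build_memory_query(messages, last_n=3):
    """Join the content of the last `last_n` non-system messages."""
    recent = [m["content"] for m in messages if m["role"] != "system"][-last_n:]
    return "\n".join(recent)
```

Dropping system messages keeps the query focused on what the user and drafter actually said, rather than on static instructions that are the same for every request.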

/confirm memory write-back

The confirm payload contains only the drafting loop exchange (draft_loop_messages + final approved draft) — not the full conversation history. This keeps the mem0 extraction focused on stylistic and preference signals from the refinement session rather than narrative content.

The server returns {"ok": true} immediately. The mem0 LLM call and ChromaDB write happen in a BackgroundTask — the client is never blocked on memory consolidation.
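The respond-first, consolidate-later behaviour can be sketched without running a server. In the sketch below, `DeferredTasks` is a stand-in that mimics the `add_task(func, *args, **kwargs)` shape of FastAPI's `BackgroundTasks`; `confirm`, `consolidate_memory`, and `STORE` are hypothetical, and the "extraction" is a placeholder that just copies user turns instead of calling mem0.

```python
from typing import Any, Callable


class DeferredTasks:
    """Stand-in with the same add_task() shape as FastAPI's BackgroundTasks.

    Queued tasks run only when run_all() is called, which makes the
    "response first, work later" ordering visible.
    """

    def __init__(self) -> None:
        self._queue: list[tuple[Callable[..., Any], tuple, dict]] = []

    def add_task(self, func, *args, **kwargs) -> None:
        self._queue.append((func, args, kwargs))

    def run_all(self) -> None:
        while self._queue:
            func, args, kwargs = self._queue.pop(0)
            func(*args, **kwargs)


# Hypothetical in-memory stand-in for the ChromaDB collection.
STORE: dict[str, list[str]] = {}


def consolidate_memory(user_id: str, messages: list[dict], final_draft: str) -> None:
    """Placeholder for the mem0 extraction and write.

    It records the user's feedback turns as "facts"; final_draft is accepted
    but ignored in this stub.
    """
    facts = [m["content"] for m in messages if m["role"] == "user"]
    STORE.setdefault(user_id, []).extend(facts)


def confirm(payload: dict, background_tasks) -> dict:
    """Queue the consolidation and return immediately."""
    background_tasks.add_task(
        consolidate_memory,
        payload["user_id"],
        payload["messages"],
        payload["final_draft"],
    )
    return {"ok": True}
```

In a real FastAPI route the handler would declare a `BackgroundTasks` parameter and the framework would run the queued task after the response is sent.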


Stack

Layer              Technology
Drafter service    FastAPI + LangGraph (port 6677)
Preference memory  mem0 + ChromaDB (port 8100)
Observability      Arize Phoenix (port 6006)
LLMs               OpenRouter (any model) · configurable embedding provider

Quickstart

1. Configure

cp .env.example .env
# set OPENROUTER_API_KEY
# set GOOGLE_APPLICATION_CREDENTIALS  (for Google-based embedding model)

2. Start services

docker compose up --build

ChromaDB :8100 · Phoenix :6006 · Drafter :6677

3. Health check

curl http://localhost:6677/health/ready

4. Call the API

# First draft
curl -s -X POST http://localhost:6677/draft \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}],
    "model": "<your-model-id>",
    "agent_prompt": "...",
    "task_prompt": "...",
    "user_id": "user_123"
  }'

# After client collects feedback and appends draft+feedback to messages, call /draft again
# On approval, confirm
curl -s -X POST http://localhost:6677/confirm \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [...draft loop exchange...],
    "final_draft": "...",
    "feedback_history": ["feedback turn 1", "feedback turn 2"],
    "user_id": "user_123",
    "custom_instructions": "..."
  }'
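For clients written in Python, the same requests can be issued with the standard library alone. The helper names below (`build_json_post`, `post_json`) are hypothetical; the URL, paths, and header come from the curl examples above.

```python
import json
import urllib.request


def build_json_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl calls above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_json_post(base_url, path, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the services running, `post_json("http://localhost:6677", "/draft", payload)["draft"]` would return the draft text, given a payload shaped like the first curl example.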

5. Browse memory

.venv/bin/streamlit run scripts/dashboard.py

Configuration Reference

Variable           Default      Description
MEM0_LLM_MODEL     (none)       LLM for preference extraction (any OpenRouter model)
MEM0_EMBED_MODEL   (none)       Embedding model
CHROMA_HOST        localhost    ChromaDB host
CHROMA_PORT        8100         ChromaDB port
CHROMA_COLLECTION  agent_prefs  Collection name
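As an illustration of how these variables and defaults might be read, here is a small settings loader. It is a sketch, not the contents of drafter/config.py: the `Settings` dataclass and `load_settings` are hypothetical, and loading the .env file into the environment is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mem0_llm_model: Optional[str]    # no default listed in the table
    mem0_embed_model: Optional[str]  # no default listed in the table
    chroma_host: str
    chroma_port: int
    chroma_collection: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        mem0_llm_model=env.get("MEM0_LLM_MODEL"),
        mem0_embed_model=env.get("MEM0_EMBED_MODEL"),
        chroma_host=env.get("CHROMA_HOST", "localhost"),
        chroma_port=int(env.get("CHROMA_PORT", "8100")),
        chroma_collection=env.get("CHROMA_COLLECTION", "agent_prefs"),
    )
```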

Project Layout

main.py                        FastAPI entry point (port 6677)
drafter/
  agent.py                     LangGraph graph: retrieve_memories → generate_draft
  memory.py                    mem0 wrapper: search_memories(), add_memory()
  llm_utils.py                 make_llm() — OpenRouter and Google Gemini
  models.py                    Pydantic schemas: DraftRequest, DraftResponse, ConfirmRequest
  config.py                    Settings from .env
scripts/
  dashboard.py                 Streamlit memory browser

See ARCHITECTURE.md for the full request/response flow and state machine.


The Bigger Picture

Most "personalized AI" products are personalized at training time — a frozen snapshot of aggregate preferences baked into weights. Story Drafter is personalized at inference time, continuously, from individual corrections made in live sessions.

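Every confirmed drafting loop adds preference facts for a specific user, and those facts condition every later generation for that user. The model's weights never change; the preference vector store grows denser with each session. That's not a feature. That's the architecture.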
