Story Drafter — Continuous Personalization Loop via Interactive Grounding

A self-evolving agent sidecar that intercepts LLM requests, runs iterative planning cycles grounded in personal semantic memory, and writes preference facts back to a vector store after every confirmed session, so that past corrections can shape every future generation.

The Problem with Every LLM Today

Every LLM has the same two failure modes — and they compound each other.

1. It assumes instead of asking. When you send a coding task or a personalized writing request, the model picks the most statistically likely interpretation and commits to it immediately. It doesn't pause to ask whether you want idiomatic or explicit code, terse or expressive prose, a particular structure or tone. It guesses — and you spend the next several turns correcting a direction it should have clarified upfront.

2. It forgets everything you tell it. Every tweak you make in a session — the phrasing you corrected, the structure you rejected, the style you pushed it toward — vanishes the moment the context window closes. There is no persistence within a session beyond the visible thread, no persistence across sessions, and no global memory of who you are or how you work. You re-teach the model the same preferences every time.

Story Drafter addresses both failure modes at the service layer.

What It Does

Story Drafter sits between your frontend and the main LLM as a FastAPI sidecar. It exposes two endpoints:

POST /draft — runs a LangGraph pipeline that retrieves personal preference facts from ChromaDB via mem0, assembles a structured context, and generates a draft using a configurable drafter LLM.
POST /confirm — receives the final approved draft and the full drafting exchange, extracts new preference facts via a background LLM call, and writes them back to ChromaDB asynchronously.

The client drives an iterative loop: call /draft, collect user feedback, call /draft again with the extended message history — repeat until the draft is approved, then call /confirm. Each iteration appends the previous draft and feedback as new turns, giving the drafter full context of the refinement trajectory.
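
The loop described above can be sketched on the client side as follows. This is a minimal illustration, not the project's client: `extend_history`, `drafting_loop`, the two callbacks, and `max_rounds` are hypothetical names, and sending the draft back with the `assistant` role and the feedback with the `user` role is an assumption about how the turns are appended.

```python
from typing import Callable, Optional

Message = dict[str, str]


def extend_history(messages: list[Message], draft: str, feedback: str) -> list[Message]:
    """Return a new history with the previous draft and the user's feedback appended."""
    return messages + [
        {"role": "assistant", "content": draft},   # assumed role for the draft turn
        {"role": "user", "content": feedback},     # assumed role for the feedback turn
    ]


def drafting_loop(
    messages: list[Message],
    request_draft: Callable[[list[Message]], str],
    collect_feedback: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> tuple[str, list[Message]]:
    """Request drafts until collect_feedback returns None, which signals approval.

    request_draft stands in for a POST to /draft; it receives the full history.
    Returns the last draft together with the history that produced it.
    """
    draft = request_draft(messages)
    for _ in range(max_rounds):
        feedback = collect_feedback(draft)
        if feedback is None:  # the user approved the draft
            break
        messages = extend_history(messages, draft, feedback)
        draft = request_draft(messages)
    return draft, messages
```

Keeping the HTTP call behind `request_draft` lets the loop logic be exercised without a running service. On approval, the client would go on to call /confirm.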

Why This Architecture Is Different

This is not prompt engineering, and it is not fine-tuning. It is a Continuous Personalization Loop via Interactive Grounding, which combines three ideas from agentic AI research:

PAHF (Personalized Agents from Human Feedback) — every correction in the drafting loop is a live preference signal. No model retraining. No dataset curation. The memory updates itself from the interaction.
Self-Evolving Memory — the agent treats its own vector store as mutable. After each confirmed session, a background extraction step reflects on the full drafting exchange and patches the semantic profile of the user.
Interactive Grounding — instead of one-shot generation, the service surfaces candidates and refines them over multiple turns. Each refinement step closes the gap between the model's prior and the user's actual intent — and that delta is what gets persisted.

Architecture

Client (any frontend)
    │
    ├─► POST /draft
    │       │
    │       ▼
    │   LangGraph pipeline
    │       ├─ retrieve_memories(user_id, query)
    │       │       └─► ChromaDB (mem0) — semantic search → ranked preference facts
    │       │
    │       └─ generate_draft(context)
    │               ├─ [system]  agent_prompt + <meta_prompt> blocks
    │               ├─ [user]    <user_preferences> (injected from mem0)
    │               ├─ [user]    <system_prompt> (content of the system message, messages[0])
    │               ├─ [user]    <conversation> (full message history, XML-tagged)
    │               └─ [user]    task_prompt (planning or drafting instruction)
    │
    │   ← { "draft": "..." }
    │
    │   [client appends draft + feedback to messages, loops]
    │
    └─► POST /confirm
            │
            ▼
        FastAPI BackgroundTask
            └─► mem0 LLM — extract discrete preference facts from drafting exchange
                    └─► ChromaDB — upsert facts under user_id
                            └─► available on next /draft call

`/draft` context assembly

Each call to /draft receives the full message history including all prior draft/feedback turns appended by the client. The LangGraph node assembles:

[system]    agent_prompt
            + <tag>meta_prompt</tag> for each configured meta prompt

[user]      <user_preferences>
              <item>retrieved preference fact</item> ...
            </user_preferences>
            <system_prompt>content of the system message (messages[0])</system_prompt>

[user]      <conversation>
              <user>...</user>
              <assistant>...</assistant>
              ...
            </conversation>

[user]      task_prompt
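
The layout above could be assembled roughly as in the sketch below. It is an illustration under stated assumptions rather than the code in drafter/agent.py: the function name `assemble_context`, representing meta prompts as a tag-to-text mapping, locating the system prompt by role, and XML-escaping the content are all choices made for this example.

```python
from xml.sax.saxutils import escape

Message = dict[str, str]


def assemble_context(
    agent_prompt: str,
    meta_prompts: dict[str, str],
    preferences: list[str],
    messages: list[Message],
    task_prompt: str,
) -> list[Message]:
    """Build the four-part prompt as a list of role/content dicts."""
    # [system] agent prompt followed by one tagged block per meta prompt
    system_parts = [agent_prompt]
    system_parts += [f"<{tag}>{escape(text)}</{tag}>" for tag, text in meta_prompts.items()]

    # [user] retrieved preference facts plus the caller's own system prompt
    items = "".join(f"<item>{escape(fact)}</item>" for fact in preferences)
    system_prompt = next((m["content"] for m in messages if m["role"] == "system"), "")
    prefs_block = (
        f"<user_preferences>{items}</user_preferences>\n"
        f"<system_prompt>{escape(system_prompt)}</system_prompt>"
    )

    # [user] the remaining turns, each wrapped in a tag named after its role
    turns = "".join(
        f"<{m['role']}>{escape(m['content'])}</{m['role']}>"
        for m in messages
        if m["role"] != "system"
    )

    return [
        {"role": "system", "content": "\n".join(system_parts)},
        {"role": "user", "content": prefs_block},
        {"role": "user", "content": f"<conversation>{turns}</conversation>"},
        {"role": "user", "content": task_prompt},
    ]
```

Escaping keeps user text that happens to contain angle brackets from being mistaken for one of the structural tags.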

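mem0 queries ChromaDB using the last 3 non-system messages as the search query, which is embedded before the similarity search. A fallback_model is wired in via LangChain's .with_fallbacks(), so a failed call to the primary drafter model is retried on the fallback.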
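
As a rough sketch of the query selection, the function below picks the last three non-system messages and joins them into one string. The name `build_memory_query` and the newline join are assumptions; the service may combine the messages differently before handing them to mem0.

```python
Message = dict[str, str]


def build_memory_query(messages: list[Message], window: int = 3) -> str:
    """Join the content of the last `window` non-system messages into one query string."""
    recent = [m["content"] for m in messages if m["role"] != "system"][-window:]
    return "\n".join(recent)
```

Because the window slides with the history, later iterations of the loop search on the most recent draft and feedback rather than on the opening request.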

`/confirm` memory write-back

The confirm payload contains only the drafting loop exchange (draft_loop_messages + final approved draft) — not the full conversation history. This keeps the mem0 extraction focused on stylistic and preference signals from the refinement session rather than narrative content.
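
One way a client might isolate the drafting loop exchange is to remember where the loop started in its history and slice from there. The sketch below assumes that approach. `build_confirm_payload` and `loop_start` are hypothetical, the field names follow the /confirm example in the Quickstart, and treating `custom_instructions` as optional and feedback as the `user` turns of the loop are assumptions.

```python
from typing import Optional

Message = dict[str, str]


def build_confirm_payload(
    history: list[Message],
    loop_start: int,
    final_draft: str,
    user_id: str,
    custom_instructions: Optional[str] = None,
) -> dict:
    """Keep only the turns appended during the drafting loop, not the earlier conversation."""
    loop_messages = history[loop_start:]
    payload = {
        "messages": loop_messages,
        "final_draft": final_draft,
        # assumed: feedback turns are the user-role messages inside the loop
        "feedback_history": [m["content"] for m in loop_messages if m["role"] == "user"],
        "user_id": user_id,
    }
    if custom_instructions is not None:
        payload["custom_instructions"] = custom_instructions
    return payload
```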

The server returns {"ok": true} immediately. The mem0 LLM call and ChromaDB write happen in a BackgroundTask — the client is never blocked on memory consolidation.
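
The acknowledge-first, consolidate-later behaviour can be illustrated without FastAPI. The sketch below deliberately swaps in a standard-library thread pool for FastAPI's BackgroundTask and a plain dict for ChromaDB, so it shows the pattern only; `executor`, `memory_store`, `consolidate`, and `confirm` are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the background runner and the vector store.
executor = ThreadPoolExecutor(max_workers=1)
memory_store: dict[str, list[dict[str, str]]] = {}


def consolidate(user_id: str, exchange: list[dict[str, str]]) -> None:
    """Slow step. The real service would run mem0 extraction and a ChromaDB write here."""
    memory_store.setdefault(user_id, []).extend(exchange)


def confirm(user_id: str, exchange: list[dict[str, str]]) -> tuple[dict[str, bool], Future]:
    """Queue consolidation and acknowledge at once, without waiting for the write."""
    pending = executor.submit(consolidate, user_id, exchange)
    return {"ok": True}, pending
```

The returned `Future` exists only so a caller or test can wait for the write. In the service itself the client sees just the acknowledgement.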

Stack

Layer              Technology
Drafter service    FastAPI + LangGraph (port 6677)
Preference memory  mem0 + ChromaDB (port 8100)
Observability      Arize Phoenix (port 6006)
LLMs               OpenRouter (any model) · configurable embedding provider

Quickstart

1. Configure

cp .env.example .env
# set OPENROUTER_API_KEY
# set GOOGLE_APPLICATION_CREDENTIALS  (for Google-based embedding model)

2. Start services

docker compose up --build

ChromaDB :8100 · Phoenix :6006 · Drafter :6677

3. Health check

curl http://localhost:6677/health/ready

4. Call the API

# First draft
curl -s -X POST http://localhost:6677/draft \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}],
    "model": "<your-model-id>",
    "agent_prompt": "...",
    "task_prompt": "...",
    "user_id": "user_123"
  }'

# After client collects feedback and appends draft+feedback to messages, call /draft again
# On approval, confirm
curl -s -X POST http://localhost:6677/confirm \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [...draft loop exchange...],
    "final_draft": "...",
    "feedback_history": ["feedback turn 1", "feedback turn 2"],
    "user_id": "user_123",
    "custom_instructions": "..."
  }'
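
For clients written in Python, the same /draft request could be built with the standard library, as sketched below. `BASE_URL` and `make_post` are hypothetical helpers, and the payload copies the placeholder values from the curl example unchanged.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:6677"


def make_post(path: str, payload: dict) -> Request:
    """Build (but do not send) a JSON POST request for the drafter service."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


draft_request = make_post("/draft", {
    "messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}],
    "model": "<your-model-id>",
    "agent_prompt": "...",
    "task_prompt": "...",
    "user_id": "user_123",
})
# To send it, pass draft_request to urllib.request.urlopen() and parse the JSON body.
```

The response is expected to carry the draft under the "draft" key, as shown in the Architecture diagram.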

5. Browse memory

.venv/bin/streamlit run scripts/dashboard.py

Configuration Reference

Variable             Default        Description
`MEM0_LLM_MODEL`     (none)         LLM for preference extraction (any OpenRouter model)
`MEM0_EMBED_MODEL`   (none)         Embedding model
`CHROMA_HOST`        `localhost`    ChromaDB host
`CHROMA_PORT`        `8100`         ChromaDB port
`CHROMA_COLLECTION`  `agent_prefs`  Collection name
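
To show how these variables and defaults fit together, here is a minimal settings loader. It is a sketch, not the contents of drafter/config.py: the `Settings` dataclass and `load_settings` are hypothetical, and it reads from the process environment (or any mapping) rather than parsing the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mem0_llm_model: Optional[str]    # no default: must be set for preference extraction
    mem0_embed_model: Optional[str]  # no default
    chroma_host: str
    chroma_port: int
    chroma_collection: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from the table, applying the documented defaults."""
    return Settings(
        mem0_llm_model=env.get("MEM0_LLM_MODEL"),
        mem0_embed_model=env.get("MEM0_EMBED_MODEL"),
        chroma_host=env.get("CHROMA_HOST", "localhost"),
        chroma_port=int(env.get("CHROMA_PORT", "8100")),
        chroma_collection=env.get("CHROMA_COLLECTION", "agent_prefs"),
    )
```

Accepting any mapping makes the defaults easy to check, for example with `load_settings({})`.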

Project Layout

main.py                        FastAPI entry point (port 6677)
drafter/
  agent.py                     LangGraph graph: retrieve_memories → generate_draft
  memory.py                    mem0 wrapper: search_memories(), add_memory()
  llm_utils.py                 make_llm() — OpenRouter and Google Gemini
  models.py                    Pydantic schemas: DraftRequest, DraftResponse, ConfirmRequest
  config.py                    Settings from .env
scripts/
  dashboard.py                 Streamlit memory browser

See ARCHITECTURE.md for the full request/response flow and state machine.

The Bigger Picture

Most "personalized AI" products are personalized at training time — a frozen snapshot of aggregate preferences baked into weights. Story Drafter is personalized at inference time, continuously, from individual corrections made in live sessions.

Every confirmed drafting loop adds preference facts for a specific user, so the context injected into later drafts grows richer with each session. That's not a feature. That's the architecture.
