Autonomous AI assistant with persistent memory, background cognition, and a multi-agent architecture.
Lethe runs 24/7 in an isolated container, communicates via Telegram, remembers your preferences and projects across sessions, and thinks in the background even when you're not talking to it. It uses tools, browses files, searches the web, and delegates focused work to subagents.
```bash
curl -fsSL https://lethe.gg/install | bash
```

The installer sets up `~/.lethe` as the runtime root, builds an isolated container (podman on Linux, apple/container on macOS), and walks you through provider selection and Telegram bot setup. If an existing workspace is detected, you can reuse it without reconfiguring.
Prerequisites: A Telegram bot token and an LLM provider (Anthropic subscription, OpenRouter API key, OpenAI, or a local server). The installer handles all other dependencies.
For a native (non-containerized) install:

```bash
curl -fsSL https://lethe.gg/install | bash -s -- --yolo
```
```bash
git clone https://github.com/atemerev/lethe.git
cd lethe
uv sync
cp .env.example .env  # edit with your credentials
uv run lethe
```

Update:

```bash
curl -fsSL https://lethe.gg/update | bash
```

Uninstall:

```bash
curl -fsSL https://lethe.gg/uninstall | bash
```

```
Telegram
   |
   v
Cortex (principal actor, user-facing)
   |
Brainstem (supervision, health checks)
   |
   +--- DMN (background cognition, hourly)
   +--- Hippocampus (recall + salience tagging)
   +--- Subagents (spawned on demand)
   +--- Tools (CLI, files, web, browser)
   |
Actor Registry + Event Bus
   |
Memory
   +-- workspace/memory/*.md (identity, human, project)
   +-- notes/ (skills, conventions, persistent knowledge)
   +-- LanceDB (archival memory + message history)
```
| Actor | Role |
|---|---|
| Cortex | Principal actor. Only actor that talks to the user. Handles quick tasks directly, delegates complex work to subagents. |
| Brainstem | Boot supervisor. Checks resources and releases; reports findings to Cortex. |
| DMN | Background cognition on an hourly cadence. Scans goals, updates state, writes reflections, escalates insights. |
| Hippocampus | Autoassociative recall and salience tagging. Searches notes, archival memory, and conversation history on each message. |
| Subagents | Spawned on demand for focused tasks. Report to parent actors. No direct user channel. |
System prompt content is split by update lifecycle:
| Content | Location | Updates |
|---|---|---|
| Persona (identity, character) | `workspace/memory/identity.md` | User-editable. Never overwritten. |
| System instructions | `config/prompts/agent_instructions.md` | Current after `git pull`. |
| Tools documentation | `config/prompts/agent_tools.md` | Current after `git pull`. |
| Actor rules | `config/prompts/actor_*.md` | Current after `git pull`. |
Lethe runs in an isolated container by default:
- Linux: Podman rootless container with volume-mounted access only to `~/.lethe` and directories you choose during install.
- macOS: apple/container (macOS 26+), or Podman as a fallback for Intel Macs / older macOS.
Native mode (`--yolo`) runs without isolation — use at your own risk.
The API server binds to 127.0.0.1 by default. Use a reverse proxy for remote access.
| Provider | Auth | Example model |
|---|---|---|
| Anthropic (subscription) | `ANTHROPIC_AUTH_TOKEN` | `claude-opus-4-6` |
| Anthropic (API key) | `ANTHROPIC_API_KEY` | `claude-opus-4-6` |
| OpenRouter | `OPENROUTER_API_KEY` | `openrouter/moonshotai/kimi-k2.6` |
| OpenAI (API key) | `OPENAI_API_KEY` | `gpt-5.4` |
| OpenAI (subscription) | `OPENAI_AUTH_TOKEN` | `gpt-5.4` |
| Local (llama.cpp) | `LLM_API_BASE` + `OPENAI_API_KEY=local` | `openai/gemma-4-31B-it-Q8_0.gguf` |
Set `LLM_MODEL` explicitly. The installer writes a default for the chosen provider; manual installs must set it in `.env`.

Multi-model support:

- `LLM_MODEL_AUX` -- summarization, hippocampus, lightweight background work
- `LLM_MODEL_DMN` -- DMN model override (defaults to the main model)
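As an illustrative sketch, a minimal `.env` for the OpenRouter row above might look like this (the key value is a placeholder, and routing aux work to the same model is just one choice — any cheaper model works):

```
# .env -- hypothetical OpenRouter setup; model names from the provider table
OPENROUTER_API_KEY=your-openrouter-key
LLM_MODEL=openrouter/moonshotai/kimi-k2.6
# Optional: route summarization / hippocampus work to a separate model
LLM_MODEL_AUX=openrouter/moonshotai/kimi-k2.6
```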
Tagged markdown files in `~/.lethe/workspace/notes/`:

```
notes/
  unige_email_via_graph_api.md   # tags: [skill, email, graph-api]
  use_uv_not_pip.md              # tags: [convention, python]
  phd_defense_requirements.md    # tags: [education, PhD]
```
Skills, conventions, and durable procedures. Searched by hippocampus on each message. Auto-extracted from archival memory by the curator.
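The tag annotations above suggest a simple lookup. As a rough illustration of the idea — not Lethe's actual hippocampus implementation — assuming each note carries a `tags: [...]` line:

```python
import re
from pathlib import Path

TAG_RE = re.compile(r"tags:\s*\[([^\]]*)\]")

def note_tags(text: str) -> set[str]:
    """Extract tags from a 'tags: [a, b]' line, if present."""
    m = TAG_RE.search(text)
    if not m:
        return set()
    return {t.strip() for t in m.group(1).split(",") if t.strip()}

def find_notes(notes_dir: str, wanted: set[str]) -> list[str]:
    """Return note filenames whose tag set intersects the wanted tags."""
    hits = []
    for path in sorted(Path(notes_dir).glob("*.md")):
        if note_tags(path.read_text()) & wanted:
            hits.append(path.name)
    return hits
```

In practice the hippocampus also searches note bodies and archival memory, so tags are only one signal.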
Always in context. Stored in `workspace/memory/`:

- `identity.md` -- agent persona (user-customizable, never overwritten)
- `human.md` -- what the agent knows about you
- `project.md` -- current project context (agent-maintained)
Long-term semantic storage with hybrid search (vector + full-text). The curator runs on startup to extract valuable entries into notes.
Full conversation history, stored locally in LanceDB. Searchable via the `conversation_search` tool.
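"Hybrid" here means fusing a vector ranking with a full-text ranking. Reciprocal rank fusion is one common way to combine the two result lists; the sketch below is a generic illustration, and LanceDB's actual scoring may differ:

```python
def rrf_fuse(vector_ranked: list[str], text_ranked: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked ID lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    higher is better. k=60 is the conventional smoothing constant.
    """
    scores: dict[str, float] = {}
    for ranked in (vector_ranked, text_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document near the top of either list outranks one that appears low in both, which is why the fusion tolerates one weak retriever.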
Lethe runs well with Gemma 4 31B on consumer GPUs via llama.cpp.
```bash
# Build llama.cpp with CUDA
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build build --target llama-server -j$(nproc)

# Start the server (4x RTX 4090 example)
./build/bin/llama-server \
  --model /path/to/gemma-4-31B-it-Q8_0.gguf \
  --host 0.0.0.0 --port 8090 \
  --n-gpu-layers 999 --split-mode tensor \
  --ctx-size 98304 --flash-attn on \
  --parallel 2 --cache-ram 32768 \
  --jinja --reasoning-budget 4096 \
  --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 \
  -fit off
```

Configure Lethe:
```bash
# .env
LLM_PROVIDER=openai
LLM_MODEL=openai/gemma-4-31B-it-Q8_0.gguf
LLM_API_BASE=http://localhost:8090/v1
LLM_CONTEXT_LIMIT=96000
OPENAI_API_KEY=local
```

Key flags: `--split-mode tensor` for true tensor parallelism across GPUs, `--jinja` for native tool calling, `--reasoning-budget 4096` for thinking mode.
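Before pointing Lethe at the server, it can help to smoke-test the endpoint. A minimal stdlib sketch — the URL and model name mirror the config above, and this is not part of Lethe itself:

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer local"},
    )

# Usage (requires the llama-server above to be running):
# req = chat_request("http://localhost:8090/v1",
#                    "openai/gemma-4-31B-it-Q8_0.gguf", "ping")
# print(json.load(urllib.request.urlopen(req)))
```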
| Variable | Description | Default |
|---|---|---|
| `TELEGRAM_BOT_TOKEN` | Bot token from BotFather | required |
| `TELEGRAM_ALLOWED_USER_IDS` | Comma-separated user IDs | all |
| `LLM_PROVIDER` | Force provider | auto-detect |
| `LLM_MODEL` | Main model | required |
| `LLM_MODEL_AUX` | Aux model | same as main |
| `LLM_MODEL_DMN` | DMN model override | same as main |
| `LLM_API_BASE` | Custom API URL | -- |
| `LLM_CONTEXT_LIMIT` | Context window size | 100000 |
| `EXA_API_KEY` | Exa web search | optional |
| `ACTORS_ENABLED` | Enable actor model | true |
| `HIPPOCAMPUS_ENABLED` | Enable recall + salience | true |
| `HEARTBEAT_INTERVAL` | Heartbeat interval (seconds) | 3600 |
| `HEARTBEAT_ENABLED` | Enable heartbeat loop | true |
| `PROACTIVE_MAX_PER_DAY` | Proactive message limit | 4 |
| `PROACTIVE_COOLDOWN_MINUTES` | Min spacing between proactive msgs | 60 |
| `LETHE_HOME` | Runtime root directory | ~/.lethe |
| `LETHE_API_HOST` | API server bind address | 127.0.0.1 |
Edit `workspace/memory/identity.md` to customize personality, purpose, and background. This file is never overwritten by updates.
System instructions (communication style, output format) are in `config/prompts/agent_instructions.md`.
The installer creates the container and service automatically. Useful commands:
```bash
# Linux (podman)
systemctl --user start lethe-container
systemctl --user stop lethe-container
journalctl --user -u lethe-container -f
podman exec -it lethe /bin/bash        # shell into container
podman exec -u 0 -it lethe /bin/bash   # root shell (install packages)

# macOS (apple/container)
launchctl load ~/Library/LaunchAgents/com.lethe.container.plist
launchctl unload ~/Library/LaunchAgents/com.lethe.container.plist
tail -f ~/.lethe/logs/container.log
```

Mount additional directories by editing `~/.lethe/config/mounts.conf` and re-running `scripts/container-setup.sh`.
```bash
uv run pytest
uv run pytest tests/test_notes.py -v
```

```
src/lethe/
  actor/         -- actor model (cortex, dmn, brainstem, subagents)
  agent/         -- runtime orchestration
  memory/        -- blocks, LanceDB, notes, hippocampus, curator, LLM client
  tools/         -- CLI, files, web, browser, Telegram
  telegram/      -- Telegram bot interface
  conversation/  -- conversation management
  paths.py       -- centralized path derivation from LETHE_HOME
  api.py         -- HTTP API server
  main.py        -- entry point
config/
  blocks/        -- seed memory blocks
  prompts/       -- system, actor, and heartbeat prompts
workspace/       -- workspace seed files (copied once on first run)
```
MIT