agent6

A sandboxed coding agent for Linux. The LLM is treated as adversarial: every command it spawns runs inside a custom Rust launcher (agent6-jail) built on user namespaces, Landlock, seccomp, pivot_root, capset(0), and NO_NEW_PRIVS, so a misbehaving model cannot escape the workspace, reach the network beyond the provider endpoint, or corrupt git history.

Features:

Sandboxed execution for every LLM-chosen child process (verify commands, metric commands, optional shell)
Works with Anthropic and any OpenAI-compatible endpoint (OpenAI, OpenRouter, Ollama, vLLM, llama.cpp, LM Studio), tuned to stay effective on cheap open-weights models
Per-step git commits, snapshot-resumable runs, USD and token budgets with hard stops
Plan, run, review, and ask modes; a live terminal dashboard; persistent transcripts and a searchable run history
State machines (agent6 machine) for long-running automated tasks: LLM-drafted, operator-reviewed, journaled, and replayable
Small, fixed LLM tool surface; the only extension point is operator-configured MCP servers, off by default
Eight runtime dependencies, no telemetry, no auto-update

Requirements

Linux for the sandbox. It uses Linux-only kernel APIs (Landlock, seccomp, user namespaces). macOS and Windows run unsandboxed: the default profile = "auto" resolves to none, child commands run as plain subprocesses behind a startup warning, and an explicit profile = "strict" or "hardened" is refused. Run on Linux for kernel-enforced isolation.
Kernel 6.7 or newer for Landlock TCP rules. Older kernels fall back to filesystem-only Landlock with a warning.
kernel.unprivileged_userns_clone = 1 for the strict profile (default on Ubuntu, Debian, and most cloud images); without it agent6 falls back to hardened. On Ubuntu 24.04+ with kernel.apparmor_restrict_unprivileged_userns = 1, install the bundled AppArmor profile (packaging/apparmor/agent6-jail; agent6 check sandbox prints the commands) or set that sysctl to 0.
Python 3.12 or newer, plus an API key for at least one provider.
Building from source needs a Rust toolchain on PATH; PyPI wheels bundle a prebuilt agent6-jail.

Install

From PyPI with uv or pipx:

uv tool install agent6
pipx install agent6

Both drop the agent6 entry point in ~/.local/bin; if that is not on your PATH, run uv tool update-shell or pipx ensurepath and restart your shell.

From source:

git clone https://github.com/elesiuta/agent6
cd agent6
uv sync
uv run agent6 --help

AGENT6_JAIL_BIN=/path/to/agent6-jail overrides the bundled jail binary.

Shell tab-completion

Via argcomplete:

# Bash / Zsh
eval "$(register-python-argcomplete agent6)"

# Fish
register-python-argcomplete --shell fish agent6 > ~/.config/fish/completions/agent6.fish

Quick start

# Connect a provider once (stored in ~/.config/agent6/, key in a 0600
# secrets file). Works across every repo.
agent6 connect                # interactive: pick provider, paste API key
agent6 model worker anthropic claude-sonnet-4-5

# Run the agent on a task -- that's it. agent6 infers a verify command for
# the repo if you haven't set one.
agent6 run "add a --json output mode to the CLI"

# Optional: a granular setup wizard (per-repo config, a pinned verify
# command, .gitignore, AGENTS.md). Safe to run anytime; never overwrites.
agent6 init

# Audit the effective config: every value and where it came from.
agent6 config show

# Pre-flight: sandbox + config + provider keys.
agent6 check

# Resume an interrupted run from its last tool-call snapshot.
agent6 resume <run-id>

# Read-only code review of a diff. Never touches the worktree.
agent6 review --base origin/main --head HEAD

# Adversarial review panel: N reviewers whose findings are grounded against
# the diff, so only real, block-eligible problems gate. Also runs in-loop.
agent6 review --reviewers 3 --personas security,correctness,tests

# Pick a strategy preset with one knob (quick / standard / ultra / paranoid).
agent6 run "..." --profile ultra

The in-loop review panel and profiles are configured under [workflow] (critic, review_*, profile); see CONFIG.md.

Config is layered: built-in secure defaults, then the global ~/.config/agent6/config.toml, then the per-repo config (kept outside the workspace at $XDG_STATE_HOME/agent6/<repo-id>/config.toml), then an explicit --config FILE. The per-repo config is per-machine and is not committed. A repo can be zero-config when the global config supplies a provider and model.

The verify command (agent6's success gate) is optional. If a repo hasn't set workflow.verify_command, agent6 run and agent6 plan infer one per run (from AGENTS.md, then repo manifests, then a cheap model call) and print what they picked. If none can be inferred, the run proceeds gateless: per-step commits, no green gate. Pin one in the per-repo config, or via agent6 init, to make it deterministic.
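
The config layering described above can be pictured as a deep merge in which each later layer overrides the earlier ones, with the origin of every leaf recorded (the kind of information agent6 config show reports). This is an illustrative sketch under that assumption, not agent6's loader; merge_layers and the layer names are hypothetical.

```python
from typing import Any


def merge_layers(
    layers: list[tuple[str, dict[str, Any]]],
) -> tuple[dict[str, Any], dict[str, str]]:
    """Deep-merge layers in order (later wins) and record which layer set each leaf."""
    effective: dict[str, Any] = {}
    origin: dict[str, str] = {}

    def apply(target: dict[str, Any], source: dict[str, Any], prefix: str, layer: str) -> None:
        for key, value in source.items():
            dotted = f"{prefix}{key}"
            if isinstance(value, dict):
                child = target.get(key)
                if not isinstance(child, dict):
                    child = {}
                    target[key] = child
                apply(child, value, dotted + ".", layer)
            else:
                target[key] = value      # leaf: the later layer replaces the earlier value
                origin[dotted] = layer   # remember where the effective value came from

    for name, data in layers:
        apply(effective, data, "", name)
    return effective, origin


if __name__ == "__main__":
    effective, origin = merge_layers([
        ("defaults", {"sandbox": {"profile": "auto", "tool_network": "block"}}),
        ("global", {"models": {"worker": {"provider": "anthropic"}}}),
        ("repo", {"workflow": {"verify_command": ["uv", "run", "pytest", "-x"]}}),
    ])
    for dotted, layer in sorted(origin.items()):
        print(f"{dotted}  <- {layer}")
```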

Other commands:

agent6 watch [<run-id>]: attach the live TUI to an existing run.
agent6 status [<run-id>]: print a one-shot liveness and progress report for a run (running / crashed / finished, current iteration, last activity, elapsed time), then exit. Use it for a quick or scripted check (--json) without the live follower.
agent6 plan "<task>": read-only planning pass; execute with agent6 run --from-plan.
agent6 ask "<question>": read-only Q&A over the repo, including questions about agent6 itself (it consults its bundled docs). Seed context with @path or --file; --run <id> asks about a prior run.
agent6 memory: persistent agent memory under the per-repo state dir (<state-dir>/<repo-id>/memories/).
agent6 history search <query>: search persisted transcripts.
agent6 history graph [<run-id>]: render the persisted task graph.
agent6 history transcript [<run-id>]: render a run's full LLM conversation (assistant text plus every tool call with complete I/O) as Markdown; --json prints the raw transcript. This is the lossless deep dive; logs.jsonl is the terse counterpart.
agent6 diff [<run-id>]: print the git diff a run produced.
agent6 machine ...: author and run state machines (.asm.toml); see STATE_MACHINES.md.
agent6 mcp serve: expose agent6's tools over MCP (stdio).
agent6 config fill: materialize every effective value into one file.
agent6 config get/set/unset/add/remove <key> [value]: edit a single dotted leaf. Writes go to the global config by default, --repo for the repo, --machine FILE for a machine overlay. Every edit is re-validated and rolled back if invalid.

Configuration

Every field has a default, and security-sensitive fields default to the safe value. The full reference is CONFIG.md; sandbox profiles are explained in SECURITY.md.

[sandbox]
profile = "auto"              # auto | strict | hardened
agent_network = "providers"   # providers | local | open  (agent's LLM egress)
tool_network = "block"        # block | only_explicit_states | allow  (jailed commands)
run_commands = "ask"          # yes | no | ask
protect_git = true            # strict only: re-bind .git read-only in the jail

[git]
require_clean_worktree = true
branch_per_run = true
commit_strategy = "per_step"  # per_step | squash | stage | none
allow_push = false

[workflow]
verify_command = ["uv", "run", "pytest", "-x"]

[budget]
max_input_tokens  = 2000000
max_output_tokens = 200000
# best_effort_usd_limit = 10.0  # optional; see CONFIG.md

[providers.anthropic]
kind = "anthropic"
api_key_env = "ANTHROPIC_API_KEY"

[models.worker]
provider = "anthropic"
model = "claude-sonnet-4-5"

Budget ceilings can be overridden per run: agent6 run --max-usd 5 "...", or --max-input-tokens / --max-output-tokens on run, plan, and resume.
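
To make the hard-stop behaviour concrete, here is a minimal sketch of a token budget that raises as soon as a ceiling is crossed. TokenBudget and BudgetExceeded are hypothetical names, and whether agent6 stops at or just past the ceiling is an assumption of this example.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses one of its token ceilings."""


@dataclass
class TokenBudget:
    max_input_tokens: int
    max_output_tokens: int
    used_input: int = 0
    used_output: int = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one LLM call and stop the run if either ceiling is exceeded."""
        self.used_input += input_tokens
        self.used_output += output_tokens
        if self.used_input > self.max_input_tokens:
            raise BudgetExceeded(f"input tokens {self.used_input} > {self.max_input_tokens}")
        if self.used_output > self.max_output_tokens:
            raise BudgetExceeded(f"output tokens {self.used_output} > {self.max_output_tokens}")


if __name__ == "__main__":
    # Ceilings taken from the [budget] example above.
    budget = TokenBudget(max_input_tokens=2_000_000, max_output_tokens=200_000)
    budget.charge(input_tokens=12_000, output_tokens=900)
    print(budget)
```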

Providers and models

Declare any number of [providers.<name>] blocks, each with kind = "anthropic" or kind = "openai", its own base_url, and api_key_env. Per-provider http_timeout_s (default 600) caps each HTTP call.

agent6 uses three model roles:

Role	Routed by	Used by
`worker`	`[models.worker]`	`agent6 run` / `resume`; drives USD-to-token conversion.
`reviewer`	`[models.reviewer]`	`agent6 review` and the optional in-loop critic.
`planner`	`[models.planner]`	`agent6 plan`. Falls back to `worker` when unset.

agent6 model all <provider> <model> sets every role at once. Each role takes an optional thinking level (off/low/medium/high).
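
The role routing in the table can be sketched as a small lookup with the one documented fallback (planner to worker). resolve_role is a hypothetical helper, and raising KeyError for any other unset role is an assumption of this sketch, not documented behaviour.

```python
def resolve_role(models: dict[str, dict[str, str]], role: str) -> dict[str, str]:
    """Return the [models.<role>] entry, applying the planner -> worker fallback."""
    if role in models:
        return models[role]
    if role == "planner" and "worker" in models:
        return models["worker"]  # documented: planner falls back to worker when unset
    raise KeyError(f"no model configured for role {role!r}")


if __name__ == "__main__":
    models = {"worker": {"provider": "anthropic", "model": "claude-sonnet-4-5"}}
    print(resolve_role(models, "planner"))
```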

Tool surface

The tools given to the LLM are fixed and declared in one place, src/agent6/tools/schema.py; adding one requires a security review note in the commit message.

Read-only: read_file, list_dir, grep, outline, find_definition, find_references
Edits: apply_edit (structured blocks), apply_patch (unified diff)
Execution with operator-fixed argv: run_verify_command, run_metric_command
Control: finish_run, plus dag_* task-notepad tools backed by the curator subprocess
Conditional: run_command(argv), only exposed when sandbox.run_commands allows it, and always jailed

There is no write_file, shell, or web_fetch.
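
As an illustration of a fixed surface with a single conditional tool, the sketch below lists the tools named above and adds run_command only when the setting permits it. It is not the contents of schema.py: exposed_tools is hypothetical, the dag_* tools are omitted because their exact names are not listed here, and treating "ask" as exposing the tool (with approval requested at call time) is an assumption.

```python
FIXED_TOOLS: tuple[str, ...] = (
    "read_file", "list_dir", "grep", "outline", "find_definition", "find_references",
    "apply_edit", "apply_patch",
    "run_verify_command", "run_metric_command",
    "finish_run",
)


def exposed_tools(run_commands: str) -> tuple[str, ...]:
    """Return the tool names offered to the model for a given sandbox.run_commands value."""
    if run_commands not in {"yes", "no", "ask"}:
        raise ValueError(f"invalid run_commands value: {run_commands!r}")
    if run_commands == "no":
        return FIXED_TOOLS
    # Assumption: both "yes" and "ask" expose run_command; "ask" would prompt per call.
    return FIXED_TOOLS + ("run_command",)


if __name__ == "__main__":
    print(exposed_tools("ask"))
```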

How it works

agent6 is a single-loop agent: one provider, one model, one message history. The model drives the run by calling tools; the workflow dispatches them, snapshots state before every LLM call (so any run is resumable), commits each step when verify_command passes (or, on a gateless run with no verify command, every editing step), and hard-stops on budget. Module boundaries (cli -> workflows -> agents -> tools -> sandbox) are enforced by tach. See ARCHITECTURE.md for the run/review loops, the curator subprocess, and the on-disk layout.
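
The loop described above can be reduced to a toy model to show its ordering: snapshot before each LLM call, dispatch the chosen tool, commit an editing step only when verification passes (or always on a gateless run), and stop hard at a ceiling. Everything here is a simplified stand-in: RunState and run_loop are hypothetical, tool calls are plain strings, and max_calls replaces the real token and USD budgets.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunState:
    history: list[str] = field(default_factory=list)
    snapshots: list[list[str]] = field(default_factory=list)
    commits: list[str] = field(default_factory=list)


def run_loop(
    model: Callable[[list[str]], str],
    verify: Optional[Callable[[], bool]],
    max_calls: int,
) -> RunState:
    """Drive one message history until finish_run or the call ceiling is reached."""
    state = RunState()
    for _ in range(max_calls):  # stand-in for the budget hard stop
        state.snapshots.append(list(state.history))  # snapshot before every LLM call
        action = model(state.history)
        state.history.append(action)
        if action == "finish_run":
            break
        if action.startswith("apply_"):  # an editing step
            # Gateless run (verify is None): commit every edit. Otherwise gate on verify.
            if verify is None or verify():
                state.commits.append(action)
    return state


if __name__ == "__main__":
    script = ["read_file", "apply_edit", "finish_run"]
    state = run_loop(lambda history: script[len(history)], verify=lambda: True, max_calls=10)
    print(state.commits, len(state.snapshots))
```

Because a snapshot is taken before every call, a run interrupted at any point can restart from the most recent one, which is the property agent6 resume relies on.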

For security details (threat model, per-layer breakdown, sandbox profiles), see SECURITY.md. Defaults are safe: agent_network = "providers", tool_network = "block", run_commands = "ask", protect_git = true, git.allow_* = false, and git_ops.py refuses push, --force, and history rewrites unconditionally.

Benchmarks

Reproducible harnesses live under bench/:

bench/realworld/: 11 SWE-bench-Lite-style tasks scored by fresh sandboxed verifies on hidden tests. Latest recorded run: agent6 and claude-code both solve 11/11 on the same worker model (claude-sonnet-4-5); agent6 at about $2.60 total, claude-code at about $3.96. Single runs, no variance measured; re-run before quoting.
bench/agents/: head-to-head against Claude Code, opencode, and aider on Go and Rust tasks with cheap models.
bench/machine/: machine create attempts, cost, and validation results.
bench/perf/: a perf-optimization harness for local experimentation; single-run numbers are too noisy to quote.

Cost accounting

Every run ends with a per-model token and cost summary. Model prices are fetched from the provider's models endpoint and cached (OpenRouter publishes them; Anthropic does not, so its models report an unknown price). Where the provider reports per-call cost, that figure is used directly. The [budget] ceilings hard-stop the run; a stopped run is resumable.

Live view

When stdout is a TTY, agent6 run opens a terminal dashboard (task DAG, budget bar, tool table, live reasoning pane, log tail, latest diff); --no-tui and -i (stdin REPL) opt out. Approval and Ctrl-C steer prompts appear as modals, with a /dev/tty fallback when no TUI is present. agent6 plan, agent6 ask, and agent6 machine create stream reasoning and answers to the terminal. Attach from another shell with agent6 watch [<run-id>]; agent6 watch --plain is a plain-text tail. The dashboard renders the JSONL event stream at <state-dir>/<repo-id>/runs/<run-id>/logs.jsonl, which is also the contract for external viewers (vocabulary in ARCHITECTURE.md). The inline log pane is a live tail; press l (or pick a run in the hub and press l) for a full-height, scrollable log of the whole run, whether it is current or finished.
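
Since logs.jsonl is the contract for external viewers, a viewer can start from a line-by-line JSON parser. This is a minimal sketch that assumes only that each line holds one JSON object; it does not assume any event field names (those are in ARCHITECTURE.md), and parse_events is a hypothetical helper.

```python
import json
import sys
from typing import Iterable, Iterator


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed JSONL line, skipping blanks and partial writes."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a last line that is still being written
        if isinstance(event, dict):
            yield event


if __name__ == "__main__":
    # Usage: python viewer.py <path to logs.jsonl>
    with open(sys.argv[1], encoding="utf-8") as handle:
        for event in parse_events(handle):
            print(event)
```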

Persistence

Per-repo state lives outside the workspace under $XDG_STATE_HOME/agent6/<repo-id>/ (override with [agent6].state_dir or AGENT6_STATE_HOME). Each run's state sits under runs/<run-id>/: the append-only task graph, per-call snapshots that drive agent6 resume, full transcripts, and the event log. The graph-curator subprocess owns the task graph; the main process writes the resume snapshots, transcripts, and event log in-process. The run directory is safe from jailed commands because it lives outside the repo cwd they run in, not because of any single writer.

End-of-run notify hook

If [notify] declares on_complete = [...], agent6 runs that argv after every agent6 run / resume with AGENT6_RUN_ID, AGENT6_RUN_DIR, AGENT6_RUN_OK, and AGENT6_RUN_REASON set. The hook runs outside the jail as your user; the argv is operator-controlled.
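
A hook can be any executable that reads those four variables. The script below is a minimal sketch of one; the file name notify.py and the summarize helper are hypothetical, and because the exact value format of AGENT6_RUN_OK is not specified here, the script passes it through as a raw string rather than interpreting it.

```python
import os
import sys


def summarize(env: dict[str, str]) -> str:
    """Build a one-line summary from the AGENT6_RUN_* variables set for the hook."""
    run_id = env.get("AGENT6_RUN_ID", "?")
    run_dir = env.get("AGENT6_RUN_DIR", "?")
    ok = env.get("AGENT6_RUN_OK", "?")  # passed through as-is; format not assumed
    reason = env.get("AGENT6_RUN_REASON", "?")
    return f"agent6 run {run_id}: ok={ok} reason={reason} dir={run_dir}"


if __name__ == "__main__":
    # Replace this print with a desktop notification, webhook call, and so on.
    print(summarize(dict(os.environ)), file=sys.stderr)
```

It would be wired up with an argv such as on_complete = ["python3", "/path/to/notify.py"], where the path is a placeholder.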

Contributing

Read AGENTS.md first. The repo's verify command decides whether a PR is landable:

uv run ruff check && uv run ruff format --check && \
  uv run pyright && uv run tach check && uv run pytest

Changes under sandbox/, tools/, git_ops.py, providers/, or graph/curator must include a security review note in the commit message.

License

Apache-2.0.
