Open a browser, describe a fuzzy goal, click Start. RUD iterates code → evaluator → keep/discard until your evaluator says done (or it hits your runtime / iteration ceiling).
A local web console for evaluator-driven autonomous code work with Claude Code, built on top of oh-my-claudecode (OMC).
git clone https://github.com/FutureMLS-Lab/RUD.git
cd RUD
npm install && npm run build
npm link # registers omc-web on PATHcd /your/project # any git repo
omc-web # spawns Claude + opens http://127.0.0.1:3737In the browser:
- + New task — write a fuzzy goal.
- Interview tab — answer Claude's questions (typed answers go straight to the leader Claude, no need to attach to tmux).
- Run tab → Start worker — watch verdicts stream in. When the run finishes,
summary.mdappears in the Summary tab.
Requirements: Node 20+, tmux, git, and the claude CLI on PATH.
Each task lives in .omc/web/tasks/<slug>/ with three editable files:
| file | purpose |
|---|---|
plan.md |
what you want done (mission spec) |
eval.md |
how to grade success (evaluator command + JSON contract) |
summary.md |
auto-generated on completion |
The worker — Claude in a tmux pane RUD spawned for you — edits code, commits,
writes candidate.json. RUD runs your evaluator, keeps or discards based on
{pass, score?}. Stops at max-runtime, max-iterations, abort, or your Stop.
worktree mode (default) runs in <taskRoot>/work/ via git worktree add
on a fresh branch — your original repo is never touched. in-place mode
runs against the original repo (requires clean working tree).
---
evaluator:
command: "pytest -q --json | jq '{pass: (.failed==0), score: (.passed/.total)}'"
format: json
kind: shell # shell (default) | agent | hybrid
keep_policy: score_improvement # or pass_only
---
(free-form description of what the evaluator measures)The command runs in the worktree and must print JSON {"pass": boolean, "score": number?}
to stdout. Don't know what to write? Let deep-interview draft one.
v2 — dual-pane evaluators (omc-web only). Set kind: agent to use a second
Claude TUI as the judge instead of a shell command, or kind: hybrid to run the
shell command first AND have the agent review the result. omc-web auto-spawns
the second pane (omc-web-evaluator-<projectId>); pass --evaluator-cmd to use
a different model (e.g. claude --model haiku --dangerously-skip-permissions).
When in dual-pane mode, the evaluator pane may also append guidance to
<runDir>/steer.md — the supervisor pastes it into the runner pane so the
runner can adjust mid-flight (without breaking the iteration boundary).
omc-web # start (default :3737, auto-browser)
omc-web --port 4040 --no-open # custom port, no browser
omc-web --leader-pane %3 # reuse an existing tmux pane instead of spawning
omc-web stop # stop the server (and leave the leader Claude running)
# Headless / scripted (same engine as the web button)
omc autoresearch-run start --task-dir .omc/web/tasks/<slug> \
--mode worktree --max-runtime 2h --max-iterations 10
omc autoresearch-run stop --task-dir .omc/web/tasks/<slug>- DETAILS.md — architecture, full command reference, layout on disk, troubleshooting.
- OMC-README.md — upstream oh-my-claudecode docs (orchestration modes, agent catalog, etc.).
- License: MIT.
Built on oh-my-claudecode by @Yeachan-Heo.