RUD — Repeat Until Done

Open a browser, describe a fuzzy goal, click Start. RUD iterates code → evaluator → keep/discard until your evaluator says done (or it hits your runtime / iteration ceiling).

A local web console for evaluator-driven autonomous code work with Claude Code, built on top of oh-my-claudecode (OMC).


Quick start

git clone https://github.com/FutureMLS-Lab/RUD.git
cd RUD
npm install && npm run build
npm link                          # registers omc-web on PATH
cd /your/project                  # any git repo
omc-web                           # spawns Claude + opens http://127.0.0.1:3737

In the browser:

  1. + New task — write a fuzzy goal.
  2. Interview tab — answer Claude's questions (typed answers go straight to the leader Claude, no need to attach to tmux).
  3. Run tab → Start worker — watch verdicts stream in. When the run finishes, summary.md appears in the Summary tab.

Requirements: Node 20+, tmux, git, and the claude CLI on PATH.
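A quick preflight check for these prerequisites might look like this (illustrative plain POSIX shell, not part of RUD itself):

```shell
# Check that each required binary is on PATH (node, tmux, git, claude).
for bin in node tmux git claude; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: found"
  else
    echo "$bin: MISSING"
  fi
done
```

This only checks presence; Node's major version still needs to be 20+ (`node --version`).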


How it works

Each task lives in .omc/web/tasks/<slug>/ with three editable files:

file         purpose
----------   --------------------------------------------------------
plan.md      what you want done (mission spec)
eval.md      how to grade success (evaluator command + JSON contract)
summary.md   auto-generated on completion

The worker — Claude in a tmux pane RUD spawned for you — edits code, commits, and writes candidate.json. RUD then runs your evaluator and keeps or discards the candidate based on its {pass, score?} verdict. The run stops at max-runtime, max-iterations, an abort, or when you click Stop.

Worktree mode (default) runs in <taskRoot>/work/ via git worktree add on a fresh branch — your original repo is never touched. In-place mode runs against the original repo (requires a clean working tree).
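The isolation worktree mode relies on is plain `git worktree`. A minimal sketch of the idea — branch and file names here are illustrative, not RUD's actual naming:

```shell
# Set up a throwaway repo to demonstrate worktree isolation.
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "init"

# Fresh branch checked out in a separate directory -- the original
# checkout is untouched no matter what happens in ../work.
git worktree add -q -b rud/demo ../work

echo change > ../work/candidate.txt
git -C ../work add candidate.txt
git -C ../work commit -qm "candidate iteration"

git status --porcelain          # prints nothing: original tree is still clean
git worktree remove ../work     # cleanup once the run ends
```

Commits made in the worktree land on the new branch, so accepting a run's result is an ordinary merge or cherry-pick back into your branch.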

eval.md contract

---
evaluator:
  command: "pytest -q --json | jq '{pass: (.failed==0), score: (.passed/.total)}'"
  format: json
  kind: shell                         # shell (default) | agent | hybrid
  keep_policy: score_improvement      # or pass_only
---

(free-form description of what the evaluator measures)

The command runs in the worktree and must print JSON {"pass": boolean, "score": number?} to stdout. Don't know what to write? Let deep-interview draft one.
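If pytest/jq isn't your stack, any command that prints that JSON works. A minimal hand-rolled evaluator might look like this (the `sh -c "exit 0"` is a stand-in for your real test command):

```shell
#!/bin/sh
# Illustrative evaluator: map a command's exit status onto RUD's
# {"pass": boolean, "score": number?} contract. Replace the stand-in
# command with your actual build/test invocation.
if sh -c "exit 0"; then
  printf '{"pass": true, "score": 1.0}\n'
else
  printf '{"pass": false, "score": 0.0}\n'
fi
```

Here score is binary; a real evaluator would usually derive it from something graded, like the fraction of passing tests.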

v2 — dual-pane evaluators (omc-web only). Set kind: agent to use a second Claude TUI as the judge instead of a shell command, or kind: hybrid to run the shell command first AND have the agent review the result. omc-web auto-spawns the second pane (omc-web-evaluator-<projectId>); pass --evaluator-cmd to use a different model (e.g. claude --model haiku --dangerously-skip-permissions).

When in dual-pane mode, the evaluator pane may also append guidance to <runDir>/steer.md — the supervisor pastes it into the runner pane so the runner can adjust mid-flight (without breaking the iteration boundary).
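From the evaluator pane's side, steering is just an append to that file. A sketch (path and wording are illustrative; `RUN_DIR` stands in for the real <runDir>):

```shell
# Hypothetical steering note from the judge pane; RUD's supervisor
# relays steer.md content into the runner pane between iterations.
RUN_DIR="$(mktemp -d)"    # stand-in for the real run directory
echo "Stop refactoring; fix the failing auth test first." >> "$RUN_DIR/steer.md"
cat "$RUN_DIR/steer.md"
```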


Common commands

omc-web                           # start (default :3737, auto-browser)
omc-web --port 4040 --no-open     # custom port, no browser
omc-web --leader-pane %3          # reuse an existing tmux pane instead of spawning
omc-web stop                      # stop the server (and leave the leader Claude running)

# Headless / scripted (same engine as the web button)
omc autoresearch-run start --task-dir .omc/web/tasks/<slug> \
    --mode worktree --max-runtime 2h --max-iterations 10
omc autoresearch-run stop --task-dir .omc/web/tasks/<slug>

More

  • DETAILS.md — architecture, full command reference, layout on disk, troubleshooting.
  • OMC-README.md — upstream oh-my-claudecode docs (orchestration modes, agent catalog, etc.).
  • License: MIT.

Built on oh-my-claudecode by @Yeachan-Heo.
