feat: opt-in --selector review mode (in-session classify + adjudicate, zero spawns)#51
Merged
Conversation
…cate, zero sub-agent spawns)
…never the rule-id name
… released section
…o, resolve validator path via plugin-root chain, mark canonical/sibling selector sections
There was a problem hiding this comment.
Review Summary
| Step | Status |
|---|---|
| Worktree setup | ALREADY_AT_HEAD — used cwd as REVIEW_DIR |
| Diff | 5 files / +382 lines: CHANGELOG.md, commands/code-review.md, commands/pr-review.md, specs/fixtures/golden-legacy-verdict.json, specs/selector-mode-classify-adjudicate.md |
| Project detection | Plugin repo (no go.mod / pyproject.toml); review covers markdown+specs only |
| Mechanical funnel (Step 4a) | unavailable in this environment — scripts/ast-grep-runner.sh was blocked from execution by the Bash sandbox; all changed files are markdown/specs (no .go / .py surface), so the YAML funnel would have zero hits anyway. Proceeded to Step 4b with judgment-rule triggers only. |
| Active judgment rules (Step 4b-i) | agent-cmd/command-frontmatter (slash-command-auditor), agent-cmd/command-thin (slash-command-auditor), changelog/conventional-prefix-required (agent-auditor) |
| Owners spawned | coding:slash-command-auditor, coding:agent-auditor (concurrency = 2) |
| Owner reports | Both: zero findings. Frontmatter is unchanged; new body prose follows the existing dispatcher pattern. The feat: prefix matches the changelog rule's allowed set. |
| Citation validation (Step 4d) | Skipped — no agent findings to validate. Manual check: all 13 rule_ids in the golden baseline resolve in rules/index.json; the new selectors do not introduce any new rule_ids. |
| precommit (Step 3b) | Skipped (standard mode) — CI covers lint+test |
Manual Review Notes (diff-only)
- The selector section is duplicated across both command files with explicit
CANONICAL/SIBLINGHTML comments and a "diff to verify" instruction. Sibling drift is structural, not introduced by this PR. - The
HARD INVARIANTand the architecture-tier bypass are well-documented; the LLM-discipline surface is acknowledged in the spec (non-goal: no machine check before adjudication). - The golden baseline is checked in at
specs/fixtures/golden-legacy-verdict.json; all 13rule_ids are valid againstrules/index.json(citation-validated by hand). - The 9 plan concerns from the task were reviewed; 4 are addressed by the spec's prose and 5 are explicitly deferred to the default-flip migration step (sibling extraction, schema validation of the golden, tunable batching, etc.).
{
"verdict": "approve",
"summary": "Opt-in `--selector` mode is well-bounded: default behavior is byte-for-byte preserved, mechanical funnel and judgment-rule dispatch still run, golden baseline is committed and citation-validates against `rules/index.json`, and both per-owner adjudicators (slash-command-auditor, agent-auditor) returned zero findings. Optional nits flagged below; none rise to blocking.",
"comments": [
{
"file": "commands/code-review.md",
"line": 189,
"severity": "nit",
"message": "SIBLING step 4c-sel Input says \"diff from the current working tree (`git diff HEAD~1`)\" but the prose two lines above also says \"the working-tree diff from `git diff HEAD~1` (or the directory diff as parsed in Step 1)\". The canonical in pr-review.md has no \"or\" ambiguity — it names a single source (the Step 0c diff). Consider dropping the parenthetical so the sibling is unambiguous."
},
{
"file": "commands/pr-review.md",
"line": 236,
"severity": "nit",
"message": "Short-circuit sentinel `selector clean — no adjudication needed` is encoded with a literal em-dash (U+2014). AC6's grep and the spec's verification block both depend on this exact glyph; any editor that normalizes em-dash to ASCII `--` will silently break AC6 while leaving the feature working. Consider switching to ASCII `--` or quoting the literal in a code fence inside the verification block."
},
{
"file": "commands/code-review.md",
"line": 207,
"severity": "nit",
"message": "Same em-dash fragility as the canonical — the SIBLING copy mirrors the literal U+2014 character. Drift between the two files on this character will be invisible to diff tooling and will fail AC6 differently in each file."
},
{
"file": "README.md",
"line": 0,
"severity": "nit",
"message": "`--selector` mode is documented inside the two command files but not surfaced in `README.md` (commands table) or `llms.txt`. The spec does not require it; adding a one-line mention under each command's row would make the opt-in discoverable without reading the dispatcher prose."
}
],
"concerns_addressed": [
"correctness: HARD INVARIANT `applicable ⊆ candidates` is documented in the prose at commands/pr-review.md:230 and commands/code-review.md:201 and the spec names it as a structural guarantee; not addressed mechanically (acknowledged non-goal)",
"correctness: architecture-tier bypass SRP/layering clause is descriptive prose in both files — the LLM must infer from the rule's doc_path heading; spec marks this as the intended contract (acknowledged)",
"correctness: sibling drift is bounded by CANONICAL/SIBLING HTML comments at commands/pr-review.md:216 and commands/code-review.md:185, plus a 'diff to verify' instruction; single-file extraction is explicitly deferred to the default-flip migration step (addressed in spec, raised as nit on the Input prose)",
"correctness: mode argument parsing accepts both `--selector` and `selector` in commands/pr-review.md:83 and commands/code-review.md:24 (the spec is silent on partial-token handling, but exact-match is the natural read of the prose)",
"correctness: em-dash in short-circuit string — raised as nit (no change to the literal)",
"tests: no new acceptance scenario — addressed by spec's Non-Goals and the AC3 contract that default-mode scenarios 002/004 stay green",
"tests: golden baseline has no schema validation — addressed by spec's Constraints section and the deterministic severity map; raised as a gap the default-flip step will close",
"correctness: >20-rule batching split is hand-waved in prose (commands/pr-review.md:265 / commands/code-review.md:236) — spec marks a tunable threshold as a non-goal (deferred)",
"security: hallucinated file:line risk — the spec's Trust-boundary section already names `validate-citations.sh` as the gating check (same risk as legacy path); spec is silent on a post-validation sanity check (deferred)",
"performance: 3m24s / $1.18 claim is based on a single fixture run — spec marks further measurement as a follow-up; raised as a gap the default-flip step will close"
]
}There was a problem hiding this comment.
I have the agent reports. Let me consolidate the findings and produce the final report.
The findings are:
- agent-auditor (8 active rules): All PASS — no violations
- slash-command-auditor (2 active rules): 1 Should Fix + 2 Must Fix + 1 Should Fix
The two "Must Fix" findings from slash-command-auditor concern extracting inline selector logic to a new agent. The spec explicitly documents that selector mode runs in-session with zero spawns by design — extracting to a sub-agent would change the semantics. The spec also acknowledges the duplication is a known limitation with a planned cleanup. The argument-hint issue is a real, actionable bug.
Per the procedure's deterministic map, any Must Fix → "request-changes". I'll produce the verdict accordingly.
{
"verdict": "request-changes",
"summary": "Documentation-only change adding the opt-in `--selector` mode to the two review commands plus a dark-factory spec and golden fixture. Mechanical funnel was unavailable in this environment, so judgment-only adjudication ran. agent-auditor found zero violations across 8 active rules. slash-command-auditor found a real `argument-hint` documentation drift (the new `--selector` mode token is not listed in either command file's hint) and flagged that Steps 4c-sel/4d-sel inline business logic that `command-thin` would normally push to an agent — this is by design (the spec mandates zero sub-agent spawns in selector mode and acknowledges the duplication with a planned cleanup), but the argument-hint bug is straightforward to fix.",
"comments": [
{
"file": "commands/code-review.md",
"line": 3,
"severity": "major",
"message": "[agent-cmd/command-frontmatter] `argument-hint` is `\"[short|standard|full] [directory]\"` but the new `--selector` / `selector` mode token (parsed at line 23) is missing. Update to `\"[short|standard|full|selector] [directory]\"`. Rule agent-cmd/command-frontmatter (MUST)."
},
{
"file": "commands/pr-review.md",
"line": 3,
"severity": "major",
"message": "[agent-cmd/command-frontmatter] `argument-hint` is `\"<target-branch> [short|standard|full]\"` but the new `--selector` / `selector` mode token (parsed at line 83) is missing. Update to `\"<target-branch> [short|standard|full|selector]\"`. Rule agent-cmd/command-frontmatter (MUST)."
},
{
"file": "commands/code-review.md",
"line": 183,
"severity": "critical",
"message": "[agent-cmd/command-thin] Steps 4c-sel/4d-sel inline six categories of business logic in the command: classify contract (recall-optimized decision policy), skip-justification rule (justify against `applies_when`, never the rule-id name), architecture-tier bypass, adjudication severity rubric, batching threshold (>20 rules splits into 2-3 passes), and short-circuit contract. The spec mandates zero sub-agent spawns in selector mode, so the in-session prose is the design intent — but the rule's intent is to keep commands as thin dispatchers, and the pre-existing files were already over 100 lines before this 70-line addition (now 317). The spec acknowledges the duplication: \"Single-file extraction is planned for the default-flip migration step.\" Either extract the logic to a `coding:selector-classifier` agent (changes the in-session semantic — may be acceptable for the opt-in path) or amend `agent-cmd/command-thin` to carve out an exception for opt-in in-session algorithms. Rule agent-cmd/command-thin (MUST)."
},
{
"file": "commands/pr-review.md",
"line": 214,
"severity": "critical",
"message": "[agent-cmd/command-thin] Same finding as code-review.md:184 — the canonical selector section duplicates six categories of business logic that `command-thin` says belong in an agent. pr-review.md is now 334 lines (was 267); the selector section accounts for ~20%. The spec's Non-goals explicitly preserve this in this migration step; tracked for the default-flip cleanup. Rule agent-cmd/command-thin (MUST)."
},
{
"file": "commands/code-review.md",
"line": 185,
"severity": "major",
"message": "[agent-cmd/single-source-of-truth] The selector section is duplicated across the two command files (~60 lines each), maintained by comment instructions (\"edit there first, mirror here, diff to verify\"). The HTML comment is a workaround that does not actually enforce the invariant — drift would only be caught at PR review. A single-source solution (extracted agent, or a single include) would remove the maintenance burden. Rule agent-cmd/single-source-of-truth (SHOULD)."
}
],
"concerns_addressed": [
"correctness: in-session classify + adjudicate path preserves dedup/confidence semantics — addressed structurally by the narrowing invariant (`applicable ⊆ candidates`); cited in spec lines 23, 49, and in commands/pr-review.md:228 / commands/code-review.md:198",
"correctness: --selector option short-circuits spawns only when explicitly enabled — addressed: parser at commands/pr-review.md:83 and commands/code-review.md:23 recognizes the token; default path is byte-for-byte unchanged per the spec's 'MUST remain byte-for-byte as today' constraint",
"correctness: new types/constants for selector mode are additive — no JSON/markdown report schema breakage (the selector path reuses the existing Must/Should/Nice buckets per the spec's severity vocabulary constraint)",
"tests: regression coverage on the existing spawn path — addressed via spec AC3 (default-mode scenarios 002/004 unchanged, `make check-acceptance` exits 0)",
"tests: flag-parsing test for --selector — NO TEST WAS ADDED. The spec mentions manual fixture walk (AC1) and `make precommit` (AC5) but the diff adds no automated test that asserts zero-spawn behavior on --selector. AC2 is verifiable from transcript alone, not from a test fixture. This is a gap relative to the concern.",
"performance: zero-spawn path actually avoids subprocess/agent calls — addressed structurally: spec lines 18-19, commands/pr-review.md:218 and commands/code-review.md:187 both state 'zero sub-agent spawns', and the citation validation explicitly bypasses coding:simple-bash-runner in favor of a plain bash call (commands/pr-review.md:267-272, commands/code-review.md:238-243)"
]
}….md — commands carry thin pointers (single source of truth)
…ING echo + Must Fix stop on missing)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds an opt-in
--selectormode to/coding:pr-reviewand/coding:code-reviewthat replaces the per-owner Task fanout (Step 4b-ii) with two in-session steps:applies_when, never the rule-id name). Hard invariant:applicable ⊆ candidates.### RULEblocks, judges the full diff + mechanical findings, emits findings in the existing severity buckets. Batches into 2-3 in-session passes when applicable set > 20.Default mode is byte-for-byte unchanged — pure insertions; selector is opt-in until validated further (later migration step flips the default).
Spec:
specs/selector-mode-classify-adjudicate.md(audited; in this PR). Architecture: AI-Selector Review Redesign (vault).Validation (live end-to-end, isolated CLAUDE_CONFIG_DIR)
Ran
claude -p "/coding:pr-review master selector"against the perpetual fixture bborbe/maintainer#2, diffed against the frozen legacy golden (13 findings from real per-owner dispatch):subagent_typeeventsmake precommitRound 1 caught a real classify false negative (
go-testing/libtime-injection-requiredskipped as "not a test file" — rule-id-prefix inference); fixed by the skip-justification rule (2642626), round 2 clean.Incidental discovery:
ast-grep-runner.shscans.git/COMMIT_EDITMSG(274 stale agent-auditor findings) — follow-up, not addressed here.🤖 Generated with Claude Code