Skip to content

feat: opt-in --selector review mode (in-session classify + adjudicate, zero spawns)#51

Merged
bborbe merged 7 commits into
masterfrom
feature/selector-review
Jun 11, 2026
Merged

feat: opt-in --selector review mode (in-session classify + adjudicate, zero spawns)#51
bborbe merged 7 commits into
masterfrom
feature/selector-review

Conversation

@bborbe

@bborbe bborbe commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Summary

Adds an opt-in --selector mode to /coding:pr-review and /coding:code-review that replaces the per-owner Task fanout (Step 4b-ii) with two in-session steps:

  • Step 4c-sel CLASSIFY — narrows the Step 4b-i glob candidates under a recall-optimized contract (include when uncertain; skip reasons must be justified against the rule's applies_when, never the rule-id name). Hard invariant: applicable ⊆ candidates.
  • Step 4d-sel ADJUDICATE — reads only the applicable rules' ### RULE blocks, judges the full diff + mechanical findings, emits findings in the existing severity buckets. Batches into 2-3 in-session passes when applicable set > 20.

Default mode is byte-for-byte unchanged — pure insertions; selector is opt-in until validated further (later migration step flips the default).

Spec: specs/selector-mode-classify-adjudicate.md (audited; in this PR). Architecture: AI-Selector Review Redesign (vault).

Validation (live end-to-end, isolated CLAUDE_CONFIG_DIR)

Ran claude -p "/coding:pr-review master selector" against the perpetual fixture bborbe/maintainer#2, diffed against the frozen legacy golden (13 findings from real per-owner dispatch):

Check Result
All 13 golden rule_ids accounted for ✅ 7 findings reproduced + 3 legacy false positives correctly rejected + 2 validly excluded
Zero sub-agent spawns (Task tool was available) ✅ transcript has no subagent_type events
Narrowing invariant + real narrowing ✅ 31 candidates → 9 applicable → 22 skipped with applies_when-grounded reasons
Wall time / cost ✅ 3m24s / $1.18 (legacy bot: 12-15 min on a smaller PR)
make precommit ✅ 13/13; scenarios untouched

Round 1 caught a real classify false negative (go-testing/libtime-injection-required skipped as "not a test file" — rule-id-prefix inference); fixed by the skip-justification rule (2642626), round 2 clean.

Incidental discovery: ast-grep-runner.sh scans .git/COMMIT_EDITMSG (274 stale agent-auditor findings) — follow-up, not addressed here.

🤖 Generated with Claude Code

@bborbe bborbe marked this pull request as ready for review June 11, 2026 17:07

@ben-s-pull-request-reviewer ben-s-pull-request-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Step Status
Worktree setup ALREADY_AT_HEAD — used cwd as REVIEW_DIR
Diff 5 files / +382 lines: CHANGELOG.md, commands/code-review.md, commands/pr-review.md, specs/fixtures/golden-legacy-verdict.json, specs/selector-mode-classify-adjudicate.md
Project detection Plugin repo (no go.mod / pyproject.toml); review covers markdown+specs only
Mechanical funnel (Step 4a) unavailable in this environment — scripts/ast-grep-runner.sh was blocked from execution by the Bash sandbox; all changed files are markdown/specs (no .go / .py surface), so the YAML funnel would have zero hits anyway. Proceeded to Step 4b with judgment-rule triggers only.
Active judgment rules (Step 4b-i) agent-cmd/command-frontmatter (slash-command-auditor), agent-cmd/command-thin (slash-command-auditor), changelog/conventional-prefix-required (agent-auditor)
Owners spawned coding:slash-command-auditor, coding:agent-auditor (concurrency = 2)
Owner reports Both: zero findings. Frontmatter is unchanged; new body prose follows the existing dispatcher pattern. The feat: prefix matches the changelog rule's allowed set.
Citation validation (Step 4d) Skipped — no agent findings to validate. Manual check: all 13 rule_ids in the golden baseline resolve in rules/index.json; the new selectors do not introduce any new rule_ids.
precommit (Step 3b) Skipped (standard mode) — CI covers lint+test

Manual Review Notes (diff-only)

  • The selector section is duplicated across both command files with explicit CANONICAL / SIBLING HTML comments and a "diff to verify" instruction. Sibling drift is structural, not introduced by this PR.
  • The HARD INVARIANT and the architecture-tier bypass are well-documented; the LLM-discipline surface is acknowledged in the spec (non-goal: no machine check before adjudication).
  • The golden baseline is checked in at specs/fixtures/golden-legacy-verdict.json; all 13 rule_ids are valid against rules/index.json (citation-validated by hand).
  • The 9 plan concerns from the task were reviewed; 4 are addressed by the spec's prose and 5 are explicitly deferred to the default-flip migration step (sibling extraction, schema validation of the golden, tunable batching, etc.).
{
  "verdict": "approve",
  "summary": "Opt-in `--selector` mode is well-bounded: default behavior is byte-for-byte preserved, mechanical funnel and judgment-rule dispatch still run, golden baseline is committed and citation-validates against `rules/index.json`, and both per-owner adjudicators (slash-command-auditor, agent-auditor) returned zero findings. Optional nits flagged below; none rise to blocking.",
  "comments": [
    {
      "file": "commands/code-review.md",
      "line": 189,
      "severity": "nit",
      "message": "SIBLING step 4c-sel Input says \"diff from the current working tree (`git diff HEAD~1`)\" but the prose two lines above also says \"the working-tree diff from `git diff HEAD~1` (or the directory diff as parsed in Step 1)\". The canonical in pr-review.md has no \"or\" ambiguity — it names a single source (the Step 0c diff). Consider dropping the parenthetical so the sibling is unambiguous."
    },
    {
      "file": "commands/pr-review.md",
      "line": 236,
      "severity": "nit",
      "message": "Short-circuit sentinel `selector clean — no adjudication needed` is encoded with a literal em-dash (U+2014). AC6's grep and the spec's verification block both depend on this exact glyph; any editor that normalizes em-dash to ASCII `--` will silently break AC6 while leaving the feature working. Consider switching to ASCII `--` or quoting the literal in a code fence inside the verification block."
    },
    {
      "file": "commands/code-review.md",
      "line": 207,
      "severity": "nit",
      "message": "Same em-dash fragility as the canonical — the SIBLING copy mirrors the literal U+2014 character. Drift between the two files on this character will be invisible to diff tooling and will fail AC6 differently in each file."
    },
    {
      "file": "README.md",
      "line": 0,
      "severity": "nit",
      "message": "`--selector` mode is documented inside the two command files but not surfaced in `README.md` (commands table) or `llms.txt`. The spec does not require it; adding a one-line mention under each command's row would make the opt-in discoverable without reading the dispatcher prose."
    }
  ],
  "concerns_addressed": [
    "correctness: HARD INVARIANT `applicable ⊆ candidates` is documented in the prose at commands/pr-review.md:230 and commands/code-review.md:201 and the spec names it as a structural guarantee; not addressed mechanically (acknowledged non-goal)",
    "correctness: architecture-tier bypass SRP/layering clause is descriptive prose in both files — the LLM must infer from the rule's doc_path heading; spec marks this as the intended contract (acknowledged)",
    "correctness: sibling drift is bounded by CANONICAL/SIBLING HTML comments at commands/pr-review.md:216 and commands/code-review.md:185, plus a 'diff to verify' instruction; single-file extraction is explicitly deferred to the default-flip migration step (addressed in spec, raised as nit on the Input prose)",
    "correctness: mode argument parsing accepts both `--selector` and `selector` in commands/pr-review.md:83 and commands/code-review.md:24 (the spec is silent on partial-token handling, but exact-match is the natural read of the prose)",
    "correctness: em-dash in short-circuit string — raised as nit (no change to the literal)",
    "tests: no new acceptance scenario — addressed by spec's Non-Goals and the AC3 contract that default-mode scenarios 002/004 stay green",
    "tests: golden baseline has no schema validation — addressed by spec's Constraints section and the deterministic severity map; raised as a gap the default-flip step will close",
    "correctness: >20-rule batching split is hand-waved in prose (commands/pr-review.md:265 / commands/code-review.md:236) — spec marks a tunable threshold as a non-goal (deferred)",
    "security: hallucinated file:line risk — the spec's Trust-boundary section already names `validate-citations.sh` as the gating check (same risk as legacy path); spec is silent on a post-validation sanity check (deferred)",
    "performance: 3m24s / $1.18 claim is based on a single fixture run — spec marks further measurement as a follow-up; raised as a gap the default-flip step will close"
  ]
}

@ben-s-pull-request-reviewer ben-s-pull-request-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the agent reports. Let me consolidate the findings and produce the final report.

The findings are:

  • agent-auditor (8 active rules): All PASS — no violations
  • slash-command-auditor (2 active rules): 1 Should Fix + 2 Must Fix + 1 Should Fix

The two "Must Fix" findings from slash-command-auditor concern extracting inline selector logic to a new agent. The spec explicitly documents that selector mode runs in-session with zero spawns by design — extracting to a sub-agent would change the semantics. The spec also acknowledges the duplication is a known limitation with a planned cleanup. The argument-hint issue is a real, actionable bug.

Per the procedure's deterministic map, any Must Fix → "request-changes". I'll produce the verdict accordingly.

{
  "verdict": "request-changes",
  "summary": "Documentation-only change adding the opt-in `--selector` mode to the two review commands plus a dark-factory spec and golden fixture. Mechanical funnel was unavailable in this environment, so judgment-only adjudication ran. agent-auditor found zero violations across 8 active rules. slash-command-auditor found a real `argument-hint` documentation drift (the new `--selector` mode token is not listed in either command file's hint) and flagged that Steps 4c-sel/4d-sel inline business logic that `command-thin` would normally push to an agent — this is by design (the spec mandates zero sub-agent spawns in selector mode and acknowledges the duplication with a planned cleanup), but the argument-hint bug is straightforward to fix.",
  "comments": [
    {
      "file": "commands/code-review.md",
      "line": 3,
      "severity": "major",
      "message": "[agent-cmd/command-frontmatter] `argument-hint` is `\"[short|standard|full] [directory]\"` but the new `--selector` / `selector` mode token (parsed at line 23) is missing. Update to `\"[short|standard|full|selector] [directory]\"`. Rule agent-cmd/command-frontmatter (MUST)."
    },
    {
      "file": "commands/pr-review.md",
      "line": 3,
      "severity": "major",
      "message": "[agent-cmd/command-frontmatter] `argument-hint` is `\"<target-branch> [short|standard|full]\"` but the new `--selector` / `selector` mode token (parsed at line 83) is missing. Update to `\"<target-branch> [short|standard|full|selector]\"`. Rule agent-cmd/command-frontmatter (MUST)."
    },
    {
      "file": "commands/code-review.md",
      "line": 183,
      "severity": "critical",
      "message": "[agent-cmd/command-thin] Steps 4c-sel/4d-sel inline six categories of business logic in the command: classify contract (recall-optimized decision policy), skip-justification rule (justify against `applies_when`, never the rule-id name), architecture-tier bypass, adjudication severity rubric, batching threshold (>20 rules splits into 2-3 passes), and short-circuit contract. The spec mandates zero sub-agent spawns in selector mode, so the in-session prose is the design intent — but the rule's intent is to keep commands as thin dispatchers, and the pre-existing files were already over 100 lines before this 70-line addition (now 317). The spec acknowledges the duplication: \"Single-file extraction is planned for the default-flip migration step.\" Either extract the logic to a `coding:selector-classifier` agent (changes the in-session semantic — may be acceptable for the opt-in path) or amend `agent-cmd/command-thin` to carve out an exception for opt-in in-session algorithms. Rule agent-cmd/command-thin (MUST)."
    },
    {
      "file": "commands/pr-review.md",
      "line": 214,
      "severity": "critical",
      "message": "[agent-cmd/command-thin] Same finding as code-review.md:184 — the canonical selector section duplicates six categories of business logic that `command-thin` says belong in an agent. pr-review.md is now 334 lines (was 267); the selector section accounts for ~20%. The spec's Non-goals explicitly preserve this in this migration step; tracked for the default-flip cleanup. Rule agent-cmd/command-thin (MUST)."
    },
    {
      "file": "commands/code-review.md",
      "line": 185,
      "severity": "major",
      "message": "[agent-cmd/single-source-of-truth] The selector section is duplicated across the two command files (~60 lines each), maintained by comment instructions (\"edit there first, mirror here, diff to verify\"). The HTML comment is a workaround that does not actually enforce the invariant — drift would only be caught at PR review. A single-source solution (extracted agent, or a single include) would remove the maintenance burden. Rule agent-cmd/single-source-of-truth (SHOULD)."
    }
  ],
  "concerns_addressed": [
    "correctness: in-session classify + adjudicate path preserves dedup/confidence semantics — addressed structurally by the narrowing invariant (`applicable ⊆ candidates`); cited in spec lines 23, 49, and in commands/pr-review.md:228 / commands/code-review.md:198",
    "correctness: --selector option short-circuits spawns only when explicitly enabled — addressed: parser at commands/pr-review.md:83 and commands/code-review.md:23 recognizes the token; default path is byte-for-byte unchanged per the spec's 'MUST remain byte-for-byte as today' constraint",
    "correctness: new types/constants for selector mode are additive — no JSON/markdown report schema breakage (the selector path reuses the existing Must/Should/Nice buckets per the spec's severity vocabulary constraint)",
    "tests: regression coverage on the existing spawn path — addressed via spec AC3 (default-mode scenarios 002/004 unchanged, `make check-acceptance` exits 0)",
    "tests: flag-parsing test for --selector — NO TEST WAS ADDED. The spec mentions manual fixture walk (AC1) and `make precommit` (AC5) but the diff adds no automated test that asserts zero-spawn behavior on --selector. AC2 is verifiable from transcript alone, not from a test fixture. This is a gap relative to the concern.",
    "performance: zero-spawn path actually avoids subprocess/agent calls — addressed structurally: spec lines 18-19, commands/pr-review.md:218 and commands/code-review.md:187 both state 'zero sub-agent spawns', and the citation validation explicitly bypasses coding:simple-bash-runner in favor of a plain bash call (commands/pr-review.md:267-272, commands/code-review.md:238-243)"
  ]
}

@bborbe bborbe marked this pull request as draft June 11, 2026 18:01
@bborbe bborbe marked this pull request as ready for review June 11, 2026 18:53
@bborbe bborbe merged commit 6c4d9a3 into master Jun 11, 2026
1 check passed
@bborbe bborbe deleted the feature/selector-review branch June 11, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant