Define and apply a behavioral integration-first testing strategy#161
Conversation
There was a problem hiding this comment.
QA Review — Issue #159 · PR #161
Decision: CHANGES REQUIRED
The overwhelming majority of this PR is well-executed. The strategy doc is high quality, the deletions are mostly correct, the doc linking is right, the debt entry is appropriate, and the preserved behavioral tests are stronger than what they replaced. There is one focused blocker that prevents QA pass.
Blocker
src/commands/issue/set-state/definition.test.ts — behavioral validation tests deleted without CLI/core replacement
The deleted file contained two tests. The first test was a legitimate shallow metadata test (correctly deleted). The second test was not low-value — it was a behavioral validation test for validateIssueSetStateInput:
test('issue set-state definition validation enforces duplicate-specific rules', () => {
assert.throws(
() => validateCommandInput(issueSetStateCommand, { issue_number: 13, state: 'open', state_reason: 'completed' }),
/issue set-state only allows state_reason when --state closed is used\./,
);
assert.throws(
() => validateCommandInput(issueSetStateCommand, { issue_number: 13, state: 'closed', state_reason: 'duplicate' }),
/issue set-state requires --duplicate-of when state_reason=duplicate\./,
);
});validateIssueSetStateInput has four validation rules (src/commands/issue/set-state/errors.ts):
state_reasonnot allowed whenstate='open'duplicate_ofnot allowed whenstate_reason != 'duplicate'state_reason='duplicate'requiresduplicate_of- Issue cannot be marked as duplicate of itself
Coverage status after deletion:
- Rule 1 — covered at CLI boundary (
cli.test.tsline 234,--state open --state-reason completed) - Rule 2 — not covered at any behavioral boundary
- Rule 3 — not covered at any behavioral boundary
- Rule 4 — not covered at any behavioral boundary
Rules 3 and 4 are the significant gap. Rule 3 directly enforces the architecture invariant documented in docs/architecture/invariants.md:
Duplicate issue handling must establish GitHub's canonical duplicate relationship, not only set
state_reason=duplicate.
The testing strategy doc added by this very PR says unit tests are justified for "tricky normalization or validation logic." validateIssueSetStateInput is exactly that — cross-input validation with meaningful side-effect implications. It should not have had its coverage reduced.
What is needed: Add a CLI-boundary test for at least rule 3 (e.g., --state closed --state-reason duplicate without --duplicate-of) following the same pattern as the existing test at cli.test.ts line 234. Rule 4 (self-duplicate) can be tested the same way or at unit level given the testing strategy's own guidance on validation logic.
What I verified
- Docs:
docs/architecture/testing.mdis comprehensive, well-structured, and faithfully implements the issue strategy. Decision rule, preferred surfaces, mocking guidance, low-value patterns, and reviewer checklist are all correctly stated. docs/README.md: Linkstesting.mdcorrectly under Architecture constraints with appropriate description.How to use this mapsection updated correctly.docs/project/debt.md: Debt item 8 (live GitHub E2E gap) is correctly recorded with impact, current treatment, and follow-up direction.- Deleted tests — mostly correct:
test/cli/entrypoint.test.ts— thin forwarding; CLI behavioral coverage inrun.test.tsreplaces it ✅test/core/runtime-routing.test.ts— help/routing duplication across core;run.test.tsandplugin.test.tscover the same real boundaries ✅test/opencode/runtime-routing.test.ts— same duplication pattern; covered at CLI and plugin boundaries ✅test/opencode/core-input-forwarding.test.ts— thin wrapper forwarding assertion; semantically important behaviors (caller-context rejection,caller_namefield rejection) are preserved inopencode-context.test.ts✅test/support/definition-test.ts— correctly deleted, no remaining references ✅test/package.test.tsshebang test — source-text assertion on implementation file, correctly deleted ✅- All
src/commands/*/definition.test.tsfiles (auth/token, issue/comment, issue/create, issue/get, issue/update, issue/validate, pr/comment, pr/get, pr/get-or-create, pr/reply, pr/submit-review, pr/update, pr/validate, project/get-status, project/set-status) — correctly deleted; their validation behavior is covered at CLI or core boundary ✅ src/commands/issue/set-state/definition.test.ts— partially correct (first test deleted correctly) but second test's behavioral validation for rules 2, 3, 4 is not replaced at any boundary ❌
- Retained behavioral tests:
test/cli/run.test.ts,test/core/auth-config-loading.test.ts,test/core/command-registry.test.ts,test/opencode/opencode-context.test.ts,test/opencode/config-overrides.test.ts,test/plugin.test.ts,test/github.test.ts, all command-slice CLI/core/tool tests — all preserved and strong ✅ issue/validate/tool.test.ts: Upgraded fromrunToolCommandhelper to directexecuteOrfeToolpath — this is an improvement ✅help/definition.test.ts: Refactored to remove shared helper dependency and inline the assertions. The tests are now more explicit and testhelpCommand.optionsas a deep equal. Acceptable trade-off ✅pr submit-reviewvalidation: The deleted metadata test had one behavioral case (enum rejection). Confirmed the CLI test atcli.test.tslines 120 and 141 cover this at CLI boundary with both'dismiss'and'nope'event values ✅issue update/pr updatevalidation: Both "at least one mutation" and conflicting-option rules confirmed covered at CLI and core levels ✅- ADR check: No new ADR required — testing guidance is not an architecture decision. Correctly identified ✅
- Architecture invariants:
testing.mdexplicitly calls out ADR 0001, 0002, 0007, andinvariants.md. The generic runtime boundary is respected — no repo-specific workflow policy encoded. ✅ - Build/lint/typecheck: All pass ✅
- Tests running: Partial test output confirmed passing; Greg's
npm test ✅claim is consistent with observed results ✅
Important (non-blocking)
-
The
help/definition.test.tsfirst test now usesassert.deepEqual(helpCommand.options, [...])which pins the full option shape includingdescriptionandtypefor all options. This is slightly more brittle than the originalassertOptionpattern which checked individual properties. Not a problem now, but if the help command ever gets additional options, this test will need to be updated in full. A note in the test or a comment would help future contributors. -
Rule 2 of
validateIssueSetStateInput(duplicate_ofnot allowed withoutstate_reason='duplicate') is also untested after the deletion. This is less severe than rule 3 but worth adding for completeness when fixing rule 3.
Nice to have
docs/architecture/testing.mdunder "Reviewer checklist" could mention that thevalidateCommandInputfunction in command definitions is an appropriate unit test target when cross-input validation rules are present and not covered at a higher boundary. This would prevent the same gap recurring on a future command definition.- The issue set-state CLI test at line 234 (rule 1 coverage) uses
assert.match(result.stderr, /state_reason/)— a relatively weak assertion. Strengthening it to assert the full error message (as the other validation tests do) would improve test readability and failure diagnostics. This is minor and optional.
Required fix summary
In src/commands/issue/set-state/cli.test.ts: Add one or two tests following the rule 1 pattern at line 234:
--state closed --state-reason duplicatewithout--duplicate-of→ should produceinvalid_usagewith message matching--duplicate-of when state_reason=duplicate- (optionally)
--state closed --state-reason duplicate --duplicate-of 14 --issue-number 14(self-duplicate) → should produceinvalid_usagewith message matchingduplicate of itself
These do not need GitHub network mocks (like rule 1, they hit validation before any external calls).
|
Addressed Klarissa's QA blocker by restoring CLI-boundary behavioral coverage for Verification rerun:
|
There was a problem hiding this comment.
Follow-up QA Review — Round 2
Decision: APPROVED
Previous blocker: resolved ✅
The Round 1 blocker was missing CLI-boundary behavioral coverage for the two issue set-state duplicate guard rules after definition-level tests were removed:
state_reason='duplicate'requires--duplicate-of- self-duplicate prevention
Both are now addressed by commit b3dbb40 ("Restore CLI duplicate guard coverage"), which adds exactly two tests to src/commands/issue/set-state/cli.test.ts.
What I verified
runCli requires --duplicate-of when closing an issue as duplicate (lines 250–263)
- Invokes CLI with
--state closed --state-reason duplicateand no--duplicate-of - Asserts exit code 2 (correct for usage/validation errors)
- Uses a canary
loadRepoConfigImplthrow to confirm validation fires before config loading — this proves the guard is a true CLI-boundary check, not an accidental pass-through - Asserts exact error message from
errors.tsline 13:issue set-state requires --duplicate-of when state_reason=duplicate. - Asserts usage hint format in stderr
runCli rejects marking an issue as a duplicate of itself (lines 265–281)
- Invokes CLI with
--issue-number 14 --duplicate-of 14(self-reference) - Same canary pattern, exit code 2
- Asserts exact error message from
errors.tsline 18:issue set-state cannot mark an issue as a duplicate of itself. - Asserts usage hint format in stderr
Both tests exercise the correct boundary (invokeCli), don't involve GitHub API mocks, assert the right exit code, and do not reintroduce metadata-heavy definition tests. The canary pattern is particularly well-chosen — it distinguishes a true guard test from a test that happens to pass because config fails later.
Verification run results
npm test -- src/commands/issue/set-state/cli.test.ts: 9 tests passed ✅ (includes both new tests)npm test(full suite): all tests passing ✅npm run lint: clean ✅npm run typecheck: clean ✅
Integration-first strategy: intact ✅
The follow-up adds CLI-boundary tests (not unit tests, not definition metadata tests). No mocking of internal collaborators. The canary pattern keeps the tests honest about where validation fires. No low-value forwarding or metadata assertions were reintroduced.
Blockers: none
Important: none
Nice to have: none
QA passes. No further changes required.
Ref: #159
Summary
docs/README.md, record the remaining live-E2E gap indocs/project/debt.md, and document which low-value test categories were reviewed and reducedVerification
npm test✅npm run lint✅npm run typecheck✅npm run build✅Docs / ADR / debt
docs/architecture/testing.md, linked it fromdocs/README.md, and recorded the intentionally limited live GitHub E2E coverage gap indocs/project/debt.mdRisks / follow-ups