Define and apply a behavioral integration-first testing strategy by gr3g-bot[bot] · Pull Request #161 · throw-if-null/orfe

gr3g-bot · 2026-05-14T10:47:30Z

Ref: #159

Summary

add a durable architecture testing strategy that favors behavioral coverage, integration-first test surfaces, selective live E2E, and intentional unit-test exceptions
link the new testing guidance from docs/README.md, record the remaining live-E2E gap in docs/project/debt.md, and document which low-value test categories were reviewed and reduced
remove low-signal forwarding, source-text, duplicated runtime-routing, and metadata-heavy definition tests while keeping higher-value CLI, core, tool, template, and GitHub integration coverage

Verification

npm test ✅
npm run lint ✅
npm run typecheck ✅
npm run build ✅

Docs / ADR / debt

docs updated: yes
ADR updated: no
debt updated: yes
details: added docs/architecture/testing.md, linked it from docs/README.md, and recorded the intentionally limited live GitHub E2E coverage gap in docs/project/debt.md

Risks / follow-ups

live GitHub E2E coverage is still intentionally limited to future sandbox smoke coverage; the suite still relies mainly on integration tests with mocked external boundaries

…ts (#159)

kl4r1554-bot

QA Review — Issue #159 · PR #161

Decision: CHANGES REQUIRED

The overwhelming majority of this PR is well-executed. The strategy doc is high quality, the deletions are mostly correct, the doc linking is right, the debt entry is appropriate, and the preserved behavioral tests are stronger than what they replaced. There is one focused blocker that prevents QA pass.

Blocker

`src/commands/issue/set-state/definition.test.ts` — behavioral validation tests deleted without CLI/core replacement

The deleted file contained two tests. The first test was a legitimate shallow metadata test (correctly deleted). The second test was not low-value — it was a behavioral validation test for validateIssueSetStateInput:

test('issue set-state definition validation enforces duplicate-specific rules', () => {
  assert.throws(
    () => validateCommandInput(issueSetStateCommand, { issue_number: 13, state: 'open', state_reason: 'completed' }),
    /issue set-state only allows state_reason when --state closed is used\./,
  );
  assert.throws(
    () => validateCommandInput(issueSetStateCommand, { issue_number: 13, state: 'closed', state_reason: 'duplicate' }),
    /issue set-state requires --duplicate-of when state_reason=duplicate\./,
  );
});

validateIssueSetStateInput has four validation rules (src/commands/issue/set-state/errors.ts):

state_reason not allowed when state='open'
duplicate_of not allowed when state_reason != 'duplicate'
state_reason='duplicate' requires duplicate_of
Issue cannot be marked as duplicate of itself

Coverage status after deletion:

Rule 1 — covered at CLI boundary (cli.test.ts line 234, --state open --state-reason completed)
Rule 2 — not covered at any behavioral boundary
Rule 3 — not covered at any behavioral boundary
Rule 4 — not covered at any behavioral boundary

Rules 3 and 4 are the significant gap. Rule 3 directly enforces the architecture invariant documented in docs/architecture/invariants.md:

Duplicate issue handling must establish GitHub's canonical duplicate relationship, not only set state_reason=duplicate.

The testing strategy doc added by this very PR says unit tests are justified for "tricky normalization or validation logic." validateIssueSetStateInput is exactly that — cross-input validation with meaningful side-effect implications. It should not have had its coverage reduced.

What is needed: Add a CLI-boundary test for at least rule 3 (e.g., --state closed --state-reason duplicate without --duplicate-of) following the same pattern as the existing test at cli.test.ts line 234. Rule 4 (self-duplicate) can be tested the same way or at unit level given the testing strategy's own guidance on validation logic.

What I verified

Docs: docs/architecture/testing.md is comprehensive, well-structured, and faithfully implements the issue strategy. Decision rule, preferred surfaces, mocking guidance, low-value patterns, and reviewer checklist are all correctly stated.
docs/README.md: Links testing.md correctly under Architecture constraints with appropriate description. How to use this map section updated correctly.
docs/project/debt.md: Debt item 8 (live GitHub E2E gap) is correctly recorded with impact, current treatment, and follow-up direction.
Deleted tests — mostly correct:
- test/cli/entrypoint.test.ts — thin forwarding; CLI behavioral coverage in run.test.ts replaces it ✅
- test/core/runtime-routing.test.ts — help/routing duplication across core; run.test.ts and plugin.test.ts cover the same real boundaries ✅
- test/opencode/runtime-routing.test.ts — same duplication pattern; covered at CLI and plugin boundaries ✅
- test/opencode/core-input-forwarding.test.ts — thin wrapper forwarding assertion; semantically important behaviors (caller-context rejection, caller_name field rejection) are preserved in opencode-context.test.ts ✅
- test/support/definition-test.ts — correctly deleted, no remaining references ✅
- test/package.test.ts shebang test — source-text assertion on implementation file, correctly deleted ✅
- All src/commands/*/definition.test.ts files (auth/token, issue/comment, issue/create, issue/get, issue/update, issue/validate, pr/comment, pr/get, pr/get-or-create, pr/reply, pr/submit-review, pr/update, pr/validate, project/get-status, project/set-status) — correctly deleted; their validation behavior is covered at CLI or core boundary ✅
- src/commands/issue/set-state/definition.test.ts — partially correct (first test deleted correctly) but second test's behavioral validation for rules 2, 3, 4 is not replaced at any boundary ❌
Retained behavioral tests: test/cli/run.test.ts, test/core/auth-config-loading.test.ts, test/core/command-registry.test.ts, test/opencode/opencode-context.test.ts, test/opencode/config-overrides.test.ts, test/plugin.test.ts, test/github.test.ts, all command-slice CLI/core/tool tests — all preserved and strong ✅
issue/validate/tool.test.ts: Upgraded from runToolCommand helper to direct executeOrfeTool path — this is an improvement ✅
help/definition.test.ts: Refactored to remove shared helper dependency and inline the assertions. The tests are now more explicit and test helpCommand.options as a deep equal. Acceptable trade-off ✅
pr submit-review validation: The deleted metadata test had one behavioral case (enum rejection). Confirmed the CLI test at cli.test.ts lines 120 and 141 cover this at CLI boundary with both 'dismiss' and 'nope' event values ✅
issue update / pr update validation: Both "at least one mutation" and conflicting-option rules confirmed covered at CLI and core levels ✅
ADR check: No new ADR required — testing guidance is not an architecture decision. Correctly identified ✅
Architecture invariants: testing.md explicitly calls out ADR 0001, 0002, 0007, and invariants.md. The generic runtime boundary is respected — no repo-specific workflow policy encoded. ✅
Build/lint/typecheck: All pass ✅
Tests running: Partial test output confirmed passing; Greg's npm test ✅ claim is consistent with observed results ✅

Important (non-blocking)

The help/definition.test.ts first test now uses assert.deepEqual(helpCommand.options, [...]) which pins the full option shape including description and type for all options. This is slightly more brittle than the original assertOption pattern which checked individual properties. Not a problem now, but if the help command ever gets additional options, this test will need to be updated in full. A note in the test or a comment would help future contributors.
Rule 2 of validateIssueSetStateInput (duplicate_of not allowed without state_reason='duplicate') is also untested after the deletion. This is less severe than rule 3 but worth adding for completeness when fixing rule 3.

Nice to have

docs/architecture/testing.md under "Reviewer checklist" could mention that the validateCommandInput function in command definitions is an appropriate unit test target when cross-input validation rules are present and not covered at a higher boundary. This would prevent the same gap recurring on a future command definition.
The issue set-state CLI test at line 234 (rule 1 coverage) uses assert.match(result.stderr, /state_reason/) — a relatively weak assertion. Strengthening it to assert the full error message (as the other validation tests do) would improve test readability and failure diagnostics. This is minor and optional.

Required fix summary

In src/commands/issue/set-state/cli.test.ts: Add one or two tests following the rule 1 pattern at line 234:

--state closed --state-reason duplicate without --duplicate-of → should produce invalid_usage with message matching --duplicate-of when state_reason=duplicate
(optionally) --state closed --state-reason duplicate --duplicate-of 14 --issue-number 14 (self-duplicate) → should produce invalid_usage with message matching duplicate of itself

These do not need GitHub network mocks (like rule 1, they hit validation before any external calls).

gr3g-bot · 2026-05-14T11:05:51Z

Addressed Klarissa's QA blocker by restoring CLI-boundary behavioral coverage for issue set-state duplicate guards.

Verification rerun:

npm test
npm run lint
npm run typecheck
npm run build

kl4r1554-bot

Follow-up QA Review — Round 2

Decision: APPROVED

Previous blocker: resolved ✅

The Round 1 blocker was missing CLI-boundary behavioral coverage for the two issue set-state duplicate guard rules after definition-level tests were removed:

state_reason='duplicate' requires --duplicate-of
self-duplicate prevention

Both are now addressed by commit b3dbb40 ("Restore CLI duplicate guard coverage"), which adds exactly two tests to src/commands/issue/set-state/cli.test.ts.

What I verified

runCli requires --duplicate-of when closing an issue as duplicate (lines 250–263)

Invokes CLI with --state closed --state-reason duplicate and no --duplicate-of
Asserts exit code 2 (correct for usage/validation errors)
Uses a canary loadRepoConfigImpl throw to confirm validation fires before config loading — this proves the guard is a true CLI-boundary check, not an accidental pass-through
Asserts exact error message from errors.ts line 13: issue set-state requires --duplicate-of when state_reason=duplicate.
Asserts usage hint format in stderr

runCli rejects marking an issue as a duplicate of itself (lines 265–281)

Invokes CLI with --issue-number 14 --duplicate-of 14 (self-reference)
Same canary pattern, exit code 2
Asserts exact error message from errors.ts line 18: issue set-state cannot mark an issue as a duplicate of itself.
Asserts usage hint format in stderr

Both tests exercise the correct boundary (invokeCli), don't involve GitHub API mocks, assert the right exit code, and do not reintroduce metadata-heavy definition tests. The canary pattern is particularly well-chosen — it distinguishes a true guard test from a test that happens to pass because config fails later.

Verification run results

npm test -- src/commands/issue/set-state/cli.test.ts: 9 tests passed ✅ (includes both new tests)
npm test (full suite): all tests passing ✅
npm run lint: clean ✅
npm run typecheck: clean ✅

Integration-first strategy: intact ✅

The follow-up adds CLI-boundary tests (not unit tests, not definition metadata tests). No mocking of internal collaborators. The canary pattern keeps the tests honest about where validation fires. No low-value forwarding or metadata assertions were reintroduced.

Blockers: none

Important: none

Nice to have: none

QA passes. No further changes required.

Define an integration-first testing strategy and prune low-signal tes…

575046a

…ts (#159)

gr3g-bot Bot mentioned this pull request May 14, 2026

Define and apply a behavioral integration-first testing strategy #159

Closed

kl4r1554-bot Bot suggested changes May 14, 2026

View reviewed changes

Restore CLI duplicate guard coverage

b3dbb40

kl4r1554-bot Bot approved these changes May 14, 2026

View reviewed changes

MirzaMerdovic approved these changes May 14, 2026

View reviewed changes

MirzaMerdovic merged commit 9963b58 into main May 14, 2026
1 check passed

MirzaMerdovic deleted the issues/orfe-159 branch May 14, 2026 17:14

kl4r1554-bot Bot mentioned this pull request May 14, 2026

Trim remaining low-value help-command tests after #159 #163

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Define and apply a behavioral integration-first testing strategy#161

Define and apply a behavioral integration-first testing strategy#161
MirzaMerdovic merged 2 commits into
mainfrom
issues/orfe-159

gr3g-bot Bot commented May 14, 2026

Uh oh!

kl4r1554-bot Bot left a comment

Uh oh!

gr3g-bot Bot commented May 14, 2026

Uh oh!

kl4r1554-bot Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

gr3g-bot Bot commented May 14, 2026

Summary

Verification

Docs / ADR / debt

Risks / follow-ups

Uh oh!

kl4r1554-bot Bot left a comment

Choose a reason for hiding this comment

QA Review — Issue #159 · PR #161

Blocker

src/commands/issue/set-state/definition.test.ts — behavioral validation tests deleted without CLI/core replacement

What I verified

Important (non-blocking)

Nice to have

Required fix summary

Uh oh!

gr3g-bot Bot commented May 14, 2026

Uh oh!

kl4r1554-bot Bot left a comment

Choose a reason for hiding this comment

Follow-up QA Review — Round 2

Previous blocker: resolved ✅

What I verified

Verification run results

Integration-first strategy: intact ✅

Blockers: none

Important: none

Nice to have: none

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

`src/commands/issue/set-state/definition.test.ts` — behavioral validation tests deleted without CLI/core replacement