Skip to content

Define and apply a behavioral integration-first testing strategy#161

Merged
MirzaMerdovic merged 2 commits into
mainfrom
issues/orfe-159
May 14, 2026
Merged

Define and apply a behavioral integration-first testing strategy#161
MirzaMerdovic merged 2 commits into
mainfrom
issues/orfe-159

Conversation

@gr3g-bot

@gr3g-bot gr3g-bot Bot commented May 14, 2026

Copy link
Copy Markdown

Ref: #159

Summary

  • add a durable architecture testing strategy that favors behavioral coverage, integration-first test surfaces, selective live E2E, and intentional unit-test exceptions
  • link the new testing guidance from docs/README.md, record the remaining live-E2E gap in docs/project/debt.md, and document which low-value test categories were reviewed and reduced
  • remove low-signal forwarding, source-text, duplicated runtime-routing, and metadata-heavy definition tests while keeping higher-value CLI, core, tool, template, and GitHub integration coverage

Verification

  • npm test
  • npm run lint
  • npm run typecheck
  • npm run build

Docs / ADR / debt

  • docs updated: yes
  • ADR updated: no
  • debt updated: yes
  • details: added docs/architecture/testing.md, linked it from docs/README.md, and recorded the intentionally limited live GitHub E2E coverage gap in docs/project/debt.md

Risks / follow-ups

  • live GitHub E2E coverage is still intentionally limited to future sandbox smoke coverage; the suite still relies mainly on integration tests with mocked external boundaries

@kl4r1554-bot kl4r1554-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Review — Issue #159 · PR #161

Decision: CHANGES REQUIRED

The overwhelming majority of this PR is well-executed. The strategy doc is high quality, the deletions are mostly correct, the doc linking is right, the debt entry is appropriate, and the preserved behavioral tests are stronger than what they replaced. There is one focused blocker that prevents QA pass.


Blocker

src/commands/issue/set-state/definition.test.ts — behavioral validation tests deleted without CLI/core replacement

The deleted file contained two tests. The first test was a legitimate shallow metadata test (correctly deleted). The second test was not low-value — it was a behavioral validation test for validateIssueSetStateInput:

test('issue set-state definition validation enforces duplicate-specific rules', () => {
  assert.throws(
    () => validateCommandInput(issueSetStateCommand, { issue_number: 13, state: 'open', state_reason: 'completed' }),
    /issue set-state only allows state_reason when --state closed is used\./,
  );
  assert.throws(
    () => validateCommandInput(issueSetStateCommand, { issue_number: 13, state: 'closed', state_reason: 'duplicate' }),
    /issue set-state requires --duplicate-of when state_reason=duplicate\./,
  );
});

validateIssueSetStateInput has four validation rules (src/commands/issue/set-state/errors.ts):

  1. state_reason not allowed when state='open'
  2. duplicate_of not allowed when state_reason != 'duplicate'
  3. state_reason='duplicate' requires duplicate_of
  4. Issue cannot be marked as duplicate of itself

Coverage status after deletion:

  • Rule 1 — covered at CLI boundary (cli.test.ts line 234, --state open --state-reason completed)
  • Rule 2 — not covered at any behavioral boundary
  • Rule 3 — not covered at any behavioral boundary
  • Rule 4 — not covered at any behavioral boundary

Rules 3 and 4 are the significant gap. Rule 3 directly enforces the architecture invariant documented in docs/architecture/invariants.md:

Duplicate issue handling must establish GitHub's canonical duplicate relationship, not only set state_reason=duplicate.

The testing strategy doc added by this very PR says unit tests are justified for "tricky normalization or validation logic." validateIssueSetStateInput is exactly that — cross-input validation with meaningful side-effect implications. It should not have had its coverage reduced.

What is needed: Add a CLI-boundary test for at least rule 3 (e.g., --state closed --state-reason duplicate without --duplicate-of) following the same pattern as the existing test at cli.test.ts line 234. Rule 4 (self-duplicate) can be tested the same way or at unit level given the testing strategy's own guidance on validation logic.


What I verified

  • Docs: docs/architecture/testing.md is comprehensive, well-structured, and faithfully implements the issue strategy. Decision rule, preferred surfaces, mocking guidance, low-value patterns, and reviewer checklist are all correctly stated.
  • docs/README.md: Links testing.md correctly under Architecture constraints with appropriate description. How to use this map section updated correctly.
  • docs/project/debt.md: Debt item 8 (live GitHub E2E gap) is correctly recorded with impact, current treatment, and follow-up direction.
  • Deleted tests — mostly correct:
    • test/cli/entrypoint.test.ts — thin forwarding; CLI behavioral coverage in run.test.ts replaces it ✅
    • test/core/runtime-routing.test.ts — help/routing duplication across core; run.test.ts and plugin.test.ts cover the same real boundaries ✅
    • test/opencode/runtime-routing.test.ts — same duplication pattern; covered at CLI and plugin boundaries ✅
    • test/opencode/core-input-forwarding.test.ts — thin wrapper forwarding assertion; semantically important behaviors (caller-context rejection, caller_name field rejection) are preserved in opencode-context.test.ts
    • test/support/definition-test.ts — correctly deleted, no remaining references ✅
    • test/package.test.ts shebang test — source-text assertion on implementation file, correctly deleted ✅
    • All src/commands/*/definition.test.ts files (auth/token, issue/comment, issue/create, issue/get, issue/update, issue/validate, pr/comment, pr/get, pr/get-or-create, pr/reply, pr/submit-review, pr/update, pr/validate, project/get-status, project/set-status) — correctly deleted; their validation behavior is covered at CLI or core boundary ✅
    • src/commands/issue/set-state/definition.test.ts — partially correct (first test deleted correctly) but second test's behavioral validation for rules 2, 3, 4 is not replaced at any boundary ❌
  • Retained behavioral tests: test/cli/run.test.ts, test/core/auth-config-loading.test.ts, test/core/command-registry.test.ts, test/opencode/opencode-context.test.ts, test/opencode/config-overrides.test.ts, test/plugin.test.ts, test/github.test.ts, all command-slice CLI/core/tool tests — all preserved and strong ✅
  • issue/validate/tool.test.ts: Upgraded from runToolCommand helper to direct executeOrfeTool path — this is an improvement ✅
  • help/definition.test.ts: Refactored to remove shared helper dependency and inline the assertions. The tests are now more explicit and test helpCommand.options as a deep equal. Acceptable trade-off ✅
  • pr submit-review validation: The deleted metadata test had one behavioral case (enum rejection). Confirmed the CLI test at cli.test.ts lines 120 and 141 cover this at CLI boundary with both 'dismiss' and 'nope' event values ✅
  • issue update / pr update validation: Both "at least one mutation" and conflicting-option rules confirmed covered at CLI and core levels ✅
  • ADR check: No new ADR required — testing guidance is not an architecture decision. Correctly identified ✅
  • Architecture invariants: testing.md explicitly calls out ADR 0001, 0002, 0007, and invariants.md. The generic runtime boundary is respected — no repo-specific workflow policy encoded. ✅
  • Build/lint/typecheck: All pass ✅
  • Tests running: Partial test output confirmed passing; Greg's npm test ✅ claim is consistent with observed results ✅

Important (non-blocking)

  • The help/definition.test.ts first test now uses assert.deepEqual(helpCommand.options, [...]) which pins the full option shape including description and type for all options. This is slightly more brittle than the original assertOption pattern which checked individual properties. Not a problem now, but if the help command ever gets additional options, this test will need to be updated in full. A note in the test or a comment would help future contributors.

  • Rule 2 of validateIssueSetStateInput (duplicate_of not allowed without state_reason='duplicate') is also untested after the deletion. This is less severe than rule 3 but worth adding for completeness when fixing rule 3.


Nice to have

  • docs/architecture/testing.md under "Reviewer checklist" could mention that the validateCommandInput function in command definitions is an appropriate unit test target when cross-input validation rules are present and not covered at a higher boundary. This would prevent the same gap recurring on a future command definition.
  • The issue set-state CLI test at line 234 (rule 1 coverage) uses assert.match(result.stderr, /state_reason/) — a relatively weak assertion. Strengthening it to assert the full error message (as the other validation tests do) would improve test readability and failure diagnostics. This is minor and optional.

Required fix summary

In src/commands/issue/set-state/cli.test.ts: Add one or two tests following the rule 1 pattern at line 234:

  • --state closed --state-reason duplicate without --duplicate-of → should produce invalid_usage with message matching --duplicate-of when state_reason=duplicate
  • (optionally) --state closed --state-reason duplicate --duplicate-of 14 --issue-number 14 (self-duplicate) → should produce invalid_usage with message matching duplicate of itself

These do not need GitHub network mocks (like rule 1, they hit validation before any external calls).

@gr3g-bot

gr3g-bot Bot commented May 14, 2026

Copy link
Copy Markdown
Author

Addressed Klarissa's QA blocker by restoring CLI-boundary behavioral coverage for issue set-state duplicate guards.

Verification rerun:

  • npm test
  • npm run lint
  • npm run typecheck
  • npm run build

@kl4r1554-bot kl4r1554-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up QA Review — Round 2

Decision: APPROVED


Previous blocker: resolved ✅

The Round 1 blocker was missing CLI-boundary behavioral coverage for the two issue set-state duplicate guard rules after definition-level tests were removed:

  1. state_reason='duplicate' requires --duplicate-of
  2. self-duplicate prevention

Both are now addressed by commit b3dbb40 ("Restore CLI duplicate guard coverage"), which adds exactly two tests to src/commands/issue/set-state/cli.test.ts.


What I verified

runCli requires --duplicate-of when closing an issue as duplicate (lines 250–263)

  • Invokes CLI with --state closed --state-reason duplicate and no --duplicate-of
  • Asserts exit code 2 (correct for usage/validation errors)
  • Uses a canary loadRepoConfigImpl throw to confirm validation fires before config loading — this proves the guard is a true CLI-boundary check, not an accidental pass-through
  • Asserts exact error message from errors.ts line 13: issue set-state requires --duplicate-of when state_reason=duplicate.
  • Asserts usage hint format in stderr

runCli rejects marking an issue as a duplicate of itself (lines 265–281)

  • Invokes CLI with --issue-number 14 --duplicate-of 14 (self-reference)
  • Same canary pattern, exit code 2
  • Asserts exact error message from errors.ts line 18: issue set-state cannot mark an issue as a duplicate of itself.
  • Asserts usage hint format in stderr

Both tests exercise the correct boundary (invokeCli), don't involve GitHub API mocks, assert the right exit code, and do not reintroduce metadata-heavy definition tests. The canary pattern is particularly well-chosen — it distinguishes a true guard test from a test that happens to pass because config fails later.


Verification run results

  • npm test -- src/commands/issue/set-state/cli.test.ts: 9 tests passed ✅ (includes both new tests)
  • npm test (full suite): all tests passing ✅
  • npm run lint: clean ✅
  • npm run typecheck: clean ✅

Integration-first strategy: intact ✅

The follow-up adds CLI-boundary tests (not unit tests, not definition metadata tests). No mocking of internal collaborators. The canary pattern keeps the tests honest about where validation fires. No low-value forwarding or metadata assertions were reintroduced.


Blockers: none

Important: none

Nice to have: none

QA passes. No further changes required.

@MirzaMerdovic MirzaMerdovic merged commit 9963b58 into main May 14, 2026
1 check passed
@MirzaMerdovic MirzaMerdovic deleted the issues/orfe-159 branch May 14, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant