
feat: test assessor enhancements for command documentation and organization (ADR A.2, A.3)#481

Merged
jwm4 merged 1 commit into main from feat/460-test-assessor-enhancements
May 28, 2026


@jwm4 jwm4 commented May 28, 2026

Summary

  • Add test command documentation check (10-point bonus) to TestExecutionAssessor across Python, JS/TS, and Go
  • Add test organization detection as substantiating evidence (not scoring)
  • Existing scores unchanged: documentation bonus is additive, capped at 100

A.3 (test command documentation): Scans CLAUDE.md, AGENTS.md, and README.md for language-specific test command keywords (e.g., pytest, npm test, go test). Awards 10 bonus points when found. The insight from the BP: agents need to find the command, not just have a runner config file present.
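A minimal Python sketch of the A.3 check, assuming a case-insensitive substring match over CLAUDE.md, AGENTS.md, and README.md at the repository root. The names `mentions_test_command`, `has_documented_test_command`, `documentation_bonus`, and the keyword lists are hypothetical; the real helpers in src/agentready/assessors/testing.py may differ.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical keyword lists; the real assessor's lists may differ.
TEST_COMMAND_KEYWORDS = {
    "python": ("pytest", "python -m unittest"),
    "javascript": ("npm test", "npm run test", "yarn test", "pnpm test"),
    "go": ("go test",),
}

# Context files named in the PR description, checked in this order.
CONTEXT_FILES = ("CLAUDE.md", "AGENTS.md", "README.md")
DOC_BONUS = 10


def mentions_test_command(text: str, language: str) -> bool:
    """Case-insensitive substring match against the language's keywords."""
    lowered = text.lower()
    return any(kw in lowered for kw in TEST_COMMAND_KEYWORDS.get(language, ()))


def has_documented_test_command(repo: Path, language: str) -> tuple[bool, str | None]:
    """Return (True, filename) for the first context file that mentions a test command."""
    for name in CONTEXT_FILES:
        path = repo / name
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # an unreadable file is treated as undocumented
        if mentions_test_command(text, language):
            return True, name
    return False, None


def documentation_bonus(repo: Path, language: str) -> int:
    """Additive bonus; the 100-point cap is applied when the total is computed."""
    found, _ = has_documented_test_command(repo, language)
    return DOC_BONUS if found else 0
```

A plain substring match tolerates backticks, code fences, and shell prompts around the command, at the cost of occasional false positives.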

A.2 (test organization): Detects unit/integration test separation signals and records them as evidence. Python: separate dirs, pytest markers, Makefile targets. JS/TS: separate dirs, filtered test scripts (test:unit, test:integration). Go: build tags, Makefile targets. No score impact per ADR ("substantiating evidence, not a hard gate").
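The Python side of A.2 might look like the following sketch. It only collects evidence strings and never changes the score, in line with "substantiating evidence, not a hard gate". The function name, the marker list, and the Makefile target names are assumptions for illustration.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical patterns; the real assessor's heuristics may differ.
PYTEST_MARKER_RE = re.compile(r"@pytest\.mark\.(integration|unit|slow|e2e)\b")
MAKEFILE_TARGET_RE = re.compile(
    r"^(test-unit|test-integration|unit-test|integration-test)\s*:", re.MULTILINE
)


def python_test_organization_evidence(repo: Path) -> list[str]:
    """Collect unit/integration separation signals as evidence strings (no score)."""
    evidence: list[str] = []
    tests = repo / "tests"

    # Signal 1: separate unit and integration test directories.
    if (tests / "unit").is_dir() and (tests / "integration").is_dir():
        evidence.append("Separate tests/unit/ and tests/integration/ directories")

    # Signal 2: pytest markers in test files (the first match is enough).
    if tests.is_dir():
        for path in sorted(tests.rglob("*.py")):
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue
            match = PYTEST_MARKER_RE.search(text)
            if match:
                evidence.append(f"pytest marker @pytest.mark.{match.group(1)} in {path.name}")
                break

    # Signal 3: split test targets in a Makefile.
    try:
        makefile_text = (repo / "Makefile").read_text(encoding="utf-8", errors="ignore")
    except OSError:
        makefile_text = ""  # missing or unreadable Makefile: no evidence
    targets = sorted(set(MAKEFILE_TARGET_RE.findall(makefile_text)))
    if targets:
        evidence.append("Makefile test targets: " + ", ".join(targets))

    return evidence
```

An empty list means "no signals found", which the assessor would simply not report rather than penalize.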

Implements Proposals A.2 and A.3 from the accepted ADR. Third of six implementation PRs.

Self-score change: 73.2 -> 74.8 Silver (documentation bonus applies since AGENTS.md mentions pytest).

Related issues

  1. Remove redundant assessors, realign tiers, rebalance weights (ADR C.1-4, E.1-2) #458 - merged in #464 (refactor: remove redundant assessors, realign tiers, rebalance weights)
  2. Context file assessor improvements (ADR A.1, A.9, A.4) #459 - merged in #477 (feat: context file assessor improvements (ADR A.1, A.9, A.4))
  3. This PR: Test assessor enhancements (ADR A.2, A.3) #460
  4. Enforcement and intent assessor improvements (ADR A.5, A.8) #461
  5. Code quality assessor enhancements (ADR A.6, A.7) #462
  6. New assessors for architectural boundaries and threat models (ADR B.1, B.2) #463

Test plan

  • black . && isort . && ruff check . passes
  • pytest tests/unit/ passes (1111 passed, 17 skipped)
  • agentready assess . runs successfully (74.8/100 Silver)
  • 11 new tests for documentation scoring and organization evidence
  • All 46 existing TestExecutionAssessor tests pass unchanged

Closes #460

Posted by Bill Murdock with assistance from Claude Code.

Summary by CodeRabbit

  • New Features

    • Test execution assessment now awards bonus points for documenting test commands in project documentation files.
    • Enhanced detection of test organization patterns (unit vs. integration testing separation) across supported programming languages.
  • Documentation

    • Updated test execution scoring criteria to reflect expanded assessment breakdown and new bonus opportunities.

…zation (ADR A.2, A.3)

Add test command documentation check (10-point bonus) that scans
CLAUDE.md/AGENTS.md/README for test command keywords. Agents need to
find the command, not just have a runner configured. Add test
organization detection as substantiating evidence (unit/integration
separation, pytest markers, Makefile targets, filtered test scripts).

Existing scores unchanged: documentation is a bonus on top of the
existing 40/20/20/20 signals, capped at 100.

Closes #460

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
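The scoring arithmetic in the commit message can be checked with a small sketch. It assumes each of the four existing signals is all-or-nothing, which may not match the real assessor (it may award partial credit); `compute_test_execution_score` and the constants are hypothetical names.

```python
from __future__ import annotations

# Weights from the commit message: four existing signals worth 40/20/20/20,
# plus a 10-point documentation bonus, with the total capped at 100.
BASE_SIGNAL_WEIGHTS = (40, 20, 20, 20)
DOC_BONUS = 10
MAX_SCORE = 100


def compute_test_execution_score(signals_met: tuple[bool, ...], documented: bool) -> int:
    """Sum the met signals (assumed all-or-nothing), add the bonus, cap at 100."""
    base = sum(w for w, met in zip(BASE_SIGNAL_WEIGHTS, signals_met) if met)
    bonus = DOC_BONUS if documented else 0
    return min(MAX_SCORE, base + bonus)


# An 80-point base with a documented command scores 90; a perfect base stays at 100.
print(compute_test_execution_score((True, True, True, False), documented=True))  # 90
print(compute_test_execution_score((True, True, True, True), documented=True))   # 100
```

Because the bonus is only ever added and the result is clamped, a repository without documented commands keeps exactly its previous score.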
coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: dfb89ad6-77d1-4ec8-9d01-ca421361c278

📥 Commits

Reviewing files that changed from the base of the PR and between 51e57e7 and 51030ec.

📒 Files selected for processing (3)
  • docs/attributes.md
  • src/agentready/assessors/testing.py
  • tests/unit/test_assessors_testing.py

📝 Walkthrough


The PR enhances the test assessor to detect and score documented test commands in context files and collect evidence of test organization patterns. It adds a 10-point bonus when test commands are found in CLAUDE.md, AGENTS.md, or README.md, introduces language-specific heuristics for unit/integration test separation, and updates documentation and tests accordingly.

Changes

Test Execution Assessor Enhancements

  • Documentation and shared helper infrastructure (docs/attributes.md, src/agentready/assessors/testing.py): The scoring rubric now documents the 10-point bonus for test commands documented in context files and the non-scoring test-organization evidence. Two new reusable helpers scan context files for test-command keywords and extract language-specific organization signals (directories, markers, build targets, package.json scripts).
  • Python assessor bonus and evidence integration (src/agentready/assessors/testing.py): The Python assessor applies the helpers to detect pytest/pytest tests/... commands in context files, awards bonus points, and collects evidence on unit/integration directory structure, pytest markers, and Makefile test targets.
  • JavaScript/TypeScript assessor bonus and evidence integration (src/agentready/assessors/testing.py): The JavaScript/TypeScript assessor applies the helpers to detect npm test and related commands in context files, awards bonus points, and collects evidence on unit/integration directories and package.json test scripts.
  • Go assessor bonus and evidence integration (src/agentready/assessors/testing.py): The Go assessor applies the helpers to detect go test commands in context files, awards bonus points, and collects evidence on integration build tags and Makefile test targets.
  • Test coverage for new assessor behavior (tests/unit/test_assessors_testing.py): 11 new tests validate documented-command bonus detection (positive and negative cases), single-file references, and organization evidence collection across Python, JavaScript, and Go. Assertions confirm bonus scoring, evidence presence, and the absence of penalties when organization signals are missing.
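For the JavaScript/TypeScript and Go branches, the signals listed above (package.json test scripts and integration build tags) could be detected roughly as follows. This is a sketch: the script names, the build-tag regex (which accepts both `//go:build` and legacy `// +build` lines), and the function names are assumptions.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

# Hypothetical heuristics for the non-Python branches of A.2.
FILTERED_SCRIPT_NAMES = ("test:unit", "test:integration", "test:e2e")
GO_BUILD_TAG_RE = re.compile(r"^//(?:go:build|\s*\+build)\s.*\bintegration\b", re.MULTILINE)


def js_test_script_evidence(repo: Path) -> list[str]:
    """Report filtered test scripts (e.g. test:unit) declared in package.json."""
    package_json = repo / "package.json"
    if not package_json.is_file():
        return []
    try:
        scripts = json.loads(package_json.read_text(encoding="utf-8")).get("scripts", {})
    except (OSError, json.JSONDecodeError, AttributeError):
        return []  # malformed package.json is treated as "no evidence"
    found = [name for name in FILTERED_SCRIPT_NAMES if name in scripts]
    return [f"package.json test scripts: {', '.join(found)}"] if found else []


def go_build_tag_evidence(repo: Path) -> list[str]:
    """Report *_test.go files gated behind an integration build tag."""
    tagged = []
    for path in sorted(repo.rglob("*_test.go")):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        if GO_BUILD_TAG_RE.search(text):
            tagged.append(path.name)
    return [f"Integration build tag in: {', '.join(tagged)}"] if tagged else []
```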

Suggested labels

released

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title follows the Conventional Commits format with type 'feat' and a clear scope, and accurately describes the main changes.
  • Linked Issues check: ✅ Passed. All coding requirements from #460 are met: A.2 detects unit/integration separation with language-specific heuristics and records evidence; A.3 implements the test command documentation bonus (10 points) by scanning CLAUDE.md/AGENTS.md/README.md.
  • Out of Scope Changes check: ✅ Passed. All changes directly support the ADR A.2/A.3 implementation: assessor logic, documentation, and new test cases are aligned with the linked issue's requirements.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.


@github-actions

📈 Test Coverage Report

Branch Coverage
This PR 73.2%
Main 73.1%
Diff ✅ +0.1%

Coverage calculated from unit tests only


coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0


jwm4 commented May 28, 2026

Review: Test Assessor Enhancements (ADR A.2, A.3)

Tested the PR against four real repos (astronomer/dag-factory, ReactiveX/RxPY, go-graphite/go-carbon, and agentready itself) to validate both new features.

A.3 (Test command documentation bonus): Worked correctly across all four repos. Detection fired for both AGENTS.md (dag-factory, RxPY, agentready) and CLAUDE.md (go-carbon). The 10-point bonus is visible in scoring: e.g., RxPY scored 90 (80 base + 10 doc bonus).

A.2 (Test organization evidence): Correctly recorded as non-scoring evidence. Detected separate tests/unit/ and tests/integration/ directories (agentready) and @pytest.mark.integration markers (dag-factory). Repos without organization signals (RxPY, go-carbon) received no penalty, confirming "substantiating evidence, not a hard gate."

Minor observations (non-blocking):

  1. Duplicated Makefile target scanning in _check_test_organization for Python and Go (same targets, same regex, same error handling). Not a problem at current scale, just noting for future language additions.

  2. import json at function scope in the JS/TS branch of _check_test_organization (~line 283). It's also imported at function scope in _assess_javascript_coverage (line 226). Neither is wrong, but they're inconsistent with each other. Minor style nit.

  3. Pattern overlap in Go between Signal 2 and Signal 5. A repo with make test in its README could earn points from both signals for overlapping evidence. The 100-point cap prevents over-scoring, so this is correct behavior, just worth being aware of.
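Observation 1 could be addressed by factoring the Makefile scan into one helper shared by the Python and Go branches. A hypothetical sketch, assuming both branches look for the same split target names; `makefile_test_targets` and `SPLIT_TEST_TARGETS` are illustrative, not existing names in testing.py.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical shared target list; the real assessor may use different names.
SPLIT_TEST_TARGETS = ("test-unit", "test-integration", "unit-test", "integration-test")


def makefile_test_targets(repo: Path, targets: tuple[str, ...] = SPLIT_TEST_TARGETS) -> list[str]:
    """Return the split test targets defined in the repo's Makefile, sorted."""
    try:
        text = (repo / "Makefile").read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return []  # missing or unreadable Makefile: no evidence
    pattern = re.compile(
        r"^(" + "|".join(re.escape(t) for t in targets) + r")\s*:", re.MULTILINE
    )
    return sorted(set(pattern.findall(text)))
```

Each language branch would then only format its own evidence string, so adding another language would not add a third copy of the regex and error handling.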

Posted by Bill Murdock with assistance from Claude Code.

@jwm4 jwm4 merged commit e12e075 into main May 28, 2026
6 checks passed
@github-actions

🎉 This PR is included in version 2.45.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
