feat: test assessor enhancements for command documentation and organization (ADR A.2, A.3) by jwm4 · Pull Request #481 · ambient-code/agentready

jwm4 · 2026-05-28T20:05:37Z

Summary

- Add test command documentation check (10-point bonus) to TestExecutionAssessor across Python, JS/TS, and Go
- Add test organization detection as substantiating evidence (not scoring)
- Existing scores unchanged: documentation bonus is additive, capped at 100

A.3 (test command documentation): Scans CLAUDE.md, AGENTS.md, and README.md for language-specific test command keywords (e.g., pytest, npm test, go test). Awards 10 bonus points when found. The insight from the BP: agents need to find the command, not just have a runner config file present.
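A minimal sketch of the A.3 check, assuming a plain case-insensitive substring match. The names `CONTEXT_FILES`, `TEST_COMMAND_KEYWORDS`, `find_documented_test_command`, and `documentation_bonus` are hypothetical, and the keyword lists contain only the examples named above; the helper in `src/agentready/assessors/testing.py` may differ.

```python
from pathlib import Path

# Hypothetical constants: the files and example keywords come from the PR
# description; the real assessor may use longer lists or regexes.
CONTEXT_FILES = ("CLAUDE.md", "AGENTS.md", "README.md")
TEST_COMMAND_KEYWORDS = {
    "python": ("pytest",),
    "javascript": ("npm test",),
    "go": ("go test",),
}
DOC_BONUS = 10


def find_documented_test_command(repo_root, language):
    """Return the first context file that mentions a test command, else None."""
    keywords = TEST_COMMAND_KEYWORDS.get(language, ())
    for name in CONTEXT_FILES:
        path = Path(repo_root) / name
        try:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
        except OSError:
            continue  # missing or unreadable file: try the next one
        if any(keyword in text for keyword in keywords):
            return name
    return None


def documentation_bonus(repo_root, language):
    """Return the 10-point bonus only when a test command is documented."""
    return DOC_BONUS if find_documented_test_command(repo_root, language) else 0
```

A substring match like this would also fire on incidental mentions such as `pytest-cov` in a dependency list, so a stricter pattern may be preferable in practice.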

A.2 (test organization): Detects unit/integration test separation signals and records them as evidence. Python: separate dirs, pytest markers, Makefile targets. JS/TS: separate dirs, filtered test scripts (test:unit, test:integration). Go: build tags, Makefile targets. No score impact per ADR ("substantiating evidence, not a hard gate").
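As an illustration of the Python signals, the sketch below collects two of them (separate directories and pytest markers) as evidence strings without touching the score. The function name, the `tests/unit` and `tests/integration` layout, and the marker regex are assumptions rather than the PR's actual code; Makefile targets are left out for brevity.

```python
import re
from pathlib import Path

# Assumed marker names; the real heuristic may accept others.
MARKER = re.compile(r"@pytest\.mark\.(?:unit|integration)\b")


def python_organization_evidence(repo_root):
    """Collect evidence strings for unit/integration separation (no scoring)."""
    root = Path(repo_root)
    tests_dir = root / "tests"
    evidence = []

    # Signal: separate unit and integration directories.
    if (tests_dir / "unit").is_dir() and (tests_dir / "integration").is_dir():
        evidence.append("separate tests/unit and tests/integration directories")

    # Signal: pytest markers; report the first file that uses one.
    if tests_dir.is_dir():
        for path in sorted(tests_dir.rglob("*.py")):
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue
            if MARKER.search(text):
                evidence.append(
                    f"pytest marker in {path.relative_to(root).as_posix()}"
                )
                break

    return evidence
```

An empty list simply means no signals were found; per the ADR wording it carries no penalty.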

Implements Proposals A.2 and A.3 from the accepted ADR. Third of six implementation PRs.

Self-score change: 73.2 -> 74.8 Silver (documentation bonus applies since AGENTS.md mentions pytest).

Related issues

- Remove redundant assessors, realign tiers, rebalance weights (ADR C.1-4, E.1-2) #458 - merged in refactor: remove redundant assessors, realign tiers, rebalance weights #464
- Context file assessor improvements (ADR A.1, A.9, A.4) #459 - merged in feat: context file assessor improvements (ADR A.1, A.9, A.4) #477
- This PR - Test assessor enhancements (ADR A.2, A.3) #460
- Enforcement and intent assessor improvements (ADR A.5, A.8) #461
- Code quality assessor enhancements (ADR A.6, A.7) #462
- New assessors for architectural boundaries and threat models (ADR B.1, B.2) #463

Test plan

- `black . && isort . && ruff check .` passes
- `pytest tests/unit/` passes (1111 passed, 17 skipped)
- `agentready assess .` runs successfully (74.8/100 Silver)
- 11 new tests for documentation scoring and organization evidence
- All 46 existing TestExecutionAssessor tests pass unchanged

Closes #460

Posted by Bill Murdock with assistance from Claude Code.

Summary by CodeRabbit

New Features
- Test execution assessment now awards bonus points for documenting test commands in project documentation files.
- Enhanced detection of test organization patterns (unit vs. integration testing separation) across supported programming languages.
Documentation
- Updated test execution scoring criteria to reflect expanded assessment breakdown and new bonus opportunities.

…zation (ADR A.2, A.3)

Add test command documentation check (10-point bonus) that scans CLAUDE.md/AGENTS.md/README for test command keywords. Agents need to find the command, not just have a runner configured.

Add test organization detection as substantiating evidence (unit/integration separation, pytest markers, Makefile targets, filtered test scripts).

Existing scores unchanged: documentation is a bonus on top of the existing 40/20/20/20 signals, capped at 100.

Closes #460

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-05-28T20:05:49Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: dfb89ad6-77d1-4ec8-9d01-ca421361c278

📥 Commits

Reviewing files that changed from the base of the PR and between 51e57e7 and 51030ec.

📒 Files selected for processing (3)

docs/attributes.md
src/agentready/assessors/testing.py
tests/unit/test_assessors_testing.py

📝 Walkthrough

The PR enhances the test assessor to detect and score documented test commands in context files and collect evidence of test organization patterns. It adds a 10-point bonus when test commands are found in CLAUDE.md, AGENTS.md, or README.md, introduces language-specific heuristics for unit/integration test separation, and updates documentation and tests accordingly.

Changes

Test Execution Assessor Enhancements

| Layer / File(s) | Summary |
| --- | --- |
| Documentation and shared helper infrastructure `docs/attributes.md`, `src/agentready/assessors/testing.py` | Scoring rubric now documents the 10-point bonus for context-file-documented test commands and non-scoring test-organization evidence. Two new reusable helpers scan context files for test-command keywords and extract language-specific organization signals (directories, markers, build targets, package.json scripts). |
| Python assessor bonus and evidence integration `src/agentready/assessors/testing.py` | Python assessor applies helpers to detect pytest/`pytest tests/...` commands in context files, awards bonus points, and collects evidence on unit/integration directory structure, pytest markers, and Makefile test targets. |
| JavaScript/TypeScript assessor bonus and evidence integration `src/agentready/assessors/testing.py` | JavaScript/TypeScript assessor applies helpers to detect `npm test` and related commands in context files, awards bonus points, and collects evidence on unit/integration directories and package.json test scripts. |
| Go assessor bonus and evidence integration `src/agentready/assessors/testing.py` | Go assessor applies helpers to detect `go test` commands in context files, awards bonus points, and collects evidence on integration build tags and Makefile test targets. |
| Test coverage for new assessor behavior `tests/unit/test_assessors_testing.py` | 11 new tests validate documented-command bonus detection (positive and negative cases), single-file references, and organization evidence collection across Python, JavaScript, and Go; assertions confirm bonus scoring, evidence presence, and lack of penalties when organization signals are absent. |
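
For the JavaScript/TypeScript row, the package.json signal could be checked roughly as follows. This is a sketch under assumptions: the function name `filtered_test_scripts` is hypothetical, and only the two script names mentioned in the PR description (`test:unit`, `test:integration`) are recognized.

```python
import json
from pathlib import Path

# Script names taken from the PR description; the real list may be longer.
FILTERED_SCRIPTS = ("test:unit", "test:integration")


def filtered_test_scripts(repo_root):
    """Return the filtered test scripts declared in package.json, sorted."""
    package_json = Path(repo_root) / "package.json"
    try:
        data = json.loads(package_json.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON: no evidence.
        return []
    scripts = data.get("scripts") if isinstance(data, dict) else None
    if not isinstance(scripts, dict):
        return []
    return sorted(name for name in scripts if name in FILTERED_SCRIPTS)
```

Catching `ValueError` covers both `json.JSONDecodeError` and decoding errors, so a malformed package.json yields no evidence instead of failing the assessment.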

Suggested labels

released

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format with type 'feat' and clear scope; describes the main changes accurately. |
| Linked Issues check | ✅ Passed | All coding requirements from `#460` are met: A.2 detects unit/integration separation with language-specific heuristics and records evidence; A.3 implements test command documentation bonus (10 points) by scanning CLAUDE.md/AGENTS.md/README.md. |
| Out of Scope Changes check | ✅ Passed | All changes directly support ADR A.2/A.3 implementation: assessor logic, documentation, and new test cases are aligned with linked issue requirements. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

github-actions · 2026-05-28T20:07:26Z

📈 Test Coverage Report

| Branch | Coverage |
| --- | --- |
| This PR | 73.2% |
| Main | 73.1% |
| Diff | ✅ +0.1% |

Coverage calculated from unit tests only

coderabbitai · 2026-05-28T20:09:16Z

Actionable comments posted: 0

jwm4 · 2026-05-28T20:35:38Z

Review: Test Assessor Enhancements (ADR A.2, A.3)

Tested the PR against four real repos (astronomer/dag-factory, ReactiveX/RxPY, go-graphite/go-carbon, and agentready itself) to validate both new features.

A.3 (Test command documentation bonus): Worked correctly across all four repos. Detection fired for both AGENTS.md (dag-factory, RxPY, agentready) and CLAUDE.md (go-carbon). The 10-point bonus is visible in scoring: e.g., RxPY scored 90 (80 base + 10 doc bonus).
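
The RxPY figure is consistent with the additive, capped rule stated in the PR description. A minimal sketch of that arithmetic, with `combined_score` as a hypothetical name and the base score passed in as a single number:

```python
DOC_BONUS = 10   # bonus for a documented test command
MAX_SCORE = 100  # cap stated in the PR description


def combined_score(base_score, command_documented):
    """Add the documentation bonus to the base score, capped at 100."""
    bonus = DOC_BONUS if command_documented else 0
    return min(MAX_SCORE, base_score + bonus)


print(combined_score(80, True))    # 90, matching the RxPY example
print(combined_score(100, True))   # 100, the cap absorbs the bonus
print(combined_score(80, False))   # 80, no bonus and no penalty
```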

A.2 (Test organization evidence): Correctly recorded as non-scoring evidence. Detected separate tests/unit/ and tests/integration/ directories (agentready) and @pytest.mark.integration markers (dag-factory). Repos without organization signals (RxPY, go-carbon) received no penalty, confirming "substantiating evidence, not a hard gate."
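
For Go repos, the build-tag signal named in the PR description could be checked along these lines. This is a sketch, not the PR's implementation: the function name, the `integration` tag name, and the decision to read only the start of each `*_test.go` file are all assumptions.

```python
import re
from pathlib import Path

# Matches both the current `//go:build` form and the legacy `// +build` form.
# Limitation: a negated constraint such as `//go:build !integration` also matches.
BUILD_TAG = re.compile(
    r"^//\s*(?:go:build|\+build)\s+.*\bintegration\b", re.MULTILINE
)


def go_integration_build_tags(repo_root):
    """Return Go test files whose header carries an integration build tag."""
    root = Path(repo_root)
    tagged = []
    for path in sorted(root.rglob("*_test.go")):
        try:
            # Build constraints must precede the package clause, so the
            # first couple of kilobytes are normally enough.
            head = path.read_text(encoding="utf-8", errors="ignore")[:2048]
        except OSError:
            continue
        if BUILD_TAG.search(head):
            tagged.append(path.relative_to(root).as_posix())
    return tagged
```

An empty result, as would be expected for a repo with no such tags, is recorded as the absence of evidence and does not change the score.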

Minor observations (non-blocking):

- Duplicated Makefile target scanning in `_check_test_organization` for Python and Go (same targets, same regex, same error handling). Not a problem at current scale, just noting for future language additions.
- `import json` at function scope in the JS/TS branch of `_check_test_organization` (~line 283). It's also imported at function scope in `_assess_javascript_coverage` (line 226). Neither is wrong, but they're inconsistent with each other. Minor style nit.
- Pattern overlap in Go between Signal 2 and Signal 5. A repo with `make test` in its README could earn points from both signals for overlapping evidence. The 100-point cap prevents over-scoring, so this is correct behavior, just worth being aware of.
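
One way the duplicated Makefile scanning from the first observation could be factored out is a single helper that both language branches call. This is only a sketch: `makefile_test_targets` and the target names in `ORGANIZATION_TARGETS` are hypothetical, since the thread does not list the actual targets or regex.

```python
import re
from pathlib import Path

# Hypothetical target names; substitute whatever the assessor really looks for.
ORGANIZATION_TARGETS = ("test-unit", "test-integration")


def makefile_test_targets(repo_root, targets=ORGANIZATION_TARGETS):
    """Return the organization-related Makefile targets that are defined."""
    makefile = Path(repo_root) / "Makefile"
    try:
        text = makefile.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return []  # no Makefile (or unreadable): no evidence, no error
    names = "|".join(re.escape(target) for target in targets)
    # A target definition starts a line and is followed by a colon.
    pattern = re.compile(rf"^({names})\s*:", re.MULTILINE)
    return sorted(set(pattern.findall(text)))
```

With a helper like this, the Python and Go branches would share the regex and error handling, and a future language would only need to supply its own target names.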

Posted by Bill Murdock with assistance from Claude Code.

github-actions · 2026-05-28T20:39:14Z

🎉 This PR is included in version 2.45.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

coderabbitai Bot approved these changes May 28, 2026

jwm4 merged commit e12e075 into main May 28, 2026
6 checks passed

github-actions Bot added the released label May 28, 2026

jwm4 mentioned this pull request May 29, 2026

feat: enforcement and intent assessor improvements (ADR A.5, A.8) #484

Merged

5 tasks

coderabbitai Bot mentioned this pull request May 31, 2026

fix: scan AGENTS.md/CLAUDE.md for Go test commands (#470) #487

Merged

3 tasks


jwm4 merged 1 commit into main from feat/460-test-assessor-enhancements
