Add adversarial-verification plugin by wizardengineer · Pull Request #148 · trailofbits/skills

wizardengineer · 2026-04-17T07:34:15Z

Summary

Adds a new adversarial-verification plugin that stress-tests claims, designs, and bug findings by dispatching two isolated sub-agents (advocate + skeptic) and synthesizing their arguments into a structured verdict. Counters sycophancy and single-agent agreement bias by forcing maximal disagreement before the caller commits.

What it does

Two modes:

Decision mode — free-form arguments organized by evaluation dimensions (for approach/design choices)
Proof mode — N null hypotheses the skeptic tries to prove and the advocate tries to refute (for verifying bug findings and security claims)
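As a rough illustration of how proof-mode results might be rolled up, here is a minimal Python sketch. The names `Outcome`, `NullHypothesis`, and `finding_verdict`, and the aggregation rule itself (a finding is confirmed only when every null hypothesis is refuted), are assumptions made for this example. They are not taken from references/proof-mode.md or references/synthesis.md.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REFUTED = "refuted"        # the advocate's refutation held up
    PROVEN = "proven"          # the skeptic's proof held up
    UNRESOLVED = "unresolved"  # neither side settled the question


@dataclass(frozen=True)
class NullHypothesis:
    statement: str    # e.g. "the input is sanitized before it reaches the sink"
    outcome: Outcome  # filled in during synthesis


def finding_verdict(hypotheses: list[NullHypothesis]) -> str:
    """Hypothetical roll-up rule: confirm only if every null hypothesis is refuted."""
    if not hypotheses:
        raise ValueError("proof mode needs at least one null hypothesis")
    outcomes = {h.outcome for h in hypotheses}
    if Outcome.PROVEN in outcomes:
        return "rejected"      # at least one null hypothesis stands
    if Outcome.UNRESOLVED in outcomes:
        return "inconclusive"  # nothing proven, but not everything refuted
    return "confirmed"         # every null hypothesis was refuted
```

Under this assumed rule, a single proven null hypothesis is enough to reject a finding, which keeps the burden of proof on the claim rather than on the skeptic.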

Core principle: isolated sub-agent contexts are non-negotiable. An agent that sees the other side's arguments will soften to accommodate them. The adversarial value comes from each agent arguing without knowledge of the counter-argument, with synthesis happening separately.
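The isolation requirement can be sketched in a few lines of Python. This is illustrative only: `Agent`, `build_prompt`, and `run_isolated` are hypothetical names, and the agents are modelled as plain callables rather than real sub-agent dispatches. The point it shows is that each prompt is built from the claim alone, so neither side can see the other's argument before synthesis.

```python
from typing import Callable

# Hypothetical stand-in for a sub-agent: takes a prompt, returns its argument.
Agent = Callable[[str], str]


def build_prompt(role: str, claim: str) -> str:
    """Build a prompt from the role and the claim only, with no shared transcript."""
    return (
        f"You are the {role}. Argue your side as strongly as possible.\n"
        f"Claim: {claim}"
    )


def run_isolated(claim: str, advocate: Agent, skeptic: Agent) -> dict[str, str]:
    """Collect both arguments without leaking one side's output to the other."""
    # Neither call receives the other agent's output as input.
    advocate_argument = advocate(build_prompt("advocate", claim))
    skeptic_argument = skeptic(build_prompt("skeptic", claim))
    # Synthesis happens later, in a separate step, over this combined record.
    return {
        "claim": claim,
        "advocate": advocate_argument,
        "skeptic": skeptic_argument,
    }
```

Only the returned record combines the two arguments, which mirrors the description's point that synthesis happens separately from the arguing.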

Structure

SKILL.md (main entry point with decision tree for mode selection)
references/decision-mode.md (structure for approach selection)
references/proof-mode.md (N-null-hypothesis structure for finding verification)
references/prompt-templates.md (advocate/skeptic templates enforcing anti-balance)
references/synthesis.md (verdict table format + recommendation structure)
references/anti-patterns.md (10 common failure modes with diagnoses)

When to use

Choosing between competing technical approaches
Verifying a bug finding is real (not a false positive)
Reviewing a design decision before commit
Any claim the caller is inclined to agree with by default

Test plan

python3 .github/scripts/validate_codex_skills.py passes (verified locally)
Plugin registered in .claude-plugin/marketplace.json
CODEOWNERS entry added
README table entry added under Verification section
Codex symlink created at .codex/skills/adversarial-verification
Install the plugin locally and invoke the skill on a real claim
Verify both decision-mode and proof-mode paths produce useful verdicts

🤖 Generated with Claude Code

CLAassistant · 2026-04-17T07:34:23Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Stress-tests claims, designs, and bug findings by dispatching two isolated sub-agents (advocate + skeptic) and synthesizing their arguments into a structured verdict. Counters sycophancy and single-agent agreement bias by forcing maximal disagreement before commit.

Two modes:
- Decision mode: free-form arguments organized by evaluation dimensions for approach/design choices
- Proof mode: N null hypotheses the skeptic proves and advocate refutes, for verifying bug findings and security claims

Includes SKILL.md + 5 reference docs:
- anti-patterns.md (10 common failure modes with diagnoses)
- decision-mode.md (structure for approach selection)
- proof-mode.md (N-null-hypothesis structure for finding verification)
- prompt-templates.md (advocate/skeptic templates enforcing anti-balance)
- synthesis.md (verdict table format and recommendation structure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wizardengineer marked this pull request as ready for review April 17, 2026 07:35

wizardengineer requested a review from dguido as a code owner April 17, 2026 07:35

claude bot reviewed Apr 17, 2026

wizardengineer marked this pull request as draft April 17, 2026 17:31

wizardengineer force-pushed the adversarial-verification-plugin branch from 7f05877 to 833dbef April 17, 2026 17:46

wizardengineer wants to merge 1 commit into main from adversarial-verification-plugin
