Add adversarial-verification plugin #148

Draft
wizardengineer wants to merge 1 commit into main from adversarial-verification-plugin

@wizardengineer commented Apr 17, 2026

Summary

Adds a new adversarial-verification plugin that stress-tests claims, designs, and bug findings by dispatching two isolated sub-agents (advocate + skeptic) and synthesizing their arguments into a structured verdict. Counters sycophancy and single-agent agreement bias by forcing maximal disagreement before the caller commits.

What it does

Two modes:

  • Decision mode — free-form arguments organized by evaluation dimensions (for approach/design choices)
  • Proof mode — N null hypotheses the skeptic tries to prove and the advocate tries to refute (for verifying bug findings and security claims)
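
As a rough illustration of proof mode, the sketch below tallies per-hypothesis outcomes into a single verdict. The names (`Outcome`, `NullHypothesis`, `finding_verdict`) and the aggregation rule are assumptions made for this example and are not taken from the plugin. The rule assumed here is that a finding counts as confirmed only when every null hypothesis has been refuted.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Hypothetical per-hypothesis result after both agents have argued."""
    REFUTED = "refuted"        # the advocate's refutation held
    STANDS = "stands"          # the skeptic proved the null hypothesis
    UNRESOLVED = "unresolved"  # neither side made a decisive case


@dataclass(frozen=True)
class NullHypothesis:
    statement: str   # e.g. "the input is validated upstream, so the bug is unreachable"
    outcome: Outcome


def finding_verdict(hypotheses: list[NullHypothesis]) -> str:
    """Aggregate outcomes into one verdict (assumed rule, not the plugin's)."""
    if not hypotheses:
        raise ValueError("proof mode needs at least one null hypothesis")
    # One surviving null hypothesis is enough to reject the finding.
    if any(h.outcome is Outcome.STANDS for h in hypotheses):
        return "rejected"
    # No hypothesis survived, but some could not be settled either way.
    if any(h.outcome is Outcome.UNRESOLVED for h in hypotheses):
        return "inconclusive"
    return "confirmed"
```

Under this assumed rule a single proven null hypothesis sinks the finding, which matches the intent of using proof mode to filter false positives.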

Core principle: isolated sub-agent contexts are non-negotiable. An agent that sees the other side's arguments will soften to accommodate them. The adversarial value comes from each agent arguing without knowledge of the counter-argument, with synthesis happening separately.
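
The isolation requirement can be sketched as follows. This is a minimal illustration and not the plugin's implementation: the `Agent` callable type, `build_prompt`, `run_isolated`, and `synthesize` are hypothetical names, and the prompt wording is invented for the example. The point it shows is that each agent's prompt is built from the claim alone, and only the synthesis step receives both arguments.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent interface: takes a prompt string, returns the agent's argument.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class Debate:
    claim: str
    advocate_argument: str
    skeptic_argument: str


def build_prompt(role: str, claim: str) -> str:
    """Build a one-sided prompt that contains the claim and nothing from the other side."""
    stance = "FOR" if role == "advocate" else "AGAINST"
    return (
        f"You are the {role}. Argue as strongly as possible {stance} the claim. "
        f"Do not present a balanced view.\n\nClaim: {claim}"
    )


def run_isolated(claim: str, advocate: Agent, skeptic: Agent) -> Debate:
    # Neither call receives the other's output, so neither agent can soften toward it.
    advocate_argument = advocate(build_prompt("advocate", claim))
    skeptic_argument = skeptic(build_prompt("skeptic", claim))
    return Debate(claim, advocate_argument, skeptic_argument)


def synthesize(debate: Debate, judge: Agent) -> str:
    # Synthesis happens separately and is the only step that sees both arguments.
    prompt = (
        f"Claim: {debate.claim}\n\n"
        f"Advocate:\n{debate.advocate_argument}\n\n"
        f"Skeptic:\n{debate.skeptic_argument}\n\n"
        "Weigh both arguments and give a verdict."
    )
    return judge(prompt)
```

In practice each `Agent` would wrap a fresh sub-agent context; plain callables are used here only so the data flow is easy to inspect.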

Structure

  • SKILL.md (main entry point with decision tree for mode selection)
  • references/decision-mode.md (structure for approach selection)
  • references/proof-mode.md (N-null-hypothesis structure for finding verification)
  • references/prompt-templates.md (advocate/skeptic templates enforcing anti-balance)
  • references/synthesis.md (verdict table format + recommendation structure)
  • references/anti-patterns.md (10 common failure modes with diagnoses)

When to use

  • Choosing between competing technical approaches
  • Verifying a bug finding is real (not a false positive)
  • Reviewing a design decision before commit
  • Any claim the caller is inclined to agree with by default

Test plan

  • python3 .github/scripts/validate_codex_skills.py passes (verified locally)
  • Plugin registered in .claude-plugin/marketplace.json
  • CODEOWNERS entry added
  • README table entry added under Verification section
  • Codex symlink created at .codex/skills/adversarial-verification
  • Install the plugin locally and invoke the skill on a real claim
  • Verify both decision-mode and proof-mode paths produce useful verdicts

🤖 Generated with Claude Code

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@wizardengineer marked this pull request as ready for review April 17, 2026 07:35
@wizardengineer requested a review from dguido as a code owner April 17, 2026 07:35

@claude bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@wizardengineer marked this pull request as draft April 17, 2026 17:31
Stress-tests claims, designs, and bug findings by dispatching two
isolated sub-agents (advocate + skeptic) and synthesizing their
arguments into a structured verdict. Counters sycophancy and single-
agent agreement bias by forcing maximal disagreement before commit.

Two modes:
- Decision mode: free-form arguments organized by evaluation dimensions
  for approach/design choices
- Proof mode: N null hypotheses the skeptic proves and advocate refutes,
  for verifying bug findings and security claims

Includes SKILL.md + 5 reference docs:
- anti-patterns.md (10 common failure modes with diagnoses)
- decision-mode.md (structure for approach selection)
- proof-mode.md (N-null-hypothesis structure for finding verification)
- prompt-templates.md (advocate/skeptic templates enforcing anti-balance)
- synthesis.md (verdict table format and recommendation structure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wizardengineer force-pushed the adversarial-verification-plugin branch from 7f05877 to 833dbef April 17, 2026 17:46
2 participants