
feat(agents): add Task Challenger adversarial questioning agent#1315

Open
rezatnoMsirhC wants to merge 7 commits into main from feat/1212-adversarial-task-challenger-agent

Conversation

@rezatnoMsirhC
Contributor

@rezatnoMsirhC rezatnoMsirhC commented Apr 7, 2026

feat(agents): add Task Challenger adversarial questioning agent

Description

Added Task Challenger — an adversarial questioning agent that reads .copilot-tracking/ artifacts cold and interrogates every decision, boundary, and assumption through structured What/Why/How questions. The agent does not validate, suggest, coach, or guide; it asks.

The agent operates in four phases. In Phase 1 (Scope), it discovers what to challenge through a five-level cascade: existing .copilot-tracking/ artifacts, pr-reference.xml, git branch history, domain-based workspace search, and finally direct user input. Terminal access is limited to this phase only. Phase 2 (Read Artifacts) silently reads plans, changes, research, and reviews from .copilot-tracking/. Phase 3 (Identify Challenge Areas) silently selects the 5–7 areas with the highest density of unexamined assumptions — this list is never disclosed to the user. Phase 4 (Challenge) issues exactly one question per response using the structure [What/Why/How] + [noun subject] + [verb] + [open object]?, probes each answer up to twice before marking a point unresolved, and handles skip signals ("Go next", "Skip", etc.) without acknowledgment.
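The one-question contract in Phase 4 is mechanical enough to sketch as a check. The following is a hypothetical Python sketch only; the regex and the skip-signal list are illustrative, not the agent's actual wording:

```python
import re

# Hypothetical shape check for a Phase 4 response:
# [What/Why/How] + [noun subject] + [verb] + [open object]?
QUESTION_SHAPE = re.compile(r"^(What|Why|How)\b.+\?$")

# Illustrative skip signals; the agent spec names "Go next" and "Skip".
SKIP_SIGNALS = {"go next", "skip", "next"}

def is_valid_challenge(response: str) -> bool:
    """Valid only if the response is exactly one question in the
    What/Why/How form, with no preamble or suggestion around it."""
    text = response.strip()
    return bool(QUESTION_SHAPE.match(text)) and text.count("?") == 1

def is_skip(user_reply: str) -> bool:
    """Skip signals advance the session without acknowledgment."""
    return user_reply.strip().lower().rstrip(".") in SKIP_SIGNALS
```

A response such as "Great point! Why did you choose this?" would fail the check, since any preamble before the question word breaks the required shape.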

A Challenge Tracking Document is created at Phase 4 entry under .copilot-tracking/challenges/{{YYYY-MM-DD}}/{{topic}}-challenge.md. It captures metadata, confirmed scope, identified challenge areas, a Q&A log with verbatim answers, probe exchanges, and an Unresolved Items table.
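The path convention can be illustrated with a small helper. This is a hypothetical sketch: the function name and slug rule are assumptions, only the directory layout comes from the agent spec.

```python
import re
from datetime import date
from pathlib import Path

def challenge_doc_path(topic: str, root: str = ".copilot-tracking") -> Path:
    """Sketch of the tracking-document convention:
    .copilot-tracking/challenges/{YYYY-MM-DD}/{topic}-challenge.md"""
    # Assumed slug rule: lowercase, non-alphanumerics collapsed to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return Path(root) / "challenges" / date.today().isoformat() / f"{slug}-challenge.md"
```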

Task Researcher was updated to check .copilot-tracking/challenges/ at the start of Phase 1 and, when a challenge document is present, treat its Q&A log and unresolved items as the primary research scope — establishing a formal data contract from challenger outputs to researcher inputs. Task Reviewer received a new 🥊 Challenge handoff that routes to Task Challenger via /task-challenge. The companion task-challenge.prompt.md provides four optional inputs (plan, changes, research, focus) that pre-position scope at invocation; the agent falls back to artifact discovery when none are supplied.

Both hve-core and hve-core-all collections and plugins were updated in lockstep, and .github/copilot-instructions.md was updated to register .copilot-tracking/challenges/ as a known tracking directory.

Related Issue(s)

Related to #1212

Type of Change

Select all that apply:

Code & Documentation:

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality)
  • Breaking change (fix or feature causing existing functionality to change)
  • Documentation update

Infrastructure & Configuration:

  • GitHub Actions workflow
  • Linting configuration (markdown, PowerShell, etc.)
  • Security configuration
  • DevContainer configuration
  • Dependency update

AI Artifacts:

  • Reviewed contribution with prompt-builder agent and addressed all feedback
  • Copilot instructions (.github/instructions/*.instructions.md)
  • Copilot prompt (.github/prompts/*.prompt.md)
  • Copilot agent (.github/agents/*.agent.md)
  • Copilot skill (.github/skills/*/SKILL.md)

Note for AI Artifact Contributors:

  • Agents: Research, indexing/referencing other project (using standard VS Code GitHub Copilot/MCP tools), planning, and general implementation agents likely already exist. Review .github/agents/ before creating new ones.
  • Skills: Must include both bash and PowerShell scripts. See Skills.
  • Model Versions: Only contributions targeting the latest Anthropic and OpenAI models will be accepted. Older model versions (e.g., GPT-3.5, Claude 3) will be rejected.
  • See Agents Not Accepted and Model Version Requirements.

Other:

  • Script/automation (.ps1, .sh, .py)
  • Other (please describe):

Sample Prompts (for AI Artifact Contributions)

User Request:

"Challenge this implementation" with the Task Challenger agent selected, or via /task-challenge from the Task Reviewer's 🥊 Challenge handoff. Optional arguments: /task-challenge plan=.copilot-tracking/plans/my-plan.md focus=authentication.

Execution Flow:

  1. Phase 1 (Scope): Discovers what to challenge through a five-level cascade — .copilot-tracking/ tracking artifacts, pr-reference.xml, git log/diff/status (via git branch --show-current, git log <parent>..HEAD --oneline, git diff --stat), domain-based repo search, or direct user prompt. Presents a factual scope summary and waits for explicit user confirmation before proceeding. Terminal commands are run only during this phase.
  2. Phase 2 (Read Artifacts): Silently reads .copilot-tracking/plans/, changes/, research/, and reviews/.
  3. Phase 3 (Identify Challenge Areas): Silently identifies 5–7 assumption-dense areas. List is never disclosed to the user.
  4. Phase 4 (Challenge): Creates .copilot-tracking/challenges/{{YYYY-MM-DD}}/{{topic}}-challenge.md. Issues one [What/Why/How] + […]? question per response. Probes each answer up to twice; marks unresolved after two probes with no new depth. Advances silently on skip signals. Updates the tracking document throughout the session.
  5. On completion, the Compact handoff summarizes state (including complete Q&A and unresolved items) and defaults the next step to Task Researcher.
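The scope cascade in step 1 can be sketched roughly as follows. This is a hypothetical Python outline, not the agent's implementation; `main` stands in for the `<parent>` branch, and the interactive levels are stubbed out:

```python
import subprocess
from pathlib import Path

def _run(cmd: list[str], cwd: Path) -> str:
    # Terminal access is confined to Phase 1 (Scope) in the agent.
    try:
        return subprocess.run(cmd, cwd=cwd, capture_output=True,
                              text=True).stdout.strip()
    except OSError:
        return ""

def discover_scope(workspace: Path) -> tuple[int, str]:
    """Illustrative walk of the five-level cascade."""
    tracking = workspace / ".copilot-tracking"
    if tracking.is_dir() and any(tracking.iterdir()):      # Level 1
        return 1, "existing .copilot-tracking/ artifacts"
    if (workspace / "pr-reference.xml").is_file():         # Level 2
        return 2, "pr-reference.xml"
    branch = _run(["git", "branch", "--show-current"], workspace)
    if branch and branch != "main":                        # Level 3
        if _run(["git", "log", "main..HEAD", "--oneline"], workspace):
            return 3, f"git history on {branch}"
    # Level 4 (domain-based workspace search) and Level 5 (direct
    # user input) require interaction and are omitted from this sketch.
    return 5, "ask the user directly"
```

Each level is tried only when every earlier one comes up empty, which is why the agent can present a single factual scope summary before asking for confirmation.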

Output Artifacts:

<!-- .copilot-tracking/challenges/2026-04-07/task-challenger-agent-challenge.md -->
<!-- markdownlint-disable-file -->
# Challenge Session: task-challenger-agent

**Date**: 2026-04-07
**Scope source**: Level 1 — .copilot-tracking/ artifacts
**Related artifacts**: .copilot-tracking/plans/..., .copilot-tracking/changes/...

## Confirmed Scope
...

## Q&A Log
### Area: scope boundaries
**Q**: What does the five-level cascade exclude?
**A**: [verbatim answer]
  **Probe**: How is an empty artifacts folder distinguished from a missing one?
  **A**: [verbatim answer]

## Unresolved Items

| # | Area             | Last Question Asked                                        | Reason                        |
|---|------------------|------------------------------------------------------------|-------------------------------|
| 1 | scope boundaries | How is an empty folder distinguished from a missing one?   | No new depth after two probes |

Success Indicators:

  • Agent presents a factual scope summary and waits for explicit confirmation before entering Phase 4.
  • Each Phase 4 response contains exactly one question with no preamble, no affirmation, and no suggestion.
  • The challenge tracking document is created at Phase 4 entry and updated after each Q&A exchange.
  • Switching to Task Researcher after the session shows the challenge document Q&A as the primary research scope.

For detailed contribution requirements, see:

Testing

All automated checks were run during PR generation.

| Check | Command | Result |
|---|---|---|
| Markdown linting | npm run lint:md | ✅ Passed — 0 errors across 196 files |
| Spell checking | npm run spell-check | ✅ Passed — 0 issues across 298 files |
| Frontmatter validation | npm run lint:frontmatter | ✅ Passed — 0 errors, 0 warnings across 490 files |
| Skill structure validation | npm run validate:skills | ✅ Passed — 14 skills, 0 errors |
| Link validation | npm run lint:md-links | ❌ Pre-existing SECURITY.md failure on main — not introduced by this PR |
| PowerShell analysis | npm run lint:ps | ✅ Passed — all files clean |
| Plugin freshness | npm run plugin:generate | ✅ Passed — two table-formatting fixups applied (see Additional Notes) |
| Docusaurus tests | npm run docs:test | ⏭️ Skipped — jest not installed; pre-existing environment limitation |

Security analysis:

  • Terminal access is explicitly contained to Phase 1 only; the agent declares execute/runInTerminal and execute/getTerminalOutput in tools but the agent body restricts their use to the Scope phase.
  • No write or edit tools declared — the agent cannot modify files.
  • Challenge tracking documents write to .copilot-tracking/challenges/, which is gitignored per existing repo convention.
  • No new npm, pip, or external package dependencies.
  • No credentials, tokens, or sensitive data introduced.

Sample Run

Scope Confirmation Phase

(screenshot omitted)

Conversational Q&A Phase

(screenshots omitted)

Checklist

Required Checks

  • Documentation is updated (if applicable)
  • Files follow existing naming conventions
  • Changes are backwards compatible (if applicable)
  • Tests added for new functionality (if applicable) (N/A — no test infrastructure for agent/prompt files)

AI Artifact Contributions

  • Used /prompt-analyze to review contribution
  • Addressed all feedback from prompt-builder review
  • Verified contribution follows common standards and type-specific requirements

Required Automated Checks

The following validation commands must pass before merging:

  • Markdown linting: npm run lint:md
  • Spell checking: npm run spell-check
  • Frontmatter validation: npm run lint:frontmatter
  • Skill structure validation: npm run validate:skills
  • Link validation: npm run lint:md-links (pre-existing SECURITY.md failure on main)
  • PowerShell analysis: npm run lint:ps
  • Plugin freshness: npm run plugin:generate
  • Docusaurus tests: npm run docs:test (jest not installed; pre-existing environment limitation)

Security Considerations

  • This PR does not contain any sensitive or NDA information
  • Any new dependencies have been reviewed for security issues (N/A — no new dependencies)
  • Security-related scripts follow the principle of least privilege (N/A — no security scripts modified)

Additional Notes

  • plugin:generate applied two minor fixups during validation: markdown table column padding in task-challenger.agent.md and a missing task-challenge row in plugins/hve-core-all/README.md. Both are present as unstaged local modifications and should be committed before merging.
  • docs/agents/README.md names five RPI agents (task-researcher, task-planner, task-implementor, task-reviewer, and the RPI orchestrator) but does not yet mention Task Challenger. Consider a follow-up to update that reference.
  • The lint:md-links failure on SECURITY.md is pre-existing on main and unrelated to this PR.

- add task-challenger.agent.md with What/Why/How interrogation protocol
- add task-challenge.prompt.md with optional artifact inputs
- add 🥊 Challenge handoff to task-reviewer.agent.md
- register agent and prompt in hve-core and hve-core-all collections
- regenerate plugin outputs

🥊 - Generated by Copilot
…t/1212-adversarial-task-challenger-agent
… task-challenger

- add Phase 1: Scope with artifact discovery, git fallback, and user confirmation
- scope Prohibited Behaviors and Response Format to Challenge Phase only
- add execute/runInTerminal and execute/getTerminalOutput to tools frontmatter
- renumber Read → Phase 2, Identify → Phase 3, Challenge → Phase 4
- add 'Go next' skip signal handling to Phase 4 Protocol

✨ - Generated by Copilot
- add five-level ordered scope fallback with verified git commands
- auto-create challenge tracking document at Phase 4 entry
- add Challenge Tracking Document Schema section to Phase 4
- weight Compact handoff toward Task Researcher as default
- add challenges/ to .copilot-tracking listing in copilot-instructions.md

⚡ - Generated by Copilot
…nger handoffs

- add challenges/ artifact check to Task Researcher Phase 1 Step 1
- update Task Challenger handoff prompts to reference challenge document path

🔗 - Generated by Copilot
…t/1212-adversarial-task-challenger-agent
@codecov-commenter

codecov-commenter commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.62%. Comparing base (a1928f3) to head (2e5b2bb).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1315      +/-   ##
==========================================
- Coverage   87.63%   87.62%   -0.02%     
==========================================
  Files          61       61              
  Lines        9328     9328              
==========================================
- Hits         8175     8174       -1     
- Misses       1153     1154       +1     
| Flag | Coverage Δ |
|---|---|
| pester | 85.18% <ø> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.
see 1 file with indirect coverage changes


@rezatnoMsirhC rezatnoMsirhC marked this pull request as ready for review April 7, 2026 22:33
@rezatnoMsirhC rezatnoMsirhC requested a review from a team as a code owner April 7, 2026 22:33

@github-actions github-actions bot left a comment


PR Review: feat(agents): add Task Challenger adversarial questioning agent

This is a well-conceived and thoroughly described feature. The adversarial questioning model, four-phase protocol, and data contract from challenger outputs to researcher inputs are all clearly specified. The automated checks pass. Two issues need to be resolved before merging.


Issue Alignment

The PR description uses Related to #1212 rather than Fixes #1212 or Closes #1212. The PR template instructs contributors to use Fixes # or Closes # syntax. If this PR fully delivers the feature described in the linked issue, consider updating to Closes #1212 so the issue is automatically closed on merge. If the issue is intentionally left open (e.g., it tracks ongoing work), a brief note explaining why would clarify intent.


PR Template Compliance

⚠️ AI Artifact Contributions checklist — all three items unchecked

The PR adds AI artifact files (a new agent and a new prompt), which requires completing the AI Artifact Contributions checklist:

* [ ] Used `/prompt-analyze` to review contribution
* [ ] Addressed all feedback from `prompt-builder` review
* [ ] Verified contribution follows common standards and type-specific requirements

All three remain unchecked. These checkboxes represent a required quality gate for AI artifact contributions. Please complete the /prompt-analyze review, address any feedback, and check these items before requesting re-review.

ℹ️ Documentation checkbox

The Documentation is updated (if applicable) checkbox is unchecked. The Additional Notes section acknowledges that docs/agents/README.md currently omits the Task Challenger. While deferring the docs update to a follow-up is a reasonable call, the checkbox should carry an inline (N/A — deferred to follow-up issue) annotation to make that intent explicit, per the checklist conventions used elsewhere in the template.


Coding Standards

disable-model-invocation: true missing from task-challenger.agent.md frontmatter (inline comment on line 4)

The agent declares execute/runInTerminal and execute/getTerminalOutput in tools:, writes files to .copilot-tracking/challenges/, and runs git commands during Phase 1. This makes it a side-effecting agent. task-reviewer.agent.md — a direct peer — sets disable-model-invocation: true for the same reason. The prompt-builder instructions require this field for side-effecting and explicitly-invoked agents. Add disable-model-invocation: true to the frontmatter.


Code Quality

💡 Duplicate "Response Format" section (inline comment on line ~214)

The file defines the Phase 4 response format in two places: a nested #### Response Format inside ### Phase 4: Challenge, and a top-level ## Response Format at the end of the file. Both sections state the same one-question rule with the same examples. Duplicate instructions create a maintenance hazard — a future edit to one copy may silently diverge from the other. Consolidate into the nested section, which is already the more contextually natural location.


Action Items

  1. Required — Check all three AI Artifact Contributions checklist items after completing the /prompt-analyze review.
  2. Required — Add disable-model-invocation: true to the task-challenger.agent.md frontmatter.
  3. Suggested — Remove the duplicate top-level ## Response Format section; retain only the nested #### Response Format under Phase 4.
  4. Minor — Annotate the Documentation checkbox with (N/A — deferred to follow-up) to make the deferral explicit.

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it doesn't meet the required GitHub integrity level.

  • #1212 issue_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by PR Review for issue #1315

---
name: Task Challenger
description: 'Adversarial questioning agent that interrogates implementations with What/Why/How questions — no suggestions, no hints, no leading - Brought to you by microsoft/hve-core'
tools: [read, search, execute/runInTerminal, execute/getTerminalOutput]

Missing disable-model-invocation: true in frontmatter

This agent declares execute/runInTerminal and execute/getTerminalOutput in its tools: list, making it a side-effecting agent. Per the prompt-builder instructions:

Use disable-model-invocation: true for agents that run subagents, agents that cause side effects (git operations, backlog management, deployments), or agents that should only run when explicitly requested.

task-reviewer.agent.md — a peer agent in the same collection — already sets disable-model-invocation: true and has a comparable profile (side effects, subagent orchestration). Task Challenger writes to .copilot-tracking/challenges/ and runs git commands; it should follow the same convention.

Suggested fix:

---
name: Task Challenger
description: 'Adversarial questioning agent...'
disable-model-invocation: true
tools: [read, search, execute/runInTerminal, execute/getTerminalOutput]


### {{Area Label}}

**Question**: {{question text}}

Duplicate "Response Format" section

The agent defines the Challenge Phase response format in two places:

  1. #### Response Format nested under ### Phase 4: Challenge (earlier in the file)
  2. This top-level ## Response Format section — which adds a clarifying note (> This section applies during the Challenge Phase (Phase 4) only.) but otherwise restates the same requirement verbatim

Having two sections describing identical behavior creates maintenance risk: a future edit to one may miss the other, causing the agent to receive contradictory instructions. The nested #### Response Format already lives where it is most contextually relevant — inside Phase 4.

Suggested resolution: Remove this top-level ## Response Format section and, if the "applies to Phase 4 only" clarification is important, add that note to the nested #### Response Format heading inside Phase 4 instead.
