test(e2e): add agent v2 core coverage #38209

Merged
lyzno1 merged 192 commits into main from codex/agent-v2-core-e2e on Jul 2, 2026
Conversation

@lyzno1 lyzno1 commented Jun 30, 2026
Member

Summary

Adds core Agent v2 E2E coverage and the supporting Agent Builder fixture infrastructure for follow-up scenarios.

The scenarios use API setup for prerequisite Agent state and Playwright assertions for user-visible behavior. The support helpers now cover Agent v2 build draft setup, publish-state reads, Backend service API access, E2E resource naming, checked-in and generated file fixtures, typed cleanup queues, Console API preflight checks for preseeded resources, runnable-model Agent setup, preseeded-environment readiness scenarios, and Configure autosave waiting.

Coverage

  • Configure validation: Preview is unavailable until a required model is configured.
  • Configure persistence: persisted instructions remain visible after refresh.
  • Build draft: discarding a pending build draft restores the original configuration.
  • Publish: publishing a configured draft updates the visible and backend published state.
  • Access Point: Web app, Backend service API, and Workflow access surfaces are visible.
  • Backend service API: for an API-enabled Agent, the scenario opens the Agent from the Agent Roster, navigates to Access Point, copies the Service API Endpoint, creates an API Secret key, verifies that the default key list does not expose full secrets, and opens API Reference in a new tab.

Fixture Infrastructure

  • Centralized generated test resource names under the E2E prefix.
  • Added Agent Builder file fixtures for upload, env import, file tree, and file-count boundaries.
  • Added runtime generation helpers and manifest entries for large/slow upload materials without checking large files into git.
  • Added preseeded Agent Builder resource constants and blocked-precondition helpers.
  • Added Console API preflight steps for stable and broken chat models, preseeded Agents, preseeded workflow apps, workflow references, preseeded datasets, indexed-ready datasets, indexing or queued datasets, published Web app access, built-in tools, Agent drive skills, Agent file-tree drive fixture files, Full Config Agent core fixture configuration, Tool States Agent fixture configuration, Dual Retrieval Agent fixture configuration, and Agent Backend service API key availability.
  • Added @agent-v2-preflight scenarios so seeded environments can verify readiness per resource and get a clear blocked-precondition report for missing resources.
  • Added a runnable Agent fixture that applies the verified stable model through the Agent Soul model config with deterministic E2E model settings.
  • Added typed cleanup queues for datasets, Agent drive files, and built-in tool credentials, plus a generic cleanup registry for other resource types.
  • Ordered scenario cleanup so Agent drive files are removed first, then created Agents and Apps, then dependent datasets and tool credentials.
  • Restored pnpm -C e2e check to run vp check --fix; run pnpm -C e2e type-check separately for tsgo.
  • Added a Configure autosave wait helper based on the visible saved state.
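
The cleanup ordering described in the list above (Agent drive files first, then created Agents and Apps, then dependent datasets and tool credentials) could be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `CleanupKind`, `CleanupTask`, `CLEANUP_PHASE`, and `CleanupQueue` are hypothetical names, and the real helpers may be structured differently.

```typescript
// Hypothetical resource kinds; the names are illustrative, not taken from the PR.
type CleanupKind = "agentDriveFile" | "agent" | "app" | "dataset" | "toolCredential";

type CleanupTask = {
  kind: CleanupKind;
  id: string;
  run: () => Promise<void>;
};

// Lower phase runs first: drive files, then Agents and Apps, then their dependencies.
const CLEANUP_PHASE: Record<CleanupKind, number> = {
  agentDriveFile: 0,
  agent: 1,
  app: 1,
  dataset: 2,
  toolCredential: 2,
};

class CleanupQueue {
  private tasks: CleanupTask[] = [];

  register(task: CleanupTask): void {
    this.tasks.push(task);
  }

  // Returns the tasks in execution order without running them.
  // The sort is stable, so tasks in the same phase keep registration order.
  plan(): CleanupTask[] {
    return [...this.tasks].sort(
      (a, b) => CLEANUP_PHASE[a.kind] - CLEANUP_PHASE[b.kind],
    );
  }

  // Runs every task in phase order and collects failures instead of stopping
  // early, so one failed delete does not leak the remaining resources.
  async drain(): Promise<string[]> {
    const ordered = this.plan();
    this.tasks = [];
    const failures: string[] = [];
    for (const task of ordered) {
      try {
        await task.run();
      } catch (error) {
        failures.push(`${task.kind}:${task.id}: ${String(error)}`);
      }
    }
    return failures;
  }
}
```

In this sketch a scenario would call `register` whenever it creates a resource and `drain` from an after-scenario hook. Deleting dependents before the things they depend on is the design point; the exact phases are an assumption drawn from the ordering stated above.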

Model Preflight

Scenarios that need a runnable model should use the stable model preflight step as the single model-selection contract. It reads E2E_STABLE_MODEL_PROVIDER, E2E_STABLE_MODEL_NAME, and optional E2E_STABLE_MODEL_TYPE (default llm), then verifies the model is present and active through the Console models API before storing it on the Cucumber world. Scenario setup should explicitly apply that stored model when it needs a usable model, rather than hard-coding provider or model names in feature files or hooks.
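
As a rough sketch of that contract, the step might resolve the selection from the environment and then check it against the model list. Only the three environment variable names and the `llm` default come from this description; `readStableModel`, `verifyStableModel`, and the `ConsoleModel` shape are hypothetical, and the real Console models API response may look different.

```typescript
type Env = Record<string, string | undefined>;

type ModelSelection = { provider: string; name: string; type: string };

// Assumed shape of one entry returned by the Console models API.
type ConsoleModel = ModelSelection & { active: boolean };

// Reads the stable model selection; the model type falls back to "llm".
function readStableModel(env: Env): ModelSelection {
  const provider = env.E2E_STABLE_MODEL_PROVIDER;
  const name = env.E2E_STABLE_MODEL_NAME;
  if (!provider || !name) {
    throw new Error(
      "Blocked precondition: set E2E_STABLE_MODEL_PROVIDER and E2E_STABLE_MODEL_NAME",
    );
  }
  return { provider, name, type: env.E2E_STABLE_MODEL_TYPE || "llm" };
}

// Confirms the selected model is present and active before it is stored
// on the Cucumber world for later scenario setup.
function verifyStableModel(
  selection: ModelSelection,
  models: ConsoleModel[],
): ModelSelection {
  const match = models.find(
    (m) =>
      m.provider === selection.provider &&
      m.name === selection.name &&
      m.type === selection.type,
  );
  if (!match) {
    throw new Error(
      `Blocked precondition: model ${selection.provider}/${selection.name} not found`,
    );
  }
  if (!match.active) {
    throw new Error(
      `Blocked precondition: model ${selection.provider}/${selection.name} is not active`,
    );
  }
  return selection;
}
```

Passing the environment in as a plain record, rather than reading a global, keeps the resolution logic testable without touching the real process environment.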

Model-recovery scenarios can use the broken model preflight step. It reads E2E_BROKEN_MODEL_PROVIDER, optional E2E_BROKEN_MODEL_NAME (default e2e-broken-model), and optional E2E_BROKEN_MODEL_TYPE (default llm). This only verifies the model entry exists; the scenario must still assert the visible failure and recovery behavior.
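
The defaults and the existence-only check for the broken model can be illustrated with a short sketch. The environment variable names and default values are taken from the paragraph above; `readBrokenModel`, `brokenModelExists`, and the entry shape are hypothetical.

```typescript
type BrokenModel = { provider: string; name: string; type: string };

// Resolves the broken model selection, applying the documented defaults.
function readBrokenModel(env: Record<string, string | undefined>): BrokenModel {
  const provider = env.E2E_BROKEN_MODEL_PROVIDER;
  if (!provider) {
    throw new Error("Blocked precondition: set E2E_BROKEN_MODEL_PROVIDER");
  }
  return {
    provider,
    name: env.E2E_BROKEN_MODEL_NAME || "e2e-broken-model",
    type: env.E2E_BROKEN_MODEL_TYPE || "llm",
  };
}

// Existence only: this deliberately does not inspect whether the model works.
// The failure and recovery behavior is left to scenario-level assertions.
function brokenModelExists(selection: BrokenModel, entries: BrokenModel[]): boolean {
  return entries.some(
    (e) =>
      e.provider === selection.provider &&
      e.name === selection.name &&
      e.type === selection.type,
  );
}
```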

Resource Preflight Boundaries

The preflight steps verify resources through public Console APIs where a stable contract exists: model provider/model, Agent roster item, workflow app, Agent-to-workflow published reference, dataset existence, indexed-ready dataset documents, indexing or queued dataset documents, built-in tool, Agent drive Skill, Agent file-tree drive fixture files, Agent Soul core fixture configuration, Tool States Agent fixture configuration with JSON Replace, Tavily Search, and Tavily credential reference, Dual Retrieval Agent fixture configuration with Agent-decide and custom-query knowledge sets bound to the indexed test dataset, Agent published Web app state with site URL data, and Agent Backend service API enabled with at least one key.
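
One way to picture how per-resource readiness checks could roll up into a blocked-precondition report is the sketch below. `PreflightCheck` and `blockedReport` are hypothetical names, and the report wording is an assumption rather than the format the PR emits.

```typescript
// Result of verifying a single preseeded resource through a Console API.
type PreflightCheck = { resource: string; ready: boolean; reason?: string };

// Returns null when every resource is ready; otherwise a readable report
// that lists only the resources blocking the scenario.
function blockedReport(checks: PreflightCheck[]): string | null {
  const blocked = checks.filter((c) => !c.ready);
  if (blocked.length === 0) return null;
  const lines = blocked.map((c) => `- ${c.resource}: ${c.reason ?? "not found"}`);
  return ["Blocked preconditions:", ...lines].join("\n");
}
```

Reporting per resource, instead of failing on the first missing one, lets a seeded environment see every gap in a single run.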

Fixed-content knowledge assertions, file-tree visual expansion and preview behavior, Web app runtime behavior, workflow runtime behavior, and tool credential validity still need scenario-level user-visible assertions before they should be treated as fully automated outcomes. The Tool States preflight verifies that the seeded Agent can exercise the tool-status UI, but it does not prove the Tavily credential is invalid or that runtime error recovery is visible. The Dual Retrieval preflight verifies that the seeded Agent can exercise the Knowledge Retrieval display, but it does not prove retrieval hit results or expand/collapse behavior. The Console API key response does not expose a human-readable key name, so the Backend service API key preflight verifies availability rather than a specific display name. Agent node output variables are not validated by the roster Agent App core preflight because those live in workflow node-job declared_outputs, not the roster Agent App composer response.

Validation

  • pnpm -C e2e check
  • pnpm -C e2e type-check
  • pnpm -C e2e exec cucumber-js --config cucumber.config.ts --dry-run
  • pnpm -C e2e exec cucumber-js --config cucumber.config.ts --dry-run --tags @agent-v2-preflight
  • pnpm -C e2e e2e:full -- --tags @agent-v2-preflight
  • pnpm -C e2e e2e:full -- --tags '@agent-v2 and not @infra'
  • git diff --check

@lyzno1 lyzno1 force-pushed the codex/agent-v2-core-e2e branch from db2b0bb to 704f007 Compare June 30, 2026 10:02
lyzno1 added 27 commits July 1, 2026 11:48
@lyzno1 lyzno1 marked this pull request as ready for review July 2, 2026 05:52
@dosubot dosubot Bot added the size:M label Jul 2, 2026
@lyzno1 lyzno1 enabled auto-merge July 2, 2026 05:52
@lyzno1 lyzno1 added this pull request to the merge queue Jul 2, 2026
@dosubot dosubot Bot added the lgtm label Jul 2, 2026
Merged via the queue into main with commit 1ae2f95 Jul 2, 2026
46 checks passed
@lyzno1 lyzno1 deleted the codex/agent-v2-core-e2e branch July 2, 2026 06:33

Labels

  • e2e: End-to-end tests and E2E test infrastructure.
  • lgtm: This PR has been approved by a maintainer.
  • size:M: This PR changes 30-99 lines, ignoring generated files.
  • web: This relates to changes on the web.


2 participants