test(e2e): add agent v2 core coverage #38209
Merged
iamjoel
approved these changes
Jul 2, 2026
Summary
Adds core Agent v2 E2E coverage and the supporting Agent Builder fixture infrastructure for follow-up scenarios.
The scenarios use API setup for prerequisite Agent state and Playwright assertions for user-visible behavior. The support helpers now cover Agent v2 build draft setup, publish-state reads, Backend service API access, E2E resource naming, checked-in and generated file fixtures, typed cleanup queues, Console API preflight checks for preseeded resources, runnable-model Agent setup, preseeded-environment readiness scenarios, and Configure autosave waiting.
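As a rough illustration of the E2E resource naming and typed cleanup queue ideas mentioned above, the sketch below shows one possible shape. Every name in it (`ResourceKind`, `e2eResourceName`, `CleanupQueue`) and the exact name format are assumptions made for illustration; they are not the helpers added in this PR.

```typescript
// Hypothetical sketch: these names are illustrative, not the PR's actual helpers.
type ResourceKind = "agent" | "dataset" | "workflow";

interface CleanupTask {
  kind: ResourceKind;
  id: string;
  run: () => Promise<void>;
}

interface CleanupFailure {
  kind: ResourceKind;
  id: string;
  error: unknown;
}

// Assumed naming scheme: a fixed "E2E" prefix plus a label and a run identifier,
// so leftover resources can be traced back to a test run.
function e2eResourceName(label: string, runId: string): string {
  return `E2E ${label} ${runId}`;
}

class CleanupQueue {
  private tasks: CleanupTask[] = [];

  get size(): number {
    return this.tasks.length;
  }

  register(task: CleanupTask): void {
    this.tasks.push(task);
  }

  // Run tasks in reverse registration order so that resources created later
  // (which may depend on earlier ones) are removed first. Failures are
  // collected rather than thrown so one failed delete does not skip the rest.
  async drain(): Promise<CleanupFailure[]> {
    const failures: CleanupFailure[] = [];
    while (this.tasks.length > 0) {
      const task = this.tasks.pop() as CleanupTask;
      try {
        await task.run();
      } catch (error) {
        failures.push({ kind: task.kind, id: task.id, error });
      }
    }
    return failures;
  }
}
```

In a Cucumber setup, a queue like this would typically live on the world object and be drained from an `After` hook; typing the `kind` field keeps cleanup callers from registering resource types the suite does not know how to delete.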
Coverage
Fixture Infrastructure
E2E prefix.
@agent-v2-preflight scenarios so seeded environments can verify readiness per resource and get a clear blocked-precondition report for missing resources.
pnpm -C e2e check to vp check --fix; run pnpm -C e2e type-check separately for tsgo.
Model Preflight
Scenarios that need a runnable model should use the stable model preflight step as the single model-selection contract. It reads E2E_STABLE_MODEL_PROVIDER, E2E_STABLE_MODEL_NAME, and optional E2E_STABLE_MODEL_TYPE (default llm), then verifies the model is present and active through the Console models API before storing it on the Cucumber world. Scenario setup should explicitly apply that stored model when it needs a usable model, rather than hard-coding provider or model names in feature files or hooks.
Model-recovery scenarios can use the broken model preflight step. It reads E2E_BROKEN_MODEL_PROVIDER, optional E2E_BROKEN_MODEL_NAME (default e2e-broken-model), and optional E2E_BROKEN_MODEL_TYPE (default llm). This only verifies the model entry exists; the scenario must still assert the visible failure and recovery behavior.
Resource Preflight Boundaries
The preflight steps verify resources through public Console APIs where a stable contract exists: model provider/model, Agent roster item, workflow app, Agent-to-workflow published reference, dataset existence, indexed-ready dataset documents, indexing or queued dataset documents, built-in tool, Agent drive Skill, Agent file-tree drive fixture files, Agent Soul core fixture configuration, Tool States Agent fixture configuration with JSON Replace, Tavily Search, and Tavily credential reference, Dual Retrieval Agent fixture configuration with Agent-decide and custom-query knowledge sets bound to the indexed test dataset, Agent published Web app state with site URL data, and Agent Backend service API enabled with at least one key.
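For the model provider/model item in that list, the stable model contract described under Model Preflight could be sketched roughly as follows. Only the three environment variable names, the llm default, and the active status come from the description; the `ModelEntry` shape, the function names, and the wording of the blocked-precondition messages are assumptions, and the real Console models API response may look different.

```typescript
// Hypothetical shape: the actual Console models API response is not shown here.
interface ModelEntry {
  provider: string;
  model: string;
  modelType: string;
  status: string;
}

interface ModelSelection {
  provider: string;
  name: string;
  type: string;
}

type Env = Record<string, string | undefined>;

type PreflightResult =
  | { ok: true; model: ModelSelection }
  | { ok: false; blocked: string };

// Read the stable model selection from the environment, defaulting the type to "llm".
function resolveStableModel(env: Env): PreflightResult {
  const provider = env["E2E_STABLE_MODEL_PROVIDER"];
  const name = env["E2E_STABLE_MODEL_NAME"];
  if (!provider || !name) {
    return {
      ok: false,
      blocked: "E2E_STABLE_MODEL_PROVIDER and E2E_STABLE_MODEL_NAME must be set",
    };
  }
  const type = env["E2E_STABLE_MODEL_TYPE"] || "llm";
  return { ok: true, model: { provider, name, type } };
}

// Check the resolved model against a list of models (assumed to be fetched
// from the Console models API by the caller) and require an "active" status.
function verifyStableModel(env: Env, available: ModelEntry[]): PreflightResult {
  const resolved = resolveStableModel(env);
  if (resolved.ok === false) {
    return resolved;
  }
  const { provider, name, type } = resolved.model;
  const match = available.find(
    (m) => m.provider === provider && m.model === name && m.modelType === type,
  );
  if (!match) {
    return { ok: false, blocked: `model ${provider}/${name} (${type}) not found` };
  }
  if (match.status !== "active") {
    return {
      ok: false,
      blocked: `model ${provider}/${name} has status ${match.status}, expected active`,
    };
  }
  return resolved;
}
```

Returning a result object instead of throwing lets a preflight step turn a missing or inactive model into a blocked-precondition report, while a successful result can be stored on the Cucumber world for later scenario setup.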
Fixed-content knowledge assertions, file-tree visual expansion and preview behavior, Web app runtime behavior, workflow runtime behavior, and tool credential validity still need scenario-level user-visible assertions before they should be treated as fully automated outcomes. The Tool States preflight verifies that the seeded Agent can exercise the tool-status UI, but it does not prove the Tavily credential is invalid or that runtime error recovery is visible. The Dual Retrieval preflight verifies that the seeded Agent can exercise the Knowledge Retrieval display, but it does not prove retrieval hit results or expand/collapse behavior. The Console API key response does not expose a human-readable key name, so the Backend service API key preflight verifies availability rather than a specific display name. Agent node output variables are not validated by the roster Agent App core preflight because those live in workflow node-job declared_outputs, not the roster Agent App composer response.
Validation
pnpm -C e2e check
pnpm -C e2e type-check
pnpm -C e2e exec cucumber-js --config cucumber.config.ts --dry-run
pnpm -C e2e exec cucumber-js --config cucumber.config.ts --dry-run --tags @agent-v2-preflight
pnpm -C e2e e2e:full -- --tags @agent-v2-preflight
pnpm -C e2e e2e:full -- --tags '@agent-v2 and not @infra'
git diff --check