fix: separate stderr from stdout in benchmark to prevent invalid JSON by Mossaka · Pull Request #1764 · github/gh-aw-firewall

Mossaka · 2026-04-07T22:10:38Z

Summary

Redirect benchmark stderr to benchmark-progress.log instead of mixing it into benchmark-results.json via 2>&1, which was corrupting the JSON output
Add a JSON validation step (jq empty) after the benchmark run to fail fast if the output is not valid JSON
Upload benchmark-progress.log alongside results as a CI artifact for debugging
Document the stdout/stderr contract in benchmark-performance.ts (stdout = JSON only, stderr = progress)

Test plan

YAML validates (npx js-yaml)
npm run build succeeds
npm test passes (1311/1311)
CI passes on this PR

🤖 Generated with Claude Code

The benchmark run step used `2>&1` which mixed stderr progress logs into benchmark-results.json, making downstream jq parsing silently fail. Now stderr goes to benchmark-progress.log, and a JSON validation step catches any corruption early. Closes #1757 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-07T22:11:37Z

✅ Coverage Check Passed

Overall Coverage

Metric	Base	PR	Delta
Lines	86.14%	86.23%	📈 +0.09%
Statements	86.02%	86.11%	📈 +0.09%
Functions	87.45%	87.45%	➡️ +0.00%
Branches	78.81%	78.86%	📈 +0.05%

📁 Per-file Coverage Changes (1 files)

File	Lines (Before → After)	Statements (Before → After)
`src/docker-manager.ts`	86.3% → 86.7% (+0.37%)	85.9% → 86.2% (+0.36%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot

Pull request overview

This PR fixes the performance benchmark CI workflow so that JSON results written to stdout are not corrupted by progress/diagnostic output written to stderr, addressing issue #1757.

Changes:

Redirect benchmark stderr to benchmark-progress.log while keeping stdout as benchmark-results.json.
Add a jq empty validation step to fail fast when the results file is not valid JSON.
Upload both the JSON results and progress log as CI artifacts; document the stdout/stderr contract in the benchmark script.

Show a summary per file

File	Description
`scripts/ci/benchmark-performance.ts`	Documents the required stdout (JSON only) vs stderr (progress/diagnostics) contract relied on by CI.
`.github/workflows/performance-monitor.yml`	Separates stdout/stderr capture for the benchmark, validates JSON output, and uploads the progress log artifact.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 1

Copilot · 2026-04-07T22:13:08Z

.github/workflows/performance-monitor.yml

+      - name: Validate benchmark JSON
+        run: |
+          if ! jq empty benchmark-results.json 2>/dev/null; then
+            echo "ERROR: benchmark-results.json is not valid JSON"
+            cat benchmark-results.json
+            exit 1
+          fi
+
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results-${{ github.sha }}
-          path: benchmark-results.json
+          path: |
+            benchmark-results.json
+            benchmark-progress.log
          retention-days: 90


If JSON validation fails, the job exits before the artifact upload step, so benchmark-progress.log/benchmark-results.json won’t be available as CI artifacts when you most need them for debugging. Consider making the upload step run with if: always() (or move the upload before validation) so artifacts are preserved even when the validation step fails.

Done. Moved the upload step before validation and added if: always() so artifacts are preserved even when validation fails.

Mossaka

Security Review - PR #1764

Reviewed the full diff and the complete performance-monitor.yml workflow plus benchmark-performance.ts.

Summary

This PR separates stderr from stdout (2>benchmark-progress.log instead of 2>&1) to prevent benchmark progress messages from corrupting the JSON output. The change is narrow and well-scoped.

Findings

No security issues found in the changes introduced by this PR.

The stderr-to-file redirect and JSON validation step are safe. The new benchmark-progress.log artifact contains only console.error output from the benchmark script (iteration counters, progress messages) -- no secrets or sensitive data.

Pre-existing observations (not introduced by this PR)

These are informational notes about the existing workflow, not blockers for this PR:

[Info] Actions pinned to major version, not SHA: actions/checkout@v4, actions/setup-node@v4, actions/upload-artifact@v4, and actions/github-script@v7 are pinned to major versions rather than commit SHAs. This is standard practice for first-party GitHub actions and acceptable here, though SHA pinning would be stronger for supply-chain hardening.
[Info] github.workspace in heredoc: Line 43 uses ${{ github.workspace }} inside a heredoc written to /usr/local/bin/awf. This is safe because github.workspace is a runner-controlled path (e.g., /home/runner/work/repo/repo) that cannot be influenced by external actors. The heredoc delimiter is quoted (<<'WRAPPER'), preventing shell expansion of the content -- only the GitHub expression is interpolated by Actions before the shell runs.
[Info] || true on benchmark step: The || true means benchmark failures won't fail the step. This is intentional (regressions are handled by the separate "Check for regressions" step and issue creation), but worth noting that a completely broken benchmark silently produces invalid JSON. The new JSON validation step in this PR actually mitigates this by explicitly failing if the output is not valid JSON -- a nice improvement.
[Info] Benchmark data in issue body: The actions/github-script step at line 93 reads benchmark-results.json and interpolates it into an issue body. Since the JSON is generated by the benchmark script (not from external input) and the workflow only triggers on schedule and workflow_dispatch (not on PRs from forks), there is no injection vector here.
[Info] Workflow permissions are properly scoped: contents: read and issues: write -- minimal permissions for what the workflow does. Good.

Verdict

Clean. The changes are safe. The stderr separation and JSON validation are both improvements.

Security Review Agent

Address Copilot review: ensure artifacts are uploaded even when validation fails, so logs are available for debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-07T22:41:48Z

Smoke Test Results

✅ GitHub MCP: "test: add CLI proxy sidecar integration tests" / "chore: upgrade workflows to gh-aw-actions v0.67.2"
✅ Playwright: github.com title contains "GitHub"
✅ File Write: /tmp/gh-aw/agent/smoke-test-claude-24107778191.txt created
✅ Bash: File verified via cat

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude

github-actions · 2026-04-07T22:42:23Z

🔥 Smoke Test Results

Test	Status
GitHub MCP (PR: "test: add CLI proxy sidecar integration tests")	✅
GitHub.com connectivity	✅
File write/read (`/tmp/gh-aw/agent/smoke-test-copilot-24107778144.txt`)	✅

Overall: PASS — PR by @Mossaka

fix: separate stderr from stdout in benchmark to prevent invalid JSON

📰 BREAKING: Report filed by Smoke Copilot

github-actions · 2026-04-07T22:42:39Z

Smoke test (Codex) results
PR titles:

fix: chown gh-aw config dirs to agent user before privilege drop in entrypoint (agent-container: root-owned gh-aw config files unreadable by chroot user on self-hosted runners; chmod before privilege drop in [Content truncated due to length] #1463)
test: add CLI proxy sidecar integration tests

GitHub MCP merged PR review ✅
safeinputs-gh PR query ❌
Playwright GitHub title check ❌
Tavily web search ❌
File write + cat ✅
Discussion query/comment ❌
npm ci && npm run build ✅
Overall status: FAIL

🔮 The oracle has spoken through Smoke Codex

github-actions · 2026-04-07T22:43:16Z

Smoke Test: GitHub Actions Services Connectivity ✅

All checks passed:

Check	Result
Redis PING (`host.docker.internal:6379`)	✅ `PONG`
PostgreSQL ready (`host.docker.internal:5432`)	✅ `accepting connections`
PostgreSQL `SELECT 1` (`smoketest` db, user `postgres`)	✅ Returns `1`

Note: redis-cli was not available; Redis was tested via raw TCP (nc) with the Redis protocol — response was +PONG.

🔌 Service connectivity validated by Smoke Services

github-actions · 2026-04-07T22:43:28Z

🏗️ Build Test Suite Results

Ecosystem	Project	Build/Install	Tests	Status
Bun	elysia	✅	1/1 passed	✅ PASS
Bun	hono	✅	1/1 passed	✅ PASS
C++	fmt	✅	N/A	✅ PASS
C++	json	✅	N/A	✅ PASS
Deno	oak	N/A	1/1 passed	✅ PASS
Deno	std	N/A	1/1 passed	✅ PASS
.NET	hello-world	✅	N/A	✅ PASS
.NET	json-parse	✅	N/A	✅ PASS
Go	color	✅	1/1 passed	✅ PASS
Go	env	✅	1/1 passed	✅ PASS
Go	uuid	✅	1/1 passed	✅ PASS
Java	gson	✅	1/1 passed	✅ PASS
Java	caffeine	✅	1/1 passed	✅ PASS
Node.js	clsx	✅	All passed	✅ PASS
Node.js	execa	✅	All passed	✅ PASS
Node.js	p-limit	✅	All passed	✅ PASS
Rust	fd	✅	1/1 passed	✅ PASS
Rust	zoxide	✅	1/1 passed	✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #1764 · ● 596.3K · ◷

Copilot AI review requested due to automatic review settings April 7, 2026 22:10

Copilot started reviewing on behalf of Mossaka April 7, 2026 22:11 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

github-actions bot added the smoke-claude label Apr 7, 2026

This comment has been minimized.

Sign in to view

github-actions bot added the smoke-copilot label Apr 7, 2026

github-actions bot mentioned this pull request Apr 7, 2026

[aw] No-Op Runs #1668

Open

This comment has been minimized.

Sign in to view

github-actions bot added smoke-services build-test labels Apr 7, 2026

This comment has been minimized.

Sign in to view

Mossaka commented Apr 7, 2026

View reviewed changes

fix: move artifact upload before JSON validation with if: always()

447431c

Address Copilot review: ensure artifacts are uploaded even when validation fails, so logs are available for debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Mossaka merged commit 990682f into main Apr 7, 2026
58 of 59 checks passed

Mossaka deleted the fix/1757-benchmark-stderr-capture branch April 7, 2026 23:14

Mossaka mentioned this pull request Apr 8, 2026

fix: squid_https_latency benchmark always reports 0ms — AWF doesn't capture container stdout #1814

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: separate stderr from stdout in benchmark to prevent invalid JSON#1764

fix: separate stderr from stdout in benchmark to prevent invalid JSON#1764
Mossaka merged 2 commits intomainfrom
fix/1757-benchmark-stderr-capture

Mossaka commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 7, 2026

Uh oh!

Mossaka Apr 7, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Mossaka left a comment

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Mossaka commented Apr 7, 2026

Summary

Test plan

Uh oh!

github-actions bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Coverage Check Passed

Overall Coverage

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Mossaka Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Mossaka left a comment

Choose a reason for hiding this comment

Security Review - PR #1764

Summary

Findings

Pre-existing observations (not introduced by this PR)

Verdict

Uh oh!

github-actions bot commented Apr 7, 2026

Smoke Test Results

Uh oh!

github-actions bot commented Apr 7, 2026

🔥 Smoke Test Results

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 7, 2026

Smoke Test: GitHub Actions Services Connectivity ✅

Uh oh!

github-actions bot commented Apr 7, 2026

🏗️ Build Test Suite Results

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Apr 7, 2026 •

edited

Loading