Skip to content

fix: separate stderr from stdout in benchmark to prevent invalid JSON#1764

Merged
Mossaka merged 2 commits intomainfrom
fix/1757-benchmark-stderr-capture
Apr 7, 2026
Merged

fix: separate stderr from stdout in benchmark to prevent invalid JSON#1764
Mossaka merged 2 commits intomainfrom
fix/1757-benchmark-stderr-capture

Conversation

@Mossaka
Copy link
Copy Markdown
Collaborator

@Mossaka Mossaka commented Apr 7, 2026

Summary

  • Redirect benchmark stderr to benchmark-progress.log instead of mixing it into benchmark-results.json via 2>&1, which was corrupting the JSON output
  • Add a JSON validation step (jq empty) after the benchmark run to fail fast if the output is not valid JSON
  • Upload benchmark-progress.log alongside results as a CI artifact for debugging
  • Document the stdout/stderr contract in benchmark-performance.ts (stdout = JSON only, stderr = progress)

Closes #1757

Test plan

  • YAML validates (npx js-yaml)
  • npm run build succeeds
  • npm test passes (1311/1311)
  • CI passes on this PR

🤖 Generated with Claude Code

The benchmark run step used `2>&1` which mixed stderr progress logs into
benchmark-results.json, making downstream jq parsing silently fail.
Now stderr goes to benchmark-progress.log, and a JSON validation step
catches any corruption early.

Closes #1757

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 7, 2026 22:10
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

✅ Coverage Check Passed

Overall Coverage

Metric Base PR Delta
Lines 86.14% 86.23% 📈 +0.09%
Statements 86.02% 86.11% 📈 +0.09%
Functions 87.45% 87.45% ➡️ +0.00%
Branches 78.81% 78.86% 📈 +0.05%
📁 Per-file Coverage Changes (1 files)
File Lines (Before → After) Statements (Before → After)
src/docker-manager.ts 86.3% → 86.7% (+0.37%) 85.9% → 86.2% (+0.36%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the performance benchmark CI workflow so that JSON results written to stdout are not corrupted by progress/diagnostic output written to stderr, addressing issue #1757.

Changes:

  • Redirect benchmark stderr to benchmark-progress.log while keeping stdout as benchmark-results.json.
  • Add a jq empty validation step to fail fast when the results file is not valid JSON.
  • Upload both the JSON results and progress log as CI artifacts; document the stdout/stderr contract in the benchmark script.
Show a summary per file
File Description
scripts/ci/benchmark-performance.ts Documents the required stdout (JSON only) vs stderr (progress/diagnostics) contract relied on by CI.
.github/workflows/performance-monitor.yml Separates stdout/stderr capture for the benchmark, validates JSON output, and uploads the progress log artifact.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment on lines 55 to 70
- name: Validate benchmark JSON
run: |
if ! jq empty benchmark-results.json 2>/dev/null; then
echo "ERROR: benchmark-results.json is not valid JSON"
cat benchmark-results.json
exit 1
fi

- name: Upload results
uses: actions/upload-artifact@v4
with:
name: benchmark-results-${{ github.sha }}
path: benchmark-results.json
path: |
benchmark-results.json
benchmark-progress.log
retention-days: 90
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If JSON validation fails, the job exits before the artifact upload step, so benchmark-progress.log/benchmark-results.json won’t be available as CI artifacts when you most need them for debugging. Consider making the upload step run with if: always() (or move the upload before validation) so artifacts are preserved even when the validation step fails.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Moved the upload step before validation and added if: always() so artifacts are preserved even when validation fails.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Collaborator Author

@Mossaka Mossaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Review - PR #1764

Reviewed the full diff and the complete performance-monitor.yml workflow plus benchmark-performance.ts.

Summary

This PR separates stderr from stdout (2>benchmark-progress.log instead of 2>&1) to prevent benchmark progress messages from corrupting the JSON output. The change is narrow and well-scoped.

Findings

No security issues found in the changes introduced by this PR.

The stderr-to-file redirect and JSON validation step are safe. The new benchmark-progress.log artifact contains only console.error output from the benchmark script (iteration counters, progress messages) -- no secrets or sensitive data.

Pre-existing observations (not introduced by this PR)

These are informational notes about the existing workflow, not blockers for this PR:

  • [Info] Actions pinned to major version, not SHA: actions/checkout@v4, actions/setup-node@v4, actions/upload-artifact@v4, and actions/github-script@v7 are pinned to major versions rather than commit SHAs. This is standard practice for first-party GitHub actions and acceptable here, though SHA pinning would be stronger for supply-chain hardening.

  • [Info] github.workspace in heredoc: Line 43 uses ${{ github.workspace }} inside a heredoc written to /usr/local/bin/awf. This is safe because github.workspace is a runner-controlled path (e.g., /home/runner/work/repo/repo) that cannot be influenced by external actors. The heredoc delimiter is quoted (<<'WRAPPER'), preventing shell expansion of the content -- only the GitHub expression is interpolated by Actions before the shell runs.

  • [Info] || true on benchmark step: The || true means benchmark failures won't fail the step. This is intentional (regressions are handled by the separate "Check for regressions" step and issue creation), but worth noting that a completely broken benchmark silently produces invalid JSON. The new JSON validation step in this PR actually mitigates this by explicitly failing if the output is not valid JSON -- a nice improvement.

  • [Info] Benchmark data in issue body: The actions/github-script step at line 93 reads benchmark-results.json and interpolates it into an issue body. Since the JSON is generated by the benchmark script (not from external input) and the workflow only triggers on schedule and workflow_dispatch (not on PRs from forks), there is no injection vector here.

  • [Info] Workflow permissions are properly scoped: contents: read and issues: write -- minimal permissions for what the workflow does. Good.

Verdict

Clean. The changes are safe. The stderr separation and JSON validation are both improvements.


Security Review Agent

Address Copilot review: ensure artifacts are uploaded even when
validation fails, so logs are available for debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

Smoke Test Results

GitHub MCP: "test: add CLI proxy sidecar integration tests" / "chore: upgrade workflows to gh-aw-actions v0.67.2"
Playwright: github.com title contains "GitHub"
File Write: /tmp/gh-aw/agent/smoke-test-claude-24107778191.txt created
Bash: File verified via cat

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

🔥 Smoke Test Results

Test Status
GitHub MCP (PR: "test: add CLI proxy sidecar integration tests")
GitHub.com connectivity
File write/read (/tmp/gh-aw/agent/smoke-test-copilot-24107778144.txt)

Overall: PASS — PR by @Mossaka

fix: separate stderr from stdout in benchmark to prevent invalid JSON

📰 BREAKING: Report filed by Smoke Copilot

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

Smoke test (Codex) results
PR titles:

  1. GitHub MCP merged PR review ✅
  2. safeinputs-gh PR query ❌
  3. Playwright GitHub title check ❌
  4. Tavily web search ❌
  5. File write + cat ✅
  6. Discussion query/comment ❌
  7. npm ci && npm run build ✅
    Overall status: FAIL

🔮 The oracle has spoken through Smoke Codex

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

Smoke Test: GitHub Actions Services Connectivity ✅

All checks passed:

Check Result
Redis PING (host.docker.internal:6379) PONG
PostgreSQL ready (host.docker.internal:5432) accepting connections
PostgreSQL SELECT 1 (smoketest db, user postgres) ✅ Returns 1

Note: redis-cli was not available; Redis was tested via raw TCP (nc) with the Redis protocol — response was +PONG.

🔌 Service connectivity validated by Smoke Services

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #1764 · ● 596.3K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[plan] Fix benchmark stdout/stderr capture in performance-monitor.yml

2 participants