Benchmarks Website Version 3 #7643
Draft
connortsui20 wants to merge 29 commits into develop from ct/benchmarks-v3
Conversation
Merging this PR will degrade performance by 17.49%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | datetimeparts[10M_ms] | 728.6 µs | 827.2 µs | -11.93% |
| ⚡ | WallTime | runend[10M_i32_runlen_100000] | 130 µs | 92.2 µs | +41.11% |
| ⚡ | WallTime | 10M_50%[5000000] | 192.3 µs | 152.5 µs | +26.09% |
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[128] | 275.3 ns | 333.6 ns | -17.49% |
Comparing ct/benchmarks-v3 (10b07e8) with develop (deb7de0)
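The Efficiency column above appears to be the relative throughput change, i.e. `BASE / HEAD - 1` for time-based measurements (negative means HEAD is slower). A small sketch, under that assumption, reproducing the rows from the displayed (rounded) timings; the tiny differences vs the reported percentages come from that rounding:

```python
def efficiency_delta(base: float, head: float) -> float:
    """Relative throughput change for a time measurement: BASE/HEAD - 1.
    Negative = regression (HEAD slower), positive = speedup."""
    return base / head - 1.0

# (BASE, HEAD) pairs from the table, in display units (µs, µs, µs, ns).
rows = [(728.6, 827.2), (130.0, 92.2), (192.3, 152.5), (275.3, 333.6)]
deltas = [round(100 * efficiency_delta(b, h), 2) for b, h in rows]
```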
918fea3 to 2d9b05e
Signed-off-by: Connor Tsui <[email protected]>
…pt (#7638)

## Summary

Implements the alpha **emitter** component for `bench.vortex.dev` v3, per [`benchmarks-website/planning/components/emitter.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/components/emitter.md). **Purely additive** to v2's emission path — the existing `-d gh-json -o ...` form is untouched.

### Rust emitter (`vortex-bench`)

- New `vortex-bench/src/v3.rs` module with one record type per `kind` (`query_measurement`, `compression_time`, `compression_size`, `random_access_time`, `vector_search_run`) plus serde-tagged `V3Record` enum. Field shapes match [`02-contracts.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/02-contracts.md); dataset/variant/scale-factor mapping follows [`benchmark-mapping.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/benchmark-mapping.md).
- Each benchmark binary gains a `--gh-json-v3 <PATH>` flag that writes bare records as JSONL (no envelope), alongside the legacy `--display-format gh-json -o ...` flow:
  - `compress-bench` — `compression_time` (encode/decode) + `compression_size`. Cross-format ratios are **not** emitted; ratios are computed read-side per `decisions.md`.
  - `datafusion-bench`, `duckdb-bench`, `lance-bench` — `query_measurement`, with optional memory fields populated when `--track-memory` is on. `QueryMeasurement` and the paired `MemoryMeasurement` collapse into one record (`SqlBenchmarkRunner::v3_records`).
  - `random-access-bench` — `random_access_time`, with the dataset name plumbed alongside `TimingMeasurement`.
  - `vector-search-bench` — `vector_search_run`, with `dataset`, `layout`, `threshold`, `iterations` plumbed in (they don't live on `ScanTiming`).
- `insta` snapshot tests cover one record per `kind`, scrubbing `commit_sha` and `env_triple`.
### Post-ingest script

`scripts/post-ingest.py` (Python 3, stdlib only — `urllib`, `json`, `subprocess`):

- reads JSONL of records,
- fills the `commit` envelope from `git show` for the SHA passed in,
- wraps in `{run_meta, commit, records}` per the contract,
- POSTs to `<server>/api/ingest` with `Authorization: Bearer ...` from `INGEST_BEARER_TOKEN`,
- exits non-zero on 4xx/5xx.

**No retries, no spool, no S3 outbox** — deferred per the alpha plan.

### Out of scope (deferred)

CI workflow integration, dual-write, `bench-orchestrator` updates, retry/spool/outbox, replacing the v2 CLI form. All listed in [`deferred.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/deferred.md).

## Test plan

- [x] `cargo test -p vortex-bench --lib` — 48 passed (7 new `v3` tests, one snapshot per kind plus a JSONL round-trip).
- [x] `cargo build -p vortex-bench -p compress-bench -p datafusion-bench -p duckdb-bench -p lance-bench -p random-access-bench -p vector-search-bench` — all clean.
- [x] `cargo clippy --all-targets` on changed crates (skipping `duckdb-bench`, blocked by an unrelated pre-existing `cognitive_complexity` lint in `vortex-duckdb` on `ct/benchmarks-v3`).
- [x] `cargo +nightly fmt --all`.
- [x] End-to-end smoke: `scripts/post-ingest.py` against a Python `http.server` mock — 200 → exit 0 with `{"inserted":1,"updated":0}`; 400 → exit 1 with the server body on stderr.
- [ ] Real round-trip against an actual alpha server — blocked on the server component landing (acceptance criterion 3 in the emitter plan; verifiable once the server PR exists).

---

_Generated by [Claude Code](https://claude.ai/code/session_017qh4ju4FtkizW6s67JEhPW)_

Signed-off-by: Claude <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
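The script's flow can be sketched with the stdlib pieces the summary names (`json`, `urllib`); function names and the exact field names here are illustrative, not the script's actual code:

```python
import json
import sys
import urllib.error
import urllib.request

def build_envelope(records, commit, run_meta):
    """Wrap bare JSONL records in the {run_meta, commit, records} envelope."""
    return {"run_meta": run_meta, "commit": commit, "records": records}

def post_ingest(server, token, envelope):
    """POST the envelope; exit non-zero on any 4xx/5xx, mirroring the script."""
    req = urllib.request.Request(
        f"{server}/api/ingest",
        data=json.dumps(envelope).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        sys.stderr.write(e.read().decode())
        sys.exit(1)
```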
…7637)

## Summary

Implements the alpha server for `bench.vortex.dev` v3 per [`benchmarks-website/planning/components/server.md`](../tree/ct/benchmarks-v3/benchmarks-website/planning/components/server.md). A single Rust binary that owns a DuckDB file on local disk, accepts authenticated `/api/ingest` POSTs, and serves a small read API plus a placeholder HTML route the web-ui PR will replace.

- **Schema** (`src/schema.rs`): `commits` dim + the five fact tables from `01-schema.md`. DDL is applied on boot; no migration framework at alpha.
- **Ingest** (`src/ingest.rs`): bearer-auth middleware, all-or-nothing transactions, idempotent upsert via per-table xxhash64 `measurement_id`, full HTTP matrix from `02-contracts.md` (200 / 400 / 401 / 409 / 500).
- **Read API** (`src/api.rs`): `/api/groups`, `/api/chart/:slug`, `/health`. Slugs are opaque base64url-encoded JSON (`src/slug.rs`), so the web-ui treats them as strings per the contract.
- **Records** (`src/records.rs`): per-`kind` discriminated union with `deny_unknown_fields`, so unknown `kind`s and unknown fields fail loudly.
- **HTML** (`src/html.rs`): placeholder root route, replaced by web-ui.

## Stack

Pinned in `benchmarks-website/server/Cargo.toml`:

- `axum = "=0.7.9"` (`http1`, `json`, `tokio`, `query`)
- `maud = "=0.26.0"` with `axum`
- `duckdb = "=1.4.1"` with `bundled`
- `tower-http = "=0.6.8"` for tracing
- `subtle = "=2.6.1"` for constant-time bearer compare
- `twox-hash = "=2.1.0"` for the `measurement_id` xxhash64
- workspace `anyhow` + `thiserror` for errors

The crate is a leaf binary outside the `vortex-*` public-API surface, so `./scripts/public-api.sh` is intentionally skipped per the task brief.
## Routes

| Method | Path | Auth |
|---|---|---|
| `POST` | `/api/ingest` | bearer |
| `GET` | `/api/groups` | none |
| `GET` | `/api/chart/:slug` | none |
| `GET` | `/health` | none |
| `GET` | `/` | none (placeholder, web-ui replaces) |

## Test plan

- [x] `cargo build -p vortex-bench-server`
- [x] `cargo test -p vortex-bench-server` — 14 tests pass (4 unit + 10 integration)
- [x] `cargo clippy -p vortex-bench-server --all-targets -- -D warnings`
- [x] `cargo fmt -p vortex-bench-server`
- [x] Manual `cargo run` smoke: `/health`, `POST /api/ingest` (with and without bearer), `/api/groups`, `/api/chart/:slug` round-trip.

Acceptance criteria from `components/server.md`:

- [x] `cargo build` succeeds for the server crate.
- [x] Integration test: POST with valid bearer → 200; re-POST → 200 with `updated > 0, inserted = 0`; no/wrong bearer → 401; unknown `kind` → 400.
- [x] `GET /health` returns coherent shape after an ingest (db_path, schema_version, latest_commit_timestamp, per-table row counts).
- [x] `cargo run` against a fresh DuckDB file serves both read routes.

## Coordination

The skeleton commit (`3266b87`) was pushed before the integration test commit so the web-ui agent can rebase onto the workspace member without waiting for tests. Branch: `claude/benchmarks-v3-server` → `ct/benchmarks-v3` (not develop, not main).

---

_Generated by [Claude Code](https://claude.ai/code/session_01MPMnGUzXCUQvdkwbhSU9HR)_

Signed-off-by: Claude <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
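The opaque slug scheme (base64url-encoded JSON, treated as a string by the web-ui) can be sketched as a round-trip pair. This is a Python stand-in for the Rust `src/slug.rs`; the function names and the padding handling are illustrative assumptions:

```python
import base64
import json

def to_slug(key: dict) -> str:
    """Encode a chart key as opaque base64url (padding stripped)."""
    raw = json.dumps(key, sort_keys=True, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def from_slug(slug: str) -> dict:
    """Decode, restoring the stripped '=' padding first."""
    raw = base64.urlsafe_b64decode(slug + "=" * (-len(slug) % 4))
    return json.loads(raw)
```

Because the web-ui only echoes slugs it received from `/api/groups`, the encoding can change server-side without breaking clients.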
Signed-off-by: Connor Tsui <[email protected]>
Adds `--gh-json-v3` plumbing through vx-bench and post-ingest steps in `bench.yml` and `sql-benchmarks.yml`, plus a v3-commit-metadata workflow. All v3 ingest is gated on `vars.V3_INGEST_URL` and `continue-on-error`, so it's a clean no-op until the deploy track sets the variable. v2's `cat-s3.sh` path is unchanged.

Signed-off-by: Connor Tsui <[email protected]>
## Summary

Implements the alpha web UI for `bench.vortex.dev` v3 per [`benchmarks-website/planning/components/web-ui.md`](../tree/claude/vortex-benchmarks-ui-v3-QxRCK/benchmarks-website/planning/components/web-ui.md). Replaces the placeholder `html.rs` router introduced in #7637 with two real pages backed by Maud templates and a vendored Chart.js bundle.

- `GET /` — landing page that lists every group + chart link from `/api/groups`, rendered via `maud`.
- `GET /chart/{slug}` — single Chart.js line chart. Payload is fetched server-side via the same `api::collect_chart` helper used by `/api/chart/:slug`, then embedded inline as a JSON `<script id="chart-data">` block. No client-side round-trip after page load.
- `GET /static/...` — vendored `chart.umd.js` (Chart.js 4.4.4, MIT), `chart-init.js`, and `style.css`. All bundled into the binary via `include_bytes!`.

Slugs are treated as opaque per [`02-contracts.md`](../tree/claude/vortex-benchmarks-ui-v3-QxRCK/benchmarks-website/planning/02-contracts.md): the chart handler echoes whatever `/api/groups` returned straight into `ChartKey::from_slug` without parsing or constructing them itself. `api::collect_groups` and `api::collect_chart` are now `pub(crate)` so the HTML handlers reuse the same row collectors that back the JSON read routes — no second SQL implementation. The chart-init script and the embedded JSON payload between them satisfy the "no network round-trip after page load" criterion.

Inside the JSON `<script>` block, `</`, `<!--`, and `<script` are escaped via JSON-safe string escapes so that benign payload contents can never break out of the script element.

## Tests

`tests/web_ui.rs` (new, 6 tests):

- `landing_page_snapshot` — `insta` snapshot of `GET /` after seeding three envelopes with distinct `commit.sha` / `commit.timestamp` values.
- `chart_page_snapshot` — `insta` snapshot of the rendered tpch-Q1 chart page; exercises multi-series rendering (`datafusion:vortex-file-compressed` + `duckdb:parquet`) and verifies both the inline `<script id="chart-data">` block and the `/static/chart.umd.js` reference.
- `chart_page_round_trips_every_slug` — every slug returned by `/api/groups` resolves to a 200 chart page with inline data.
- `unknown_slug_renders_404` — bogus slug → 404 HTML page.
- `empty_landing_page_renders` — empty DB → "No data ingested yet."
- `static_assets_are_served` — content-type checks for the three `/static/*` files.

Pre-existing `tests/ingest.rs` still passes (10 tests).

## Stack inheritance

Inherits the version pins set by #7637 in `benchmarks-website/server/Cargo.toml`. The only Cargo change is `insta = { workspace = true }` under `[dev-dependencies]`.

## Verified locally

- `cargo build -p vortex-bench-server`
- `cargo test -p vortex-bench-server` — 10 ingest + 6 web-ui tests pass.
- `cargo +nightly fmt -p vortex-bench-server -- --check` — clean.
- `cargo clippy -p vortex-bench-server --all-targets` — clean.
- End-to-end smoke test against a running server: `INGEST_BEARER_TOKEN=test` + `cargo run`, POST two envelopes with different commit shas, verified `/`, `/chart/{slug}`, the three `/static/*` routes, and the invalid-slug 404 path with `curl`.

## Test plan

- [ ] Reviewer runs `cargo test -p vortex-bench-server` locally.
- [ ] Reviewer starts the server (`INGEST_BEARER_TOKEN=test cargo run -p vortex-bench-server`), POSTs `benchmarks-website/server/fixtures/envelope.json`, and visits `http://127.0.0.1:3000/` in a real browser to confirm the chart hydrates (this PR was developed in a headless sandbox, so visual verification was not possible here).
- [ ] CI green.
## Out of scope (deferred per `web-ui.md` + `deferred.md`)

Per-commit page, filter UI, full-screen modal, deep links, LTTB downsampling, lookup-table-driven engine names / colours, chartjs-plugin-zoom, ratio rendering on compression-size charts, and geomean summary cards are explicitly deferred and not touched here.

---

_Generated by [Claude Code](https://claude.ai/code/session_01UjgnLq5MCmcpyv6PXC5oLv)_

Signed-off-by: Claude <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
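The JSON-in-`<script>` escaping described for the chart payload can be sketched as follows. This is a Python stand-in for the Rust implementation; the escape set (`</`, `<!--`, `<script`) comes from the summary, while the function name and wrapper markup are illustrative:

```python
import json

def embed_json(payload) -> str:
    """Serialize a payload for an inline <script> block, escaping the
    sequences that could close or re-open the script element by rewriting
    '<' as the JSON-safe escape \\u003c (still valid JSON)."""
    text = json.dumps(payload)
    for needle in ("</", "<!--", "<script"):
        text = text.replace(needle, needle.replace("<", "\\u003c"))
    return f'<script id="chart-data" type="application/json">{text}</script>'
```

A payload containing `</script>` therefore round-trips through `JSON.parse` unchanged but can never terminate the element early.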
… hash (#7642)

Without `commit_sha` in the hash input, every dim tuple collapses to one row across commits via `INSERT ... ON CONFLICT DO UPDATE`, so the chart pages render at most one point per series. Adding `commit_sha` to the per-table hashers makes each (commit, dim) pair its own row, which is the time series the UI is built around. Re-emission of the same (commit, dim) is still the upsert case.

The web-ui `chart_page_query` snapshot now correctly shows three commits with three points per series, matching the test fixture.

No public API change; `measurement_id` is server-internal.

Signed-off-by: Claude <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
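The fix can be illustrated with a small sketch: hashing the commit SHA along with the dimension tuple gives each (commit, dim) pair a distinct row id, while re-emitting the same pair still collides into the upsert path. Python's stdlib has no xxhash64, so `hashlib.blake2b` with an 8-byte digest stands in here; the real server pins `twox-hash`:

```python
import hashlib

def measurement_id(commit_sha: str, dims: tuple) -> str:
    """Hash over commit_sha + the dimension tuple (the post-#7642 behavior)."""
    h = hashlib.blake2b(digest_size=8)  # 64-bit id, like xxhash64
    for part in (commit_sha, *dims):
        h.update(part.encode())
        h.update(b"\x00")  # field separator so ("ab","c") != ("a","bc")
    return h.hexdigest()
```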
Signed-off-by: Connor Tsui <[email protected]>
This PR introduces the deployment infrastructure for vortex-bench-server v3, a new benchmarking server that runs alongside the existing v2 instance. The v3 server provides an ingest endpoint for benchmark results with bearer token authentication and uses DuckDB for data storage.

1. **GitHub Actions workflow** (`publish-bench-server.yml`): new CI pipeline that builds and publishes the vortex-bench-server Docker image to GHCR on changes to the server code, vortex-bench crate, or Cargo.lock.
2. **Dockerfile** (`benchmarks-website/server/Dockerfile`): multi-stage Docker build that:
   - compiles vortex-bench-server in a Rust 1.91 environment,
   - packages it with DuckDB CLI tools in a minimal Debian image,
   - targets ARM64 architecture for EC2 deployment.
3. **Backup script** (`benchmarks-website/server/scripts/backup.sh`): daily backup utility that:
   - exports the DuckDB database from the running container,
   - uploads backups to S3 (`vortex-ci-benchmark-results/v3-backups/`),
   - manages local disk space by retaining only the latest backup.
4. **Docker Compose configuration**: added vortex-bench-server service that:
   - runs on port 3001 (v2 remains on port 80),
   - mounts an EBS-backed data directory for DuckDB persistence,
   - loads the bearer token from `/etc/vortex-bench/secrets.env`,
   - integrates with the existing watchtower for automatic image updates.
5. **EC2 initialization guide** (`ec2-init.txt`): comprehensive setup documentation covering bearer token secret management, EBS volume preparation, service startup and health checks, cron-based backup scheduling, and token rotation procedures.

The v3 server is designed to run additively alongside v2, allowing for gradual DNS migration and dual-write support from CI. The Docker image build is validated by the GitHub Actions workflow on each push to develop. The backup script can be tested manually on the EC2 host before cron scheduling. Smoke tests are documented in the setup guide (curl against the `/health` endpoint on port 3001).

https://claude.ai/code/session_019mBcBdF4LhKDXyKwuKRAPV

Signed-off-by: Claude <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
This is a one-shot migration binary that takes all of the data from `data.json.gz` and brings it into a DuckDB database. It simply gathers and aggregates everything into memory and writes data in chunks with Arrow arrays. Inserting row-by-row took way too long, and the appender API in duckdb does not support `BIGINT[]` for some reason...

Signed-off-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
Six small fixes left over from the v3 migration alpha. All paths relative to `benchmarks-website/migrate/` unless noted.

## Fixes

- **Scale-factor canonicalization** (`src/classifier.rs::bin_compression_size`, `src/migrate.rs::migrate_file_sizes`, helper in `src/v2.rs`): both paths now route the v2 SF string through `canonical_scale_factor`, which parses to `f64` and formats with no trailing zeros. Without this, `"1"` vs `"1.0"` and `"10"` vs `"10.0"` would produce different `dataset_variant` strings and prevent the data.json.gz and file-sizes-*.json.gz rows from sharing a `measurement_id`.
- **Summary counter timing** (`src/migrate.rs::run`): per-fact counters used to be set from accumulator length *before* the flush, so a flush failure would print a summary that lied. Refactored into a `flush_all` helper that bumps `summary.<fact>_inserted` from the flushed `RecordBatch::num_rows()` only after each `Appender::append_record_batch` succeeds.
- **Empty-string normalization in commits** (`src/commits.rs`, `benchmarks-website/server/src/schema.rs`, `benchmarks-website/server/src/api.rs`): `message`, `author_name`/`email`, `committer_name`/`email` now bind as `Option<String>` and store SQL `NULL` when v2 supplied an empty or whitespace-only string. Schema columns made nullable; server reads use `COALESCE(c.message, '')` so the existing `String` decoder still works.
- **Orphan WAL cleanup** (`src/migrate.rs::open_target_db`): the existing code already attempts `remove_if_exists` on the `.wal` regardless of whether the main file was present; pinned the behavior with a regression test that stages an orphan `.wal` (no main file) and asserts the orphan bytes don't survive `open_target_db`.
- **Random-access dataset extraction** (`src/classifier.rs::bin_random_access`): 4-part records `random-access/<dataset>/<pattern>/<format>-tokio-local-disk` continue to extract `dataset/pattern` from the raw name. 2-part legacy records carry no dataset and used to render under the placeholder `"random access"`; they're now dropped to keep the v3 dataset column meaningful.
- **`migrate_file_sizes` dataset fallback** (`src/migrate.rs::migrate_file_sizes`): when the matrix id stripped from `file-sizes-<id>.json.gz` isn't on the `KNOWN_FILE_SIZES_SUITES` allowlist, the fallback now emits `unknown:<id>` so the UI clearly flags it instead of presenting it as a real dataset.

## Tests

Each fix has a focused regression test (`rstest` parametrization where useful):

- `tests/classifier.rs::compression_size_scale_factor_canonicalizes` covering `"1"`, `"1.0"`, `"10"`, `"10.0"`, `"0.1"`, whitespace, and `""`.
- `tests/classifier.rs::unmapped_records_yield_none` extended with `random_access_2_part_legacy` and `random_access_3_part`.
- `migrate::tests::flush_all_does_not_overcount_on_failure` (private unit test that drops `compression_times` to force the second flush to fail and asserts only the queries counter is set).
- `tests/end_to_end.rs::summary_counts_match_actual_rows_on_success` (sister invariant for the success path).
- `tests/end_to_end.rs::empty_author_email_stored_as_null`.
- `tests/end_to_end.rs::open_target_db_removes_orphan_wal`.
- `tests/end_to_end.rs::file_sizes_unknown_id_falls_back_to_unknown_prefix` and `file_sizes_known_id_uses_id_directly`.
- `tests/end_to_end.rs::compression_size_data_and_file_sizes_merge_with_canonical_sf` (cross-path SF canonicalization end to end).

## Verification

- `cargo build -p vortex-bench-migrate` — clean.
- `cargo test -p vortex-bench-migrate` — 7 unit + 46 classifier + 12 end-to-end tests all pass.
- `cargo test -p vortex-bench-server` — 6 unit + 10 ingest + 6 web_ui tests pass; schema and `COALESCE` changes are server-safe.
- `cargo clippy -p vortex-bench-migrate --all-targets` — clean.
- `cargo fmt` on changed files (nightly fmt unavailable in this sandbox; ran with stable, which is a no-op for the imports-granularity options the repo's `rustfmt.toml` gates on nightly).
- Skipped `./scripts/public-api.sh`: migrate is a leaf binary outside the public-api lockfile set, and the only newly `pub` item is the internal `canonical_scale_factor` helper.

---

_Generated by [Claude Code](https://claude.ai/code/session_012XyYJRpcGFxmJXdTJuW8Ff)_

Signed-off-by: Claude <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
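The canonicalization in the first fix can be sketched like this, as a Python stand-in for the Rust `canonical_scale_factor` helper. The only behavior the summary pins down is "parse to f64, format with no trailing zeros"; the exact formatting width is an assumption:

```python
def canonical_scale_factor(sf: str) -> str:
    """Parse a v2 scale-factor string and re-format it without trailing
    zeros, so "1"/"1.0" (and "10"/"10.0") map to one dataset_variant."""
    value = float(sf.strip())
    return f"{value:.10f}".rstrip("0").rstrip(".")
```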
…7681)

## Summary

Brings the v3 benchmarks website to a demo-ready state focused on the historical-comparison use case (Vortex vs other engines on the same commit, HEAD vs N commits ago, latest vs first as % delta). Single process, single binary; SSR `maud` + inline JSON `<script>` + Chart.js — no client-side framework, no build step, no post-load API round-trips.

> Branch note: this PR was developed on the harness-assigned branch
> `claude/demo-ready-benchmarks-v3-H5ECI` rather than the
> `claude/benchmarks-v3-ui-historical-comparison` branch the task
> request mentioned, because the session's harness pins the working
> branch (`Develop on branch …`, `NEVER push to a different branch
> without explicit permission`).

## CI note

The `Rust tests (windows-x64)` job is failing on this PR, but the **same job is also failing on the merge commit at the tip of `ct/benchmarks-v3`** (PR #7671's run, job id `73229326105`, the commit `8697731` we branched from). The base branch shipped with that failure tolerated, and our diff only touches `benchmarks-website/server/` (no Windows-specific paths, no FFI, no new dependencies on Windows-fragile crates), so this failure is pre-existing and not caused by the PR. CodSpeed flagged two `varbinview_zip` regressions in `vortex-array/` — also untouched by this PR.

## What's new

* **Scoped commit window** — `?n=25|50|100|250|all`, default 100, server-side clamp to `[1, 1000]`. SQL splices in a `LIMIT ?` filter and binds the value as a parameter (consistent with the rest of the file's `params!`-style use); the unbounded path is a separate query so the plan stays clean.
* **Group page** — `GET /group/{slug}` renders every chart in one group on a single screen. Each card embeds its own `<script id="chart-data-N">` payload + sibling `<canvas data-chart-index="N">`. `IntersectionObserver` defers `Chart` construction until the canvas scrolls into view (mobile-friendly + cheap for 22-chart TPC-H groups).
* **Toolbar** — same component on `/chart/{slug}` and `/group/{slug}`. Scope buttons + slider, linear/log Y-axis, absolute / `% of baseline` mode. URL query string is canonical state; subtitle mirrors active state. Slider step is `5` so it can land on every preset value (`25`, `50`, `100`, `250`).
* **Rich tooltip** — custom external HTML tooltip with `<short-sha> · YYYY-MM-DD` title; per-series rows render value with friendly unit (ns→µs→ms→s, B→KiB→MiB→GiB) and a coloured `% delta` vs the prior visible commit; footer carries the truncated commit message + a GitHub link. Document-level click closes.
* **Legend → URL** — clicking a legend item rewrites `?hidden=engine:format|…` via `history.replaceState` (no back-button hostility). Permalinks reproduce the view. Delimiter is `|` so series names can contain `:` and `,` without escaping.
* **Mobile** — `@media (max-width: 768px)`: single-column chart grid, toolbar wraps with ≥ 40 px touch targets, slider expands to fill the row, legend pops to the *top* of the chart so it doesn't push the chart off-screen on a phone.
* **Landing search** — client-side filter input above the group list.
* **`/api/group/{slug}`** — JSON sibling to the HTML route, returns every chart in the group with payloads inlined.
## What was *not* picked up from `planning/components/web-ui.md`'s deferred list

Done now (moved out of deferred):

- mobile redesign basics (single column, ≥ 40 px tap targets, toolbar wrap)
- engine + series toggling (legend ↔ URL)
- deep-link state (every toolbar control is URL-canonical)
- group landing with the start of "filters" (client-side search)

Still deferred (intentional):

- per-commit drill-down page
- ad-hoc SQL page
- LTTB downsampling
- engine name lookup table + curated colour palettes
- summary cards (geomean ratios, rankings)
- full-screen modal / zoom-pan
- `?mode=delta` (compare-to-main) — parser branch dropped pending data shape work; toolbar surface today is only `abs / rel`

## Repro

    INGEST_BEARER_TOKEN=$(openssl rand -hex 32) \
    VORTEX_BENCH_DB=./bench.duckdb \
    cargo run --release -p vortex-bench-server

Then open `http://localhost:3000/`, click any group name (now a link to `/group/{slug}`), or any chart inside, and play with the toolbar. Toggle a series in the legend and notice `?hidden=…` appear in the URL. Resize to phone width to confirm single-column layout, sticky toolbar wrapping, and legend-on-top.

## Snapshot diffs

Three `.snap` files refreshed by this PR:

- `landing_page.snap` — group names now link to `/group/{slug}`, search input added, `data-group-name` for client filter.
- `chart_page_query.snap` — toolbar + indexed `<script id="chart-data-0">` + tooltip host element.
- `group_page_query.snap` (new) — group page rendered against the fixture DB, `?n=100` pinned for stability.

Run `INSTA_UPDATE=always cargo test -p vortex-bench-server` (or `cargo insta accept`) to refresh.
## Test plan

- [x] `cargo build -p vortex-bench-server`
- [x] `cargo test -p vortex-bench-server` — 41 tests pass (22 unit + 10 ingest + 9 web_ui)
- [x] `cargo clippy -p vortex-bench-server --all-targets -- -D warnings` — clean
- [x] `cargo +nightly fmt` — no diff
- [ ] `./scripts/public-api.sh` — skipped per CLAUDE.md (leaf binary, not in workspace public-api lockfile set)
- [ ] Manual screenshots — couldn't capture from the sandbox; the reviewer or a follow-up should record landing / single chart with toolbar / group desktop / group mobile / tooltip open / log+rel.

## Follow-up review fixes (commits `7042f0d` … `da668a4`)

- `7042f0d` — `LIMIT` value travels as a bound parameter (`LIMIT ?`) via `params_from_iter` instead of being interpolated into SQL.
- `9c80bce` — drop the unused `?mode=delta` parser branch in both `UiQuery::mode` and `chart-init.js::parseUrl`.
- `d156ab8` — `?hidden=` delimiter is now `|`; a new test pins the server/client wire agreement.
- `da668a4` — slider `step` lowered to 5 so it can land on every preset (`25/50/100/250`).

## Things explicitly NOT changed

- `/api/ingest`, auth, schema, write paths.
- DB migration (none added).
- Existing routes (no renames).
- The v2 site at `benchmarks-website/server.js` etc. — untouched.
- Single-chart page still works; reuses the same `chart-init.js`.

https://claude.ai/code/session_015Nc73ihs9TUdx7QzLUZudK

Signed-off-by: Claude <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
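The `?hidden=` wire format (series names joined with `|`, each of the form `engine:format`, chosen so names may contain `:` and `,` unescaped) can be sketched as a round-trip pair. The function names are illustrative; the real code lives in `chart-init.js` and its server-side pin test:

```python
def serialize_hidden(series: list[str]) -> str:
    """Join hidden series names with '|' for the ?hidden= query param."""
    return "|".join(series)

def parse_hidden(value: str) -> list[str]:
    """Split on '|', dropping empty segments (e.g. an empty param)."""
    return [s for s in value.split("|") if s]
```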
Removes the click-through landing page and the full-page reload that
gated every toolbar interaction in the v3 site.
Landing page (`/`) now renders every chart inline using the same
`chart-card` markup as `/group/{slug}`, with a smaller default commit
window (50, vs 100 on the per-chart routes) so the cold payload stays
cheap. The existing `IntersectionObserver` lazy-construct path means
offscreen charts don't pay the Chart.js cost up front.
Toolbar (Scope / Y / Mode) now updates in place:
- Scope: `fetch('/api/chart/{slug}?n=...')` per card, swap `chart.data`,
`chart.update("none")`. Per-card "loading…" + error overlays.
- Y axis: client-side `chart.options.scales.y.type` swap, no fetch.
- Mode (abs/rel): client-side `buildDatasets` recompute, no fetch
(server already returns absolute values; the rel transform was
already client-side).
URL stays in sync via `history.replaceState` so deep links keep
working, and the existing permalinks (`/chart/{slug}`,
`/group/{slug}`) are untouched for SEO and sharing.
`api::collect_chart` is preserved as a thin wrapper around the new
`api::chart_payload` helper; chart-card markup grows
`data-chart-slug` + `data-permalink` attributes that the toolbar
reads when refetching.
Tests: snapshots refreshed where markup intentionally changed; new
tests cover `GET /api/chart/{slug}` JSON shape + `?n=` narrowing,
plus the landing-page n=50 default.
Signed-off-by: Claude <[email protected]>
https://claude.ai/code/session_01NhtGnaLstPEAh7cRJ4qDFt
…psible groups
Replaces the page-level toolbar (which controlled every chart together)
with a per-chart toolbar that the user reported as the main UX
complaint, and switches the scope mechanism from "refetch on change" to
"zoom over a single fetched slice" so the slider is fluid at 60fps.
## Per-chart toolbar
Every `.chart-card` now carries its own compact `.toolbar.toolbar--card`
with Show / Y / Mode controls. There is no page-level toolbar on `/`,
`/chart`, or `/group`. Toolbar buttons are `<button type="button">`
(not `<a>`): they manipulate Chart.js state in place rather than
navigating.
## Zoom-as-scope
Each chart fetches up to 1000 commits once. The "Show" buttons and
slider set `chart.options.scales.x.min/max` to a window of the
fetched slice; no refetch on scope change. The slider fires on
`input` throttled to 16ms (~60fps, matches v2's `ZOOM_THROTTLE_DELAY`)
so dragging is continuous. Drag-pan and drag-rectangle-zoom are wired
through `chartjs-plugin-zoom`; mouse wheel pans horizontally via a
manual canvas listener calling `chart.pan()` because the plugin
doesn't expose pan-on-wheel.
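The 16 ms `input` throttle described above (matching v2's `ZOOM_THROTTLE_DELAY`) follows a generic pattern that can be sketched outside JavaScript; this is an illustration of the technique, not the actual `chart-init.js` code:

```python
import time

def throttle(interval_s: float):
    """Decorator: invoke the wrapped handler at most once per interval.
    Calls landing inside the window are dropped; with a slider drag the
    next 'input' event carries fresher state anyway."""
    def wrap(fn):
        last = [0.0]  # monotonic timestamp of the last accepted call
        def inner(*args, **kwargs):
            now = time.monotonic()
            if now - last[0] >= interval_s:
                last[0] = now
                return fn(*args, **kwargs)
        return inner
    return wrap
```

At 16 ms this admits at most ~60 handler runs per second, which is what keeps the slider fluid at 60 fps.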
The zoom plugin UMD is bundled locally
(`static/chartjs-plugin-zoom.umd.min.js`, MIT-licensed). hammerjs is
intentionally not bundled — touch gestures are nice-to-have, the
plugin's mouse path works without it (guarded by `if (Hammer)`).
## Tooltip flicker fix + crosshair
The tooltip host is now permanently `pointer-events: none`. The
previous code flipped it to `auto` while visible, which produced a
flicker loop: cursor on tooltip → mouseout on canvas → tooltip hides
→ mousein on canvas → tooltip shows. Cost: tooltip-internal links are
no longer clickable; the chart-card title already links to the
permalink.
The tooltip is offset 12px from the cursor and flips to the left when
within 24px of the right edge. Interaction mode is
`{ mode: "index", intersect: false, axis: "x" }` so hover anywhere
over the chart snaps to the nearest commit. A custom inline plugin
(`afterDatasetsDraw`) draws a 1px dashed `--muted` vertical crosshair
at the active hover index.
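The positioning rule above (12 px offset from the cursor, flipping left near the right edge) reduces to a pure function; the names and the exact flip arithmetic here are illustrative assumptions, not the shipped code:

```python
def tooltip_x(cursor_x: float, tooltip_w: float, viewport_w: float,
              offset: float = 12.0, margin: float = 24.0) -> float:
    """Place the tooltip to the right of the cursor, flipping to the left
    side when its right edge would come within `margin` of the viewport."""
    x = cursor_x + offset
    if x + tooltip_w > viewport_w - margin:
        x = cursor_x - offset - tooltip_w
    return x
```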
## Collapsible groups, v2 ordering
Landing page wraps each group in `<details>` with a `<summary>` that
shows the group name + chart-count badge. Only the first group is
`open` by default; closed groups render only the chart-card shells
(no inline JSON), and `chart-init.js` fetches their payloads via
`/api/chart/{slug}?n=1000` on the first `details.toggle` event.
Group naming was rewritten to match v2's hard-coded list:
- `tpch sf=1 [nvme]` → `TPC-H (NVMe) (SF=1)`
- `tpcds sf=10 [nvme]` → `TPC-DS (NVMe) (SF=10)`
- `clickbench [nvme]` → `Clickbench`
A new `pub const GROUP_ORDER` + `pub fn group_sort_key` in `api.rs`
sort discovered groups into the canonical order; unknown groups sort
last by alphabetical fallback. Option (1) from the task brief — the
rename was a clean change inside `group_name_query` only, no need for
the option-(2) sort-key fallback.
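The canonical-order sort can be sketched as follows. This is a JS rendition for illustration only; the real `GROUP_ORDER` and `group_sort_key` are Rust in `api.rs`, and the list here is abbreviated.

```javascript
// Abbreviated canonical order (the real constant lists every group).
const GROUP_ORDER = [
  "TPC-H (NVMe) (SF=1)",
  "TPC-DS (NVMe) (SF=10)",
  "Clickbench",
];

// Known groups sort by canonical index; unknown groups sort last,
// alphabetically among themselves.
function groupSortKey(name) {
  const i = GROUP_ORDER.indexOf(name);
  return i >= 0 ? [0, i, ""] : [1, 0, name];
}

function sortGroups(names) {
  return [...names].sort((a, b) => {
    const [ka, ia, sa] = groupSortKey(a);
    const [kb, ib, sb] = groupSortKey(b);
    return ka - kb || ia - ib || sa.localeCompare(sb);
  });
}
```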
## URL state
URL writeback for per-chart toolbar state was deliberately dropped.
The user's feedback emphasised local-and-immediate UX, not "share a
perfect view via URL"; permalinks (`/chart/{slug}`, `/group/{slug}`)
are the sharing mechanism. `?n=` on the landing route is still
honoured as a power-user override on the initial fetch size.
## Tests
- Snapshots refreshed for all three pages (markup change is large).
- Added: `landing_groups_render_in_v2_order` — fixture covers Random
Access / Compression / Compression Size / TPC-H / vector-search and
the rendered order matches the canonical list.
- Added: `details_first_group_open_others_closed`.
- Added: `chart_card_carries_per_chart_toolbar` (every card).
- Updated `static_assets_are_served` to cover the new
`/static/chartjs-plugin-zoom.umd.min.js` route.
## Out of scope (per task brief)
- Zoom-sync across charts in a group (v2's `zoom-sync.js` pattern) —
follow-up PR.
- LTTB downsampling.
- "Compare to main" delta mode.
- The `collect_group_charts` N+1 in `api.rs`.
- Mobile legend resize handler.
- Replacing the inline crosshair plugin with `chartjs-plugin-crosshair`.
Signed-off-by: Claude <[email protected]>
https://claude.ai/code/session_01NhtGnaLstPEAh7cRJ4qDFt
Signed-off-by: Connor Tsui <[email protected]>
The previous "fix CI lints" commit accidentally clobbered planning/README.md with planning/AGENTS.md content, leaving the two files byte-for-byte identical.

Restore README.md to its intended planning content (status, production-readiness checklist, open product decisions, deferred UI follow-ups, components, branch conventions). AGENTS.md is unchanged.

Signed-off-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
- AGENTS.md: bullet 4 now lists all four JSON routes (groups, chart,
group, health), not just chart.
- 02-contracts.md: Read API section adds /api/group/:slug and /health,
drops the "two routes" framing.
- 01-schema.md: relax commits.{message,author_name,author_email,
committer_name,committer_email} to optional, matching schema.rs DDL.
- README.md: remove "/health endpoint" from the not-yet-done list
(it's implemented in api.rs and routed in app.rs), refresh the
line range for the collect_group_charts N+1 reference.
Signed-off-by: Claude <[email protected]>
…as dataset=taxi

The migrator's `bin_random_access` rejected every 2-part v2 name shape `random-access/<format>-tokio-local-disk` as `Skip::UnsupportedShape`, even though `random-access-bench`'s `measurement_name` only emits the 2-part form for the legacy taxi run (no `AccessPattern`) and the live v3 emitter writes those measurements with `dataset="taxi"`. The historical 2-part records on S3 (every random-access timing emitted between 2025-04 and 2026-02, plus the post-2026-02 cached/footer duplicates) were therefore dropped, leaving only the 4-part `taxi/correlated` and `taxi/uniform` history in v3. Recover them under `dataset="taxi"` so the chart matches what the live v3 emitter produces.

The reopen-mode `-footer` variant still falls through to `Skip::Deprecated` because its format string doesn't strip clean to a v3-allowlisted name; that mirrors how the live emitter doesn't distinguish reopen vs cached either.

Also extend the verifier so future regressions are easy to spot:

- `verify.rs` now diffs at the chart-name level (not just chart count) and routes documented intentional asymmetries — derived ratios, empty FAN_OUT_GROUPS placeholders, the `RANDOM ACCESS` placeholder, the recovered `TAXI` chart, the new `VORTEX COMPACT SIZE` chart, the fineweb group — to a separate "intentional" bucket, so a real drop shows up as a fresh ✗ regression candidate.
- `MigrationSummary` now carries a per-`Skip`-reason histogram and prints it in the run summary, so a regression that pushes records into the wrong bucket is visible at a glance.
- The CLI's `verify` exit code reflects `report.is_clean()` (every asymmetry on the documented allowlist) instead of just group-level coverage.

Test plan: `cargo test -p vortex-bench-migrate` (66 tests pass), a full end-to-end migrate against the production v2 S3 dump, and a verify against a local v2 server seeded from the same dump (chart-name diff is clean, every asymmetry documented).

Signed-off-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
Three small UX fixes for the v3 inline charts:

1. Sort tooltip rows by current y-value descending via Chart.js `tooltip.itemSort` so they match the visual top-to-bottom stack of series at the hovered x.
2. Drop the bare `github.com/.../commit/...` URL from the tooltip footer. Show `<short-sha> · <first-line message truncated to 80>` instead. The full URL is no longer rendered as text but is still reachable via the click handler below.
3. Add `onClick` on each chart that picks the nearest x-index, parses `(#NNNN)` from the squash-merged commit message, and opens the corresponding `vortex-data/vortex` pull request in a new tab. Falls back to the commit URL when the regex doesn't match (which is only expected for non-squash merges).

Pure JS change in `static/chart-init.js`; no Rust/API touched, so no fmt/clippy/public-api work was needed. Snapshot tests already pass — they assert the served HTML, which only references the JS file by URL.

Signed-off-by: Claude <[email protected]>
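The tooltip sort and the PR-link fallback described above can be sketched as two small helpers (illustrative names; the real code lives in chart-init.js):

```javascript
// Chart.js `tooltip.itemSort` comparator: order rows by y-value,
// descending, so they match the visual stack at the hovered x.
function itemSort(a, b) {
  return b.parsed.y - a.parsed.y;
}

// Parse "(#NNNN)" from a squash-merged commit message; fall back to
// the raw commit URL when no PR number is present.
function clickTarget(message, commitUrl) {
  const m = /\(#(\d+)\)/.exec(message);
  return m
    ? `https://github.com/vortex-data/vortex/pull/${m[1]}`
    : commitUrl;
}
```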
Add a thin draggable strip below each chart's <canvas> that mirrors the
fetched commit history and highlights the currently visible window. The
highlight can be panned by dragging its body or resized by dragging
either edge handle; bare-track clicks recentre the window at the cursor.
Wired bidirectionally with chartjs-plugin-zoom: drag-pan and
drag-rect-zoom gestures refresh the strip via the plugin's
onPan/onZoom/onPanComplete/onZoomComplete hooks, while toolbar slider
changes and wheel-pan call canvas.__bench_strip_render directly. Strip
drags clamp to the data range, mirror the resulting window size onto the
toolbar slider, and trigger chart.update("none").
The strip is ~14px tall (18px on mobile), keeps pointer-events: auto
(unlike the tooltip host), and lays out via percentages so it tracks
the chart canvas width without extra wiring on resize.
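The clamp-and-percentage layout above reduces to pure window math. A sketch with illustrative names (the real strip code in chart-init.js may differ):

```javascript
// Highlight geometry in percent of the full fetched range, so the
// strip tracks canvas width with no resize wiring.
function stripWindowPercent(total, min, max) {
  return {
    left: (min / (total - 1)) * 100,
    width: ((max - min) / (total - 1)) * 100,
  };
}

// Clamp a dragged window [min, max] of fixed size to the data range.
function clampWindow(total, min, max) {
  const size = max - min;
  if (min < 0) return { min: 0, max: size };
  if (max > total - 1) return { min: total - 1 - size, max: total - 1 };
  return { min, max };
}
```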
Local checks:
- cargo test -p vortex-bench-server --test web_ui (15 passed; insta
snapshots updated for landing/chart/group pages to include the strip
markup; new chart_card_carries_per_chart_toolbar assertions cover
range-strip / window / handles).
- cargo +nightly fmt --all
- cargo clippy -p vortex-bench-server --all-targets --all-features
(clean).
Browser smoke test: not run in this environment; the snapshot tests
exercise the full SSR rendering path against a fixture-seeded DB and
confirm one strip per chart-card on /, /chart/{slug}, and /group/{slug}.
Signed-off-by: Claude <[email protected]>
Add a sticky filter bar at the top of the landing page with rows of
toggle chips for engines (datafusion, duckdb, …) and formats
(vortex-file-compressed, parquet, …). Clicking a chip hides every series
whose engine or format doesn't match across every chart at once. Per-card
legend toggles still work and are tracked as overrides — once you click a
series's legend on a card, the global filter no longer touches that
series on that card.
The chip universe is sourced from a `SELECT DISTINCT` over the fact
tables, so adding a new engine or format in ingest grows the bar with
no code change.
Filter state round-trips through `?engine=…&format=…`. The landing page
reads the params on load, the client `history.replaceState`s on every
chip click, and the permalink pages (`/chart/{slug}`, `/group/{slug}`)
embed the same JSON state so a shared deep link applies the filter on
hydration even though they don't render the bar themselves.
Wire shape: each `ChartResponse` now carries an optional `series_meta`
map keyed by series name with `{engine?, format?}` tags so the client
has the metadata it needs to drive bulk hide/show without parsing
series labels heuristically. Series without an engine tag (compression
times, random access, vector search) are unaffected by the engine
filter, and similarly for the format filter — a "duckdb only" toggle
shouldn't nuke charts that have no engine dimension.
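The visibility rule above — overrides win, untagged series are immune to the corresponding filter — can be sketched as a single predicate (illustrative names, not the actual chart-init.js code):

```javascript
// Decide whether a series is visible under the global filter.
// `meta` is the series_meta entry ({engine?, format?}), `filter` holds
// the active sets (or null for "no filter on this dimension"), and
// `override` is the per-card legend override, if the user has set one.
function seriesVisible(meta, filter, override) {
  if (override !== undefined) return override; // per-card legend wins
  if (meta.engine && filter.engines && !filter.engines.has(meta.engine)) {
    return false;
  }
  if (meta.format && filter.formats && !filter.formats.has(meta.format)) {
    return false;
  }
  return true; // untagged dimensions are unaffected by that filter
}
```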
Snapshot tests for the bar markup; smoke tested in a real browser
(playwright + chromium): chip clicks hide the right datasets across
every chart, URL updates, override survives a global re-toggle, and a
refresh restores the filter.
Signed-off-by: Claude <[email protected]>
ui(benchmarks-website): move filter bar into navbar dropdown; toggle chips independently
Drop the standalone filter bar that sat below the header and put the
chips inside a "Filters" dropdown anchored to the sticky navbar, so
adjusting visibility no longer requires scrolling back to the top of
the page. The trigger button shows a small badge counting how many
chips are currently off; the panel opens/closes on click and dismisses
on click-outside or Escape. Permalink pages render the same dropdown
in their navbar (previously they had no UI for it, only honouring URL
state).
Toggle semantics changed to be per-chip independent. Previously the
first chip click in a row pivoted from "all visible" to "only this
one"; now each chip flips just its own active state. The "all" chip is
a one-shot reset that forces every chip in that row back to active —
it never holds an active state itself.
Internal model: `globalFilter.{engines,formats}` now tracks the active
(visible) set rather than an allowlist that's empty when no filter is
applied. The universe is read from the rendered chip DOM so the
client doesn't have to mirror the server enums. The URL stays as an
allowlist (`?engine=duckdb` = "show only duckdb") for stability, and
we omit the param whenever the active set equals the universe so the
no-filter URL is clean.
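The active-set-to-allowlist conversion can be sketched in a few lines (an illustrative helper, not the actual chart-init.js identifier):

```javascript
// Serialize the active set for the URL allowlist param; return null
// (omit the param) when nothing is filtered out, so the no-filter URL
// stays clean.
function filterToParam(active, universe) {
  if (active.size === universe.size) return null;
  return [...active].sort().join(",");
}
```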
Override fix: the legend onClick now flips both `dataset.hidden` and
`setDatasetVisibility` so subsequent global filter passes (which write
to `dataset.hidden`) don't drift from the legend's overrides.
cargo test, clippy, fmt clean. Browser smoke (playwright + chromium):
clicking duckdb hides only the duckdb series; "all" restores every
chip; legend override on a card sticks across further chip changes;
click-outside closes the panel; URL updates as expected.
Signed-off-by: Claude <[email protected]>
## Summary

Fixes the UI of the benchmarks v3 website.

- no longer max of 1000 commits
- LTTB dynamic downsampling on the client side
- a bunch of other stuff

## Testing

More snapshot testing.

---------

Signed-off-by: Claude <[email protected]>
Signed-off-by: Connor Tsui <[email protected]>
Co-authored-by: Claude <[email protected]>
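For reference, the LTTB (Largest-Triangle-Three-Buckets) downsampling mentioned above is a standard algorithm; a sketch of it (not the exact chart-init.js code) looks like this:

```javascript
// Downsample `data` ([{x, y}, ...]) to `threshold` points, keeping the
// first and last points and, per bucket, the point forming the largest
// triangle with the previous pick and the next bucket's average.
function lttb(data, threshold) {
  const n = data.length;
  if (threshold >= n || threshold < 3) return data.slice();
  const sampled = [data[0]];
  const every = (n - 2) / (threshold - 2);
  let a = 0; // index of the last selected point
  for (let i = 0; i < threshold - 2; i++) {
    // Average point of the *next* bucket.
    let start = Math.floor((i + 1) * every) + 1;
    let end = Math.min(Math.floor((i + 2) * every) + 1, n);
    let avgX = 0, avgY = 0;
    for (let j = start; j < end; j++) { avgX += data[j].x; avgY += data[j].y; }
    const len = end - start;
    avgX /= len; avgY /= len;
    // Pick the point in the *current* bucket with the largest triangle area.
    start = Math.floor(i * every) + 1;
    end = Math.floor((i + 1) * every) + 1;
    let maxArea = -1, maxIdx = start;
    for (let j = start; j < end; j++) {
      const area = Math.abs(
        (data[a].x - avgX) * (data[j].y - data[a].y) -
        (data[a].x - data[j].x) * (avgY - data[a].y)
      );
      if (area > maxArea) { maxArea = area; maxIdx = j; }
    }
    sampled.push(data[maxIdx]);
    a = maxIdx;
  }
  sampled.push(data[n - 1]);
  return sampled;
}
```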
…as dataset=taxi

The migrator's `bin_random_access` rejected every 2-part v2 name shape `random-access/<format>-tokio-local-disk` as `Skip::UnsupportedShape`, even though `random-access-bench`'s `measurement_name` only emits the 2-part form for the legacy taxi run (no `AccessPattern`) and the live v3 emitter writes those measurements with `dataset="taxi"`. The historical 2-part records on S3 (every random-access timing emitted between 2025-04 and 2026-02, plus the post-2026-02 cached/footer duplicates) were therefore dropped, leaving only the 4-part `taxi/correlated` and `taxi/uniform` history in v3. Recover them under `dataset="taxi"` so the chart matches what the live v3 emitter produces.

The reopen-mode `-footer` variant still falls through to `Skip::Deprecated` because its format string doesn't strip clean to a v3-allowlisted name; that mirrors how the live emitter doesn't distinguish reopen vs cached either.

Test plan: `cargo test -p vortex-bench-migrate --test classifier` (54 pass, 6 of them new and gating the recovery).

Signed-off-by: Claude <[email protected]>
No more 300KB uncompressed HTML on the landing page. Layer applied to every response, including /static/* so chart.umd.js gets squashed too. Signed-off-by: Claude <[email protected]>
Saves bytes on the cold landing page; chart-init.js refetches with a wider window when the user actually zooms past the inlined range. Signed-off-by: Claude <[email protected]>
## Summary

Rewrites the benchmarks website (again).

## Design

Instead of a single `data.json.gz` file that we CAS from the benchmarks, this is a full server binary that manages a duckdb database and allows `POST /api/ingest` from each of the benchmarks via an emitter. The website itself is then SSR with hydration. I believe that this is the design we actually want the website to be in, as it is much more maintainable and extensible than previous iterations.

Mostly llm-engineered, but with a lot of manual direction:

- `include_bytes!` into the binary.
- `POST /api/ingest`, which accepts versioned JSON envelopes, bearer-token gated. CI pushes results.
- `<prefix>.<base64url(serde_json(ChartKey|GroupKey))>`. Round-trips through the URL with no DB lookup.
- `/` (landing), `/chart/{slug}`, and `/group/{slug}` (permalinks). One JSON route: `GET /api/chart/{slug}`.
- `INGEST_BEARER_TOKEN` env var. SSR means no frontend build step; the only client-side JS is the single `chart-init.js`.

UI/UX (TBD; the new relational database backend gives us a lot more options now, so this could be better):

- `<details>` per group, ordered to match v2. The first group opens by default with its chart data inlined for fast first paint; the rest lazy-fetch via the JSON API the first time they're expanded.
- `<script id="chart-data-N">` JSON paired with `<canvas data-chart-index="N">`. An `IntersectionObserver` only constructs the Chart.js instance once the canvas scrolls into view.
- `chart.update("none")`. Mouse wheel pans through history.
- URL state (`?n=&y=&mode=&hidden=`) is honored only on the permalink pages. The landing page always opens at defaults; if you want to share a specific view, share the chart permalink.

Still some work to do; will update this design list later.
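The slug round-trip described above (`<prefix>.<base64url(serde_json(...))>`) can be sketched as follows. The real implementation is Rust in the server; the key shape here is hypothetical, and only the encoding scheme is taken from the description.

```javascript
// Encode a key object as <prefix>.<base64url(JSON)>. Base64url swaps
// "+/" for "-_" and drops padding so the slug is URL-safe.
function encodeSlug(prefix, key) {
  const b64 = Buffer.from(JSON.stringify(key), "utf8")
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
  return `${prefix}.${b64}`;
}

// Decode without any DB lookup: the key is fully contained in the URL.
function decodeSlug(slug) {
  const [prefix, b64] = slug.split(".");
  const json = Buffer.from(
    b64.replace(/-/g, "+").replace(/_/g, "/"),
    "base64"
  ).toString("utf8");
  return { prefix, key: JSON.parse(json) };
}
```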
## Testing

Snapshot testing with `insta`, seeded by hitting the ingest endpoint in-process.