fix(models): backend and analytics correctness batch for 5.1 GA by heskew · Pull Request #1236 · HarperFast/harper

heskew · 2026-06-10T21:13:04Z

Summary

Five verified correctness fixes to the models subsystem from the 5.1 GA readiness audit — relates to #1235 (PR 2 of 2, companion to #1234).

hdb_model_calls phantom indexes — dropped indexed: true from all nine non-PK attributes. Writes go through primaryStore.put, which bypasses index maintenance, so the declared indexes were permanently empty and any attribute-filtered billing/usage query silently returned zero rows. Matches the hdb_raw_analytics pattern. The attribute list is now an exported const so the schema test asserts structurally.
Bedrock inference-profile model IDs — familyOf() walks the dot-segments until a known family is found, so us.anthropic.claude-… / eu.meta.… / global.… dispatch correctly (previously: family 'us' → unknown → throw, breaking effectively all current Claude-on-Bedrock). amazon.nova-* now throws a descriptive "not yet supported" error instead of being sent a malformed Titan-shaped body.
Abort gate at tool dispatch — runSingleToolCall() checks the composed signal before invoking a handler, so side-effecting tool handlers never start after the caller has aborted (closes the serial-dispatch gap too).
OpenAI max_completion_tokens — sent when baseUrl resolves to native api.openai.com (current reasoning models reject max_tokens); custom baseUrls keep max_tokens since OpenAI-compatible shims may not know the newer param. Detection parses the URL hostname (port-suffix and spoofed-host safe).
Upstream memory caps — readBoundedJson() bounds non-streaming bodies (64 MiB success / 256 KiB error — previously res.json() buffered unbounded and the 500-char error cap only trimmed after parsing), wraps stream errors in the backend error class, and cancels the body on breach. The streaming tool-call accumulator maps are now capped at 128 entries and 8 MiB total argument chars across entries (per-entry 1 MiB cap alone allowed ~128 MiB) in all three streaming parsers.

Where to put attention

The native-OpenAI detection heuristic (fix 4): hostname equality with api.openai.com. Proxies/gateways fronting OpenAI (Cloudflare AI Gateway, Helicone) will get max_tokens and fail against reasoning models — by design for now (no reliable detection); flagged by the cross-model review as the residual case. A per-entry config override would be the escape hatch if users hit it.
The Nova fail-loud (fix 2) is a behavior change from "malformed request, upstream error" to "immediate descriptive error" — strictly better, but it is the one place this PR rejects something it previously attempted.
Cap constants (64 MiB / 256 KiB / 128 entries / 8 MiB) are judgment calls sized generously above real-world responses; sanity-check the numbers.

Tests: 75 passing across the targeted unit files (includes new cases for inference-profile dispatch, Nova rejection, pre-aborted dispatch, native-vs-shim token param, bounded-reader limits, and accumulator caps); broader models+components suites green except one pre-existing EMFILE failure confirmed on main.

Cross-model review: Gemini findings adjudicated and applied in the second commit (structural test, URL-parsed detection, bounded-reader hardening, total-arg cap). Codex leg unavailable (weekly session limit). Generated by an LLM (Claude, Fable 5).

🤖 Generated with Claude Code

Five verified bugs found in the models backend: 1. hdb_model_calls phantom indexes: dropped `indexed: true` from all non-PK attributes in analyticsTable.ts. flush() and cleanup() write via primaryStore.put/remove which bypass updateIndices, so secondary indexes would stay permanently empty. Matches hdb_raw_analytics pattern. 2. Bedrock inference-profile model IDs: familyOf() now walks all dot- separated segments to find the first known family, resolving cross- region IDs like `us.anthropic.claude-3-5-sonnet-…` that previously resolved to family 'unknown'. Also added a descriptive error for amazon.nova-* models before they hit the wrong (Titan) body shape. 3. Abort gate at tool dispatch: added ctx.signal?.throwIfAborted() at the top of runSingleToolCall so pre-aborted signals are caught before any side-effecting handler starts. The serial path between multiple handlers also benefits since each handler entry now checks the signal. 4. OpenAI max_tokens vs max_completion_tokens: OpenAI's reasoning/gpt-5 models reject `max_tokens` with a 400. Send `max_completion_tokens` when talking to api.openai.com; keep `max_tokens` for any custom baseUrl to preserve compatibility with vLLM, Ollama-compat, and other OpenAI-compatible shims. 5. Upstream memory caps: a. Non-streaming bodies: replaced bare `await res.json()` in parseJsonResponse and readErrorSuffix (openai + anthropic) with a bounded streaming reader (64 MiB success cap, 256 KiB error cap). b. Tool-call accumulator cardinality: added a 128-entry cap on the streaming accumulator maps in all three backends (openai, anthropic, bedrock). `index` is upstream-controlled and without a cardinality cap a hostile stream can allocate unbounded map entries. Tests: 219 passing (was 317/1-pre-existing-EMFILE in the wider suite, unchanged). All new tests cover the specific failure modes described. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…d-reader hardening, total tool-arg cap Gemini review findings adjudicated and applied: the analyticsTable schema test asserts on the exported attribute list instead of regexing the source; #isNativeOpenAI parses the URL hostname (port and spoofed-suffix safe); readBoundedJson wraps stream errors in the backend error class, cancels the body before throwing on the cap, and reuses a module-level TextDecoder; an 8 MiB total tool-argument cap bounds the accumulator maps across entries in all three streaming parsers. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

gemini-code-assist · 2026-06-10T21:13:08Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

claude · 2026-06-10T21:16:45Z

Reviewed; no blockers found.

- prettier reformatted three test files - rename unused `err` parameter to `_err` in agentLoop.test.js (oxlint no-unused-vars) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

heskew · 2026-06-12T23:50:06Z

Codex review leg completed (was quota-blocked at PR open). One finding, adjudicated real and addressed in the latest commit: the 64 MiB success-body cap could reject a maximal legal OpenAI embedding batch (2048 inputs × 3072 dims ≈ 125–190 MiB of JSON) — raised to 256 MiB, which still bounds a runaway upstream. No other blocking issues found.

🤖 Posted by Claude on Nathan's behalf

gemini-code-assist · 2026-06-12T23:50:09Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Codex review: a legal OpenAI embedding batch (2048 inputs x 3072 dims) serializes to ~125-190 MiB of JSON, over the 64 MiB cap — successful calls would have been rejected. 256 MiB clears the largest legal batch while still bounding a runaway upstream. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The constant was raised from 64 MiB to 256 MiB in this PR to handle large OpenAI embedding batch responses (125–190 MiB JSON). The unit test assertion was not updated alongside it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The concurrency: false option on the suite caused "Column family already dropped!" failures on Node.js v22, v26, and Windows (all passing on main without the option). The Bun failures in #1236 are pre-existing flakes: the terminology test passes on main including all Bun shards. Closing investigation — no change needed here; re-run when Bun flakes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

heskew and others added 2 commits June 10, 2026 13:43

heskew requested review from cb1kenobi and kriszyp June 10, 2026 21:13

fix(ci): format and lint fixes for models correctness batch

ed7feb2

- prettier reformatted three test files - rename unused `err` parameter to `_err` in agentLoop.test.js (oxlint no-unused-vars) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

heskew mentioned this pull request Jun 12, 2026

5.1 GA readiness: models/vector subsystem audit findings #1235

Open

11 tasks

heskew marked this pull request as ready for review June 12, 2026 23:50

claude Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread unitTests/resources/models/backendHelpers.test.js Outdated

kriszyp mentioned this pull request Jun 13, 2026

test: investigate Bun concurrency flake in terminology tests (closed — no fix needed) #1278

Closed

Merge branch 'main' into fix/models-5.1-correctness

a19a0cb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(models): backend and analytics correctness batch for 5.1 GA#1236

fix(models): backend and analytics correctness batch for 5.1 GA#1236
heskew wants to merge 6 commits into
mainfrom
fix/models-5.1-correctness

heskew commented Jun 10, 2026

Uh oh!

gemini-code-assist Bot commented Jun 10, 2026

Uh oh!

claude Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

heskew commented Jun 12, 2026

Uh oh!

gemini-code-assist Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

heskew commented Jun 10, 2026

Summary

Where to put attention

Uh oh!

gemini-code-assist Bot commented Jun 10, 2026

Uh oh!

claude Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

heskew commented Jun 12, 2026

Uh oh!

gemini-code-assist Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented Jun 10, 2026 •

edited

Loading