Skip to content

fix(models): backend and analytics correctness batch for 5.1 GA#1236

Open
heskew wants to merge 6 commits into
mainfrom
fix/models-5.1-correctness
Open

fix(models): backend and analytics correctness batch for 5.1 GA#1236
heskew wants to merge 6 commits into
mainfrom
fix/models-5.1-correctness

Conversation

@heskew

@heskew heskew commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Five verified correctness fixes to the models subsystem from the 5.1 GA readiness audit — relates to #1235 (PR 2 of 2, companion to #1234).

  1. hdb_model_calls phantom indexes — dropped indexed: true from all nine non-PK attributes. Writes go through primaryStore.put, which bypasses index maintenance, so the declared indexes were permanently empty and any attribute-filtered billing/usage query silently returned zero rows. Matches the hdb_raw_analytics pattern. The attribute list is now an exported const so the schema test asserts structurally.
  2. Bedrock inference-profile model IDsfamilyOf() walks the dot-segments until a known family is found, so us.anthropic.claude-… / eu.meta.… / global.… dispatch correctly (previously: family 'us' → unknown → throw, breaking effectively all current Claude-on-Bedrock). amazon.nova-* now throws a descriptive "not yet supported" error instead of being sent a malformed Titan-shaped body.
  3. Abort gate at tool dispatchrunSingleToolCall() checks the composed signal before invoking a handler, so side-effecting tool handlers never start after the caller has aborted (closes the serial-dispatch gap too).
  4. OpenAI max_completion_tokens — sent when baseUrl resolves to native api.openai.com (current reasoning models reject max_tokens); custom baseUrls keep max_tokens since OpenAI-compatible shims may not know the newer param. Detection parses the URL hostname (port-suffix and spoofed-host safe).
  5. Upstream memory capsreadBoundedJson() bounds non-streaming bodies (64 MiB success / 256 KiB error — previously res.json() buffered unbounded and the 500-char error cap only trimmed after parsing), wraps stream errors in the backend error class, and cancels the body on breach. The streaming tool-call accumulator maps are now capped at 128 entries and 8 MiB total argument chars across entries (per-entry 1 MiB cap alone allowed ~128 MiB) in all three streaming parsers.

Where to put attention

  • The native-OpenAI detection heuristic (fix 4): hostname equality with api.openai.com. Proxies/gateways fronting OpenAI (Cloudflare AI Gateway, Helicone) will get max_tokens and fail against reasoning models — by design for now (no reliable detection); flagged by the cross-model review as the residual case. A per-entry config override would be the escape hatch if users hit it.
  • The Nova fail-loud (fix 2) is a behavior change from "malformed request, upstream error" to "immediate descriptive error" — strictly better, but it is the one place this PR rejects something it previously attempted.
  • Cap constants (64 MiB / 256 KiB / 128 entries / 8 MiB) are judgment calls sized generously above real-world responses; sanity-check the numbers.

Tests: 75 passing across the targeted unit files (includes new cases for inference-profile dispatch, Nova rejection, pre-aborted dispatch, native-vs-shim token param, bounded-reader limits, and accumulator caps); broader models+components suites green except one pre-existing EMFILE failure confirmed on main.

Cross-model review: Gemini findings adjudicated and applied in the second commit (structural test, URL-parsed detection, bounded-reader hardening, total-arg cap). Codex leg unavailable (weekly session limit). Generated by an LLM (Claude, Fable 5).

🤖 Generated with Claude Code

heskew and others added 2 commits June 10, 2026 13:43
Five verified bugs found in the models backend:

1. hdb_model_calls phantom indexes: dropped `indexed: true` from all
   non-PK attributes in analyticsTable.ts. flush() and cleanup() write
   via primaryStore.put/remove which bypass updateIndices, so secondary
   indexes would stay permanently empty. Matches hdb_raw_analytics pattern.

2. Bedrock inference-profile model IDs: familyOf() now walks all dot-
   separated segments to find the first known family, resolving cross-
   region IDs like `us.anthropic.claude-3-5-sonnet-…` that previously
   resolved to family 'unknown'. Also added a descriptive error for
   amazon.nova-* models before they hit the wrong (Titan) body shape.

3. Abort gate at tool dispatch: added ctx.signal?.throwIfAborted() at the
   top of runSingleToolCall so pre-aborted signals are caught before any
   side-effecting handler starts. The serial path between multiple handlers
   also benefits since each handler entry now checks the signal.

4. OpenAI max_tokens vs max_completion_tokens: OpenAI's reasoning/gpt-5
   models reject `max_tokens` with a 400. Send `max_completion_tokens`
   when talking to api.openai.com; keep `max_tokens` for any custom
   baseUrl to preserve compatibility with vLLM, Ollama-compat, and other
   OpenAI-compatible shims.

5. Upstream memory caps:
   a. Non-streaming bodies: replaced bare `await res.json()` in
      parseJsonResponse and readErrorSuffix (openai + anthropic) with a
      bounded streaming reader (64 MiB success cap, 256 KiB error cap).
   b. Tool-call accumulator cardinality: added a 128-entry cap on the
      streaming accumulator maps in all three backends (openai, anthropic,
      bedrock). `index` is upstream-controlled and without a cardinality
      cap a hostile stream can allocate unbounded map entries.

Tests: 219 passing (was 317/1-pre-existing-EMFILE in the wider suite,
unchanged). All new tests cover the specific failure modes described.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d-reader hardening, total tool-arg cap

Gemini review findings adjudicated and applied: the analyticsTable
schema test asserts on the exported attribute list instead of
regexing the source; #isNativeOpenAI parses the URL hostname (port
and spoofed-suffix safe); readBoundedJson wraps stream errors in the
backend error class, cancels the body before throwing on the cap,
and reuses a module-level TextDecoder; an 8 MiB total tool-argument
cap bounds the accumulator maps across entries in all three
streaming parsers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@heskew heskew requested review from cb1kenobi and kriszyp June 10, 2026 21:13
@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Reviewed; no blockers found.

- prettier reformatted three test files
- rename unused `err` parameter to `_err` in agentLoop.test.js (oxlint no-unused-vars)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@heskew

heskew commented Jun 12, 2026

Copy link
Copy Markdown
Member Author

Codex review leg completed (was quota-blocked at PR open). One finding, adjudicated real and addressed in the latest commit: the 64 MiB success-body cap could reject a maximal legal OpenAI embedding batch (2048 inputs × 3072 dims ≈ 125–190 MiB of JSON) — raised to 256 MiB, which still bounds a runaway upstream. No other blocking issues found.


🤖 Posted by Claude on Nathan's behalf

@heskew heskew marked this pull request as ready for review June 12, 2026 23:50
@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Codex review: a legal OpenAI embedding batch (2048 inputs x 3072 dims)
serializes to ~125-190 MiB of JSON, over the 64 MiB cap — successful
calls would have been rejected. 256 MiB clears the largest legal batch
while still bounding a runaway upstream.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread unitTests/resources/models/backendHelpers.test.js Outdated
The constant was raised from 64 MiB to 256 MiB in this PR to handle
large OpenAI embedding batch responses (125–190 MiB JSON). The unit
test assertion was not updated alongside it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp added a commit that referenced this pull request Jun 13, 2026
The concurrency: false option on the suite caused "Column family already
dropped!" failures on Node.js v22, v26, and Windows (all passing on main
without the option). The Bun failures in #1236 are pre-existing flakes:
the terminology test passes on main including all Bun shards.

Closing investigation — no change needed here; re-run when Bun flakes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants