fix(models): backend and analytics correctness batch for 5.1 GA#1236
Open
heskew wants to merge 6 commits into
Open
fix(models): backend and analytics correctness batch for 5.1 GA#1236heskew wants to merge 6 commits into
heskew wants to merge 6 commits into
Conversation
Five verified bugs found in the models backend:
1. hdb_model_calls phantom indexes: dropped `indexed: true` from all
non-PK attributes in analyticsTable.ts. flush() and cleanup() write
via primaryStore.put/remove which bypass updateIndices, so secondary
indexes would stay permanently empty. Matches hdb_raw_analytics pattern.
2. Bedrock inference-profile model IDs: familyOf() now walks all dot-
separated segments to find the first known family, resolving cross-
region IDs like `us.anthropic.claude-3-5-sonnet-…` that previously
resolved to family 'unknown'. Also added a descriptive error for
amazon.nova-* models before they hit the wrong (Titan) body shape.
3. Abort gate at tool dispatch: added ctx.signal?.throwIfAborted() at the
top of runSingleToolCall so pre-aborted signals are caught before any
side-effecting handler starts. The serial path between multiple handlers
also benefits since each handler entry now checks the signal.
4. OpenAI max_tokens vs max_completion_tokens: OpenAI's reasoning/gpt-5
models reject `max_tokens` with a 400. Send `max_completion_tokens`
when talking to api.openai.com; keep `max_tokens` for any custom
baseUrl to preserve compatibility with vLLM, Ollama-compat, and other
OpenAI-compatible shims.
5. Upstream memory caps:
a. Non-streaming bodies: replaced bare `await res.json()` in
parseJsonResponse and readErrorSuffix (openai + anthropic) with a
bounded streaming reader (64 MiB success cap, 256 KiB error cap).
b. Tool-call accumulator cardinality: added a 128-entry cap on the
streaming accumulator maps in all three backends (openai, anthropic,
bedrock). `index` is upstream-controlled and without a cardinality
cap a hostile stream can allocate unbounded map entries.
Tests: 219 passing (was 317/1-pre-existing-EMFILE in the wider suite,
unchanged). All new tests cover the specific failure modes described.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d-reader hardening, total tool-arg cap Gemini review findings adjudicated and applied: the analyticsTable schema test asserts on the exported attribute list instead of regexing the source; #isNativeOpenAI parses the URL hostname (port and spoofed-suffix safe); readBoundedJson wraps stream errors in the backend error class, cancels the body before throwing on the cap, and reuses a module-level TextDecoder; an 8 MiB total tool-argument cap bounds the accumulator maps across entries in all three streaming parsers. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Contributor
|
Reviewed; no blockers found. |
- prettier reformatted three test files - rename unused `err` parameter to `_err` in agentLoop.test.js (oxlint no-unused-vars) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
11 tasks
Member
Author
|
Codex review leg completed (was quota-blocked at PR open). One finding, adjudicated real and addressed in the latest commit: the 64 MiB success-body cap could reject a maximal legal OpenAI embedding batch (2048 inputs × 3072 dims ≈ 125–190 MiB of JSON) — raised to 256 MiB, which still bounds a runaway upstream. No other blocking issues found. 🤖 Posted by Claude on Nathan's behalf |
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Codex review: a legal OpenAI embedding batch (2048 inputs x 3072 dims) serializes to ~125-190 MiB of JSON, over the 64 MiB cap — successful calls would have been rejected. 256 MiB clears the largest legal batch while still bounding a runaway upstream. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The constant was raised from 64 MiB to 256 MiB in this PR to handle large OpenAI embedding batch responses (125–190 MiB JSON). The unit test assertion was not updated alongside it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kriszyp
added a commit
that referenced
this pull request
Jun 13, 2026
The concurrency: false option on the suite caused "Column family already dropped!" failures on Node.js v22, v26, and Windows (all passing on main without the option). The Bun failures in #1236 are pre-existing flakes: the terminology test passes on main including all Bun shards. Closing investigation — no change needed here; re-run when Bun flakes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Five verified correctness fixes to the models subsystem from the 5.1 GA readiness audit — relates to #1235 (PR 2 of 2, companion to #1234).
hdb_model_callsphantom indexes — droppedindexed: truefrom all nine non-PK attributes. Writes go throughprimaryStore.put, which bypasses index maintenance, so the declared indexes were permanently empty and any attribute-filtered billing/usage query silently returned zero rows. Matches thehdb_raw_analyticspattern. The attribute list is now an exported const so the schema test asserts structurally.familyOf()walks the dot-segments until a known family is found, sous.anthropic.claude-…/eu.meta.…/global.…dispatch correctly (previously: family'us'→ unknown → throw, breaking effectively all current Claude-on-Bedrock).amazon.nova-*now throws a descriptive "not yet supported" error instead of being sent a malformed Titan-shaped body.runSingleToolCall()checks the composed signal before invoking a handler, so side-effecting tool handlers never start after the caller has aborted (closes the serial-dispatch gap too).max_completion_tokens— sent whenbaseUrlresolves to nativeapi.openai.com(current reasoning models rejectmax_tokens); custom baseUrls keepmax_tokenssince OpenAI-compatible shims may not know the newer param. Detection parses the URL hostname (port-suffix and spoofed-host safe).readBoundedJson()bounds non-streaming bodies (64 MiB success / 256 KiB error — previouslyres.json()buffered unbounded and the 500-char error cap only trimmed after parsing), wraps stream errors in the backend error class, and cancels the body on breach. The streaming tool-call accumulator maps are now capped at 128 entries and 8 MiB total argument chars across entries (per-entry 1 MiB cap alone allowed ~128 MiB) in all three streaming parsers.Where to put attention
api.openai.com. Proxies/gateways fronting OpenAI (Cloudflare AI Gateway, Helicone) will getmax_tokensand fail against reasoning models — by design for now (no reliable detection); flagged by the cross-model review as the residual case. A per-entry config override would be the escape hatch if users hit it.Tests: 75 passing across the targeted unit files (includes new cases for inference-profile dispatch, Nova rejection, pre-aborted dispatch, native-vs-shim token param, bounded-reader limits, and accumulator caps); broader models+components suites green except one pre-existing EMFILE failure confirmed on main.
Cross-model review: Gemini findings adjudicated and applied in the second commit (structural test, URL-parsed detection, bounded-reader hardening, total-arg cap). Codex leg unavailable (weekly session limit). Generated by an LLM (Claude, Fable 5).
🤖 Generated with Claude Code