fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion by hallerite · Pull Request #52 · PrimeIntellect-ai/renderers

hallerite · 2026-05-18T16:47:38Z

Summary

Closes #47. Replaces renderers' XML-style tool-call arg coercion with a byte-for-byte port of vLLM's shared extract_types_from_schema + coerce_to_schema_type helpers from vllm/tool_parsers/utils.py. The previous json.loads-centric coercion flagged <parameter=x>True</parameter> as INVALID_JSON for boolean params because json.loads("True") fails — even though both SGLang's Qwen3CoderDetector and vLLM's Qwen3CoderToolParser accept Pythonic literals via case-folded comparison. Models freely emit True / False at inference because the reference parsers normalize them, but the same outputs came back malformed through renderers.

The fix

Renderers now mirrors vLLM's current shape (refactored upstream in vLLM #43025, 2026-05-19):

_extract_types_from_schema recursively flattens the JSON-Schema fragment to a set of types — handles top-level type (string or list), infers types from enum values, and recurses through anyOf / oneOf / allOf. Defaults to ["string"] when no type info is available.
_coerce_arg_value walks a priority-ordered ladder, returning the first successful coercion:

Priority	Type	Coercion
1	`null`	case-insensitive `"null"` ⇒ `None`
2	`integer` (+ aliases `int`, `uint`, `long`, …)	`int(text)`, else continue
3	`number` (+ aliases `float`, `double`, …)	`float(text)`, demote to int when whole, else continue
4	`boolean` (+ alias `bool`)	`text.lower().strip() in {"true","1"}` ⇒ `True`; `{"false","0"}` ⇒ `False`; else continue
5	`object` (+ aliases `dict`)	`json.loads(text)`, else continue
6	`array` (+ aliases `arr`, `list`, `sequence`)	`json.loads(text)`, else continue
7	`string` (+ aliases `str`, `text`, `enum`, …)	return verbatim — always-succeeding terminal

When the schema doesn't include string and no typed branch accepts the value, a terminal json.loads is tried before falling back to raw text + INVALID_JSON.

Renderers-specific deviation

_coerce_arg_value returns (value, used_fallback). The flag is True iff every typed branch declined and the terminal json.loads also failed — i.e. the value couldn't be expressed as anything the schema permits. The parser propagates that to ToolCallParseStatus.INVALID_JSON for the verifier / RL-loss signal. vLLM has no such signal (its callers don't need one).

Other structural changes (kept from the original branch)

parse_qwen35 refactored to use shared _parse_xml_function_blocks, mirroring vLLM's three regex patterns (closed + unclosed alternations, <parameter> lookahead recovery).
Back-off path: completions without <tool_call> markers but with <function=…> blocks are still surfaced (vLLM qwen3coder_tool_parser.py:269-271).
Content text no longer stripped (vLLM :316-319).
Tool-call ids in vLLM's chatcmpl-tool-<16hex> format.
Exception catch-all: any unhandled extraction error surfaces full text as content with no tool calls (vLLM :327-331).

Behavior changes vs renderers' prior coercion

Union schemas (anyOf / oneOf / allOf) recursively flatten — Union[str, X] with bare-string wire values now stays OK, subsuming the fix from fix(parsing): accept bare-string args for union schemas with string #63 as a side effect.
No top-level null short-circuit. "null" only coerces to None when "null" is in the schema's type set (Optional[X] ⇒ anyOf [X, null], or type: ["X", "null"]). Pure string-typed "null" stays the string "null". This drops the deliberate XML-ambiguity carve-out that earlier commits on this branch had carried.
Bool degeneracy: <parameter=x>yes</parameter> for a bool-typed param now returns the raw text "yes" + INVALID_JSON, not False + INVALID_JSON. Matches current vLLM (the bool branch declines and the priority loop continues to the terminal).
Number demotion: 1e3 ⇒ int 1000, 1.0 ⇒ int 1, regardless of source-string shape. Pure vLLM rule (val.is_integer()), not the prior "." in text heuristic.
No ast.literal_eval anywhere — vLLM doesn't use it. Python-literal dicts like {'k': 1} for an object param now flag INVALID_JSON.
"binary" dropped as a boolean alias (not in vLLM's _TYPE_ALIASES).

Reference parser sources

vLLM extract_types_from_schema / coerce_to_schema_type: vllm/tool_parsers/utils.py:453,526
vLLM Qwen3CoderToolParser._convert_param_value: vllm/tool_parsers/qwen3coder_tool_parser.py:123 (routes through the shared helpers)
SGLang Qwen3CoderDetector: python/sglang/srt/function_call/qwen3_coder_detector.py (functionally close; vLLM is the strict reference)

Test plan

pytest tests/test_arg_coercion.py — 51 unit tests covering every branch of the new ladder, including the True / False regression through full parse_qwen35 round-trip and explicit cases for the behavior changes above.
pytest tests/test_tool_arg_type_preservation.py — 48/48 cases pass across all five XML renderers + two JSON-format controls. String-typed "null" now round-trips as "null" (was None on the prior branch); the Optional[str] shape was never exercised here.
Full suite: 1874 passed, 53 skipped, 1 xfailed.

🤖 Generated with Claude Code

Note

Medium Risk
Parsing behavior changes across Qwen3.5 and shared _coerce_arg_value (used by GLM/MiniMax/Laguna too): null coercion, union schemas, and untyped params now differ from the old json.loads-first logic—intentional but can shift verifier/RL signals for edge-case completions.

Overview
Replaces XML tool-call argument coercion with vLLM’s schema-driven type ladder (null → integer → number → boolean → object → array → string), including recursive anyOf/oneOf/allOf flattening. That fixes cases like <parameter=…>True</parameter> on boolean fields (no longer INVALID_JSON) and keeps string-typed "null" as a string unless null is in the schema.

Qwen3.5 parsing is refactored to shared _parse_xml_function_blocks using vLLM’s tolerant regexes (missing </parameter>, unclosed <function>, backoff when <tool_call> is absent). Content before tools is no longer stripped; calls get chatcmpl-tool-<16hex> ids; extraction errors return full text with no partial tool calls. Qwen35ToolParser delegates to the same helper (schema-less path = verbatim strings).

Adds tests/test_arg_coercion.py and updates type-preservation test docs for the new null/string rules.

^{Reviewed by Cursor Bugbot for commit af8d6e1. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Fix XML tool-call argument coercion to match vLLM's type-ladder semantics

Rewrites _coerce_arg_value in parsing.py to mirror vLLM's priority-ordered type ladder: null > integer > number > boolean > object > array > string, with case-insensitive boolean parsing and null coercion only when declared in the schema.
Adds _parse_xml_function_blocks, a tolerant parser that accepts missing </parameter> and unclosed </function> tags, strips a single leading/trailing newline from parameter bodies, and assigns vLLM-style chatcmpl-tool-<16 hex> ids via the new _make_tool_call_id helper.
Updates parse_qwen35 to preserve content whitespace, recover tool calls from bare <function=...> blocks when <tool_call> wrappers are absent, and fall back to returning raw content on extraction errors.
Refactors _parse_xml_tool_calls and Qwen35ToolParser.extract to delegate to _parse_xml_function_blocks, producing explicit UNCLOSED_BLOCK or MALFORMED_STRUCTURE diagnostics instead of silently dropping malformed calls.
Behavioral Change: untyped parameters are now returned as raw strings rather than being JSON-coerced; whole floats declared as integer are demoted to int.

^{Macroscope summarized af8d6e1.}

Renderers' Qwen3.5/GLM/MiniMax/Laguna XML parsers flagged ``<parameter=x>True</parameter>`` as INVALID_JSON for boolean params because ``json.loads("True")`` fails. Both SGLang's ``Qwen3CoderDetector`` and vLLM's ``Qwen3CoderToolParser`` accept it via ``param_value.lower() == "true"`` — so Pythonic literals the model freely emits round-trip cleanly at inference but came back malformed through renderers. Replaces the string-or-``json.loads`` dispatch with the full ``_convert_param_value`` ladder shared by both reference parsers: case-insensitive ``null``, ``int()`` / ``float()`` for numeric families with int demotion, case-folded bool, ``json.loads`` → ``ast.literal_eval`` for objects/arrays/anyOf, ``ast.literal_eval`` catch-all. INVALID_JSON remains as the verifier / RL-loss signal for values that had to degenerate (e.g. ``yes`` for bool, ``abc`` for int). One deliberate deviation from vLLM/SGLang: declared ``string`` types preserve ``"null"`` verbatim instead of coercing to Python ``None``, so the existing ``test_tool_arg_type_preservation`` string-verbatim contract still holds. Fixes #47. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirror vLLM's Qwen3CoderToolParser end to end (chosen over qwen3_xml because qwen3_coder is a strict superset for offline value coercion; qwen3_xml's edge over qwen3_coder is its streaming state machine, which does not apply here yet — module comment notes the TODO to switch when streaming lands). Closes the three remaining structural deltas: - Adopt vLLM's three tolerant regex patterns (tool_call / function / parameter) so unclosed </parameter>, </function>, and </tool_call> tags are recovered instead of silently dropping their content. - Add the qwen3coder_tool_parser.py:269-271 back-off: when output has no <tool_call> markers, scan the whole completion for raw <function=...></function> blocks and recover the call. Content text is whatever remains after the function blocks are stripped. - Strip exactly one leading and one trailing newline from parameter values (qwen3coder_tool_parser.py:246-250); interior whitespace round-trips verbatim — previous .strip() was too aggressive. - Fix the no-schema branch of _coerce_arg_value to match vLLM (qwen3coder_tool_parser.py:128-137): return the raw text verbatim after the case-insensitive null short-circuit instead of attempting json.loads. Qwen3.6 needs no separate parser changes — Qwen36Renderer inherits parse_response from Qwen35Renderer and shares the same parse_qwen35 plumbing, so the vLLM-parity ladder + structural tolerance applies to both 3.5 (Python repr literals) and 3.6 (JSON literals) in one shot. Qwen35ToolParser in parsers.py (DefaultRenderer + tool_parser= 'qwen3.5' path) is updated to share the new _parse_xml_function_blocks helper so both code paths produce identical output. Tests: 7 new structural-parity tests pin the regex tolerance, no-tag back-off, and whitespace stripping against the vLLM reference shape. Existing test_no_schema_falls_back_to_json_loads is renamed and inverted to match the new (vLLM-faithful) behavior.

macroscopeapp · 2026-05-20T12:55:35Z

Approvability

Verdict: Needs human review

Significant refactor of tool-call argument parsing with new type coercion logic, schema handling, and changed parse status determination. The runtime behavior changes and unresolved review comment about status precedence warrant human review.

No code changes detected at af8d6e1. Prior analysis still applies.

^{You can customize Macroscope's approvability policy. Learn more.}

…xception path Close the four remaining behavioral deltas vs vLLM's Qwen3CoderToolParser non-streaming extract path: - Content text is now the raw slice (no .strip()). vLLM qwen3coder_tool_parser.py:316-319 explicitly leaves the commented-out .rstrip() in place; surrounding whitespace before/after <tool_call> round-trips verbatim. - Backoff content (no <tool_call> markers) is the text up to the first <function= position only, matching vLLM line 317-319. Removed _strip_function_blocks helper that concatenated leftovers between function blocks — different from vLLM. - Each successfully-parsed tool call is assigned an id in vLLM's default format: chatcmpl-tool-<16 hex chars> via uuid.uuid4().int & MASK_64_BITS, matching vLLM's make_tool_call_id / random_uuid pair. Malformed entries (no function name) intentionally stay id-less since vLLM would have dropped them entirely. - Tool-call extraction is wrapped in try/except: on any unhandled error, surface tool_calls=[] and content=full_decoded_text — the vLLM qwen3coder_tool_parser.py:327-331 catch-all behavior. Reasoning extraction stays outside the wrapper since vLLM doesn't do reasoning in this parser. Tests: 4 new cases pin content-no-strip, backoff content slice, chatcmpl-tool-<hex> id format, and the exception fallback.

…byte vLLM's _convert_param_value (qwen3coder_tool_parser.py:125) runs the case-insensitive "null" short-circuit BEFORE any type check — including the string family. That means an XML wire value of <parameter=x>null</parameter> against a {"type": "string"} schema collapses to Python None, not the string "null". Previously this repo carried a deliberate deviation to preserve the string verbatim across the four XML renderers (Qwen3.5/3.6, GLM-5, MiniMax-M2.5, Laguna-XS.2). The motivation was a tidier round-trip contract: "string params always round-trip as strings." But the XML wire format genuinely cannot distinguish the string "null" from a JSON null, and matching vLLM is more important than matching our sibling renderers — downstream consumers (verifier, RL-loss) align with vLLM's interpretation, not ours. Test updates: - test_arg_coercion::test_null_is_preserved_verbatim_for_string_type is renamed to test_null_short_circuit_applies_even_for_string_type and inverted: a string-typed "null" now returns None. - test_tool_arg_type_preservation::test_string_arg_preserves_type branches on _JSON_FORMAT_MODELS (Qwen3 Hermes, Kimi K2 section JSON) vs XML renderers for the string-null case only. JSON renderers preserve "null" verbatim because the wire bytes quote strings; XML renderers null-coerce. Other string cases (bool/int/array/object) still round-trip verbatim everywhere.

Folds in #63's bare-string-for-union-schemas fix and refactors _coerce_arg_value to use vLLM's current shared helpers (extract_types_from_schema + coerce_to_schema_type from vllm/tool_parsers/utils.py, landed in vLLM #43025 on 2026-05-19), replacing this branch's earlier single-type ladder. Why: #52 was originally written against pre-refactor vLLM and would have regressed #63 (anyOf schemas routed to "object" → json.loads on bare strings flagged INVALID_JSON). vLLM's new shape recursively flattens anyOf/oneOf/allOf into a type set and walks a priority- ordered ladder (null > integer > number > boolean > object > array > string) where string is the always-succeeding terminal — which absorbs #63's case as a side effect. Behavior changes relative to the pre-merge branch: - Union schemas (anyOf/oneOf/allOf) recursively flatten — Union[str,X] now accepts bare strings without flagging (the #63 win). - No top-level "null" short-circuit. Null coercion only fires when "null" is in the schema's type set (Optional[X] → anyOf [X, null] or type: ["X", "null"]). String-typed "null" stays a string. - "yes" / non-boolean text for bool-typed param returns raw text + INVALID flag (was: False + INVALID). - Number branch demotes whole floats to int regardless of source shape (1e3 → 1, 1.0 → 1) — matches vLLM's val.is_integer() rule. - No ast.literal_eval anywhere — vLLM doesn't use it. Python-literal dicts ('k':1) no longer parse for object params. - "binary" dropped as a bool alias (not in vLLM's _TYPE_ALIASES). Renderers-specific deviation kept: used_fallback flag returned alongside the value, propagated to ToolCallParseStatus.INVALID_JSON for the verifier / RL-loss signal. vLLM has no such signal. Tests updated for new behavior; full suite green (1775+99=1874 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 8bd06f6. Configure here.}

Three bugbot findings from PR #52: - HIGH: ``int(val)`` in the number branch of ``_coerce_arg_value`` was outside the try block, so ``float("nan")`` (raises ValueError on ``int()``) and ``float("inf")`` (raises OverflowError) would abort the whole parse. Move ``int(val)`` back inside the try and widen the catch to include ``OverflowError`` — minor improvement over vLLM, which crashes on ``inf``. - LOW: ``_TOOL_CALL_BLOCK_RE`` was defined at module level but never used (carryover from the vLLM regex listing). Removed. - MEDIUM: document the ``UNCLOSED_BLOCK`` > ``INVALID_JSON`` status precedence in ``_parse_xml_function_blocks``. Structural failure dominates coercion failure when a block is both truncated and has uncoercible args; the verifier already discounts unclosed blocks, so combining the two would be noise. Intent now in a comment so future readers don't flip it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on-vllm-parity

hallerite mentioned this pull request May 20, 2026

Qwen3.5 tool-call parser is stricter than SGLang qwen3_coder on boolean args #47

Open

cursor Bot reviewed May 20, 2026

View reviewed changes

Comment thread renderers/parsing.py Outdated

cursor Bot reviewed May 20, 2026

View reviewed changes

Comment thread renderers/parsing.py Outdated

hallerite and others added 2 commits May 20, 2026 13:43

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread renderers/parsing.py Outdated

Comment thread renderers/parsing.py

hallerite mentioned this pull request May 26, 2026

fix(nemotron3): forward tools to parse_qwen35 for schema-aware coercion #65

Merged

2 tasks

Merge remote-tracking branch 'origin/main' into fix/qwen35-arg-coerci…

af8d6e1

…on-vllm-parity

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52

fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52
hallerite wants to merge 7 commits into
mainfrom
fix/qwen35-arg-coercion-vllm-parity

hallerite commented May 18, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

macroscopeapp Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hallerite commented May 18, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

The fix

Renderers-specific deviation

Other structural changes (kept from the original branch)

Behavior changes vs renderers' prior coercion

Reference parser sources

Test plan

Fix XML tool-call argument coercion to match vLLM's type-ladder semantics

Uh oh!

Uh oh!

macroscopeapp Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hallerite commented May 18, 2026 •

edited by cursor Bot

Loading

macroscopeapp Bot commented May 20, 2026 •

edited

Loading