fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52
Open
hallerite wants to merge 7 commits into
Open
fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52hallerite wants to merge 7 commits into
hallerite wants to merge 7 commits into
Conversation
Renderers' Qwen3.5/GLM/MiniMax/Laguna XML parsers flagged
``<parameter=x>True</parameter>`` as INVALID_JSON for boolean params
because ``json.loads("True")`` fails. Both SGLang's
``Qwen3CoderDetector`` and vLLM's ``Qwen3CoderToolParser`` accept it via
``param_value.lower() == "true"`` — so Pythonic literals the model
freely emits round-trip cleanly at inference but came back malformed
through renderers.
Replaces the string-or-``json.loads`` dispatch with the full
``_convert_param_value`` ladder shared by both reference parsers:
case-insensitive ``null``, ``int()`` / ``float()`` for numeric families
with int demotion, case-folded bool, ``json.loads`` → ``ast.literal_eval``
for objects/arrays/anyOf, ``ast.literal_eval`` catch-all. INVALID_JSON
remains as the verifier / RL-loss signal for values that had to
degenerate (e.g. ``yes`` for bool, ``abc`` for int).
One deliberate deviation from vLLM/SGLang: declared ``string`` types
preserve ``"null"`` verbatim instead of coercing to Python ``None``, so
the existing ``test_tool_arg_type_preservation`` string-verbatim
contract still holds.
Fixes #47.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror vLLM's Qwen3CoderToolParser end to end (chosen over qwen3_xml because qwen3_coder is a strict superset for offline value coercion; qwen3_xml's edge over qwen3_coder is its streaming state machine, which does not apply here yet — module comment notes the TODO to switch when streaming lands). Closes the three remaining structural deltas: - Adopt vLLM's three tolerant regex patterns (tool_call / function / parameter) so unclosed </parameter>, </function>, and </tool_call> tags are recovered instead of silently dropping their content. - Add the qwen3coder_tool_parser.py:269-271 back-off: when output has no <tool_call> markers, scan the whole completion for raw <function=...></function> blocks and recover the call. Content text is whatever remains after the function blocks are stripped. - Strip exactly one leading and one trailing newline from parameter values (qwen3coder_tool_parser.py:246-250); interior whitespace round-trips verbatim — previous .strip() was too aggressive. - Fix the no-schema branch of _coerce_arg_value to match vLLM (qwen3coder_tool_parser.py:128-137): return the raw text verbatim after the case-insensitive null short-circuit instead of attempting json.loads. Qwen3.6 needs no separate parser changes — Qwen36Renderer inherits parse_response from Qwen35Renderer and shares the same parse_qwen35 plumbing, so the vLLM-parity ladder + structural tolerance applies to both 3.5 (Python repr literals) and 3.6 (JSON literals) in one shot. Qwen35ToolParser in parsers.py (DefaultRenderer + tool_parser= 'qwen3.5' path) is updated to share the new _parse_xml_function_blocks helper so both code paths produce identical output. Tests: 7 new structural-parity tests pin the regex tolerance, no-tag back-off, and whitespace stripping against the vLLM reference shape. Existing test_no_schema_falls_back_to_json_loads is renamed and inverted to match the new (vLLM-faithful) behavior.
ApprovabilityVerdict: Needs human review Significant refactor of tool-call argument parsing with new type coercion logic, schema handling, and changed parse status determination. The runtime behavior changes and unresolved review comment about status precedence warrant human review. No code changes detected at You can customize Macroscope's approvability policy. Learn more. |
…xception path Close the four remaining behavioral deltas vs vLLM's Qwen3CoderToolParser non-streaming extract path: - Content text is now the raw slice (no .strip()). vLLM qwen3coder_tool_parser.py:316-319 explicitly leaves the commented-out .rstrip() in place; surrounding whitespace before/after <tool_call> round-trips verbatim. - Backoff content (no <tool_call> markers) is the text up to the first <function= position only, matching vLLM line 317-319. Removed _strip_function_blocks helper that concatenated leftovers between function blocks — different from vLLM. - Each successfully-parsed tool call is assigned an id in vLLM's default format: chatcmpl-tool-<16 hex chars> via uuid.uuid4().int & MASK_64_BITS, matching vLLM's make_tool_call_id / random_uuid pair. Malformed entries (no function name) intentionally stay id-less since vLLM would have dropped them entirely. - Tool-call extraction is wrapped in try/except: on any unhandled error, surface tool_calls=[] and content=full_decoded_text — the vLLM qwen3coder_tool_parser.py:327-331 catch-all behavior. Reasoning extraction stays outside the wrapper since vLLM doesn't do reasoning in this parser. Tests: 4 new cases pin content-no-strip, backoff content slice, chatcmpl-tool-<hex> id format, and the exception fallback.
…byte
vLLM's _convert_param_value (qwen3coder_tool_parser.py:125) runs the
case-insensitive "null" short-circuit BEFORE any type check —
including the string family. That means an XML wire value of
<parameter=x>null</parameter> against a {"type": "string"} schema
collapses to Python None, not the string "null".
Previously this repo carried a deliberate deviation to preserve the
string verbatim across the four XML renderers (Qwen3.5/3.6, GLM-5,
MiniMax-M2.5, Laguna-XS.2). The motivation was a tidier round-trip
contract: "string params always round-trip as strings." But the XML
wire format genuinely cannot distinguish the string "null" from a
JSON null, and matching vLLM is more important than matching our
sibling renderers — downstream consumers (verifier, RL-loss) align
with vLLM's interpretation, not ours.
Test updates:
- test_arg_coercion::test_null_is_preserved_verbatim_for_string_type
is renamed to test_null_short_circuit_applies_even_for_string_type
and inverted: a string-typed "null" now returns None.
- test_tool_arg_type_preservation::test_string_arg_preserves_type
branches on _JSON_FORMAT_MODELS (Qwen3 Hermes, Kimi K2 section JSON)
vs XML renderers for the string-null case only. JSON renderers
preserve "null" verbatim because the wire bytes quote strings; XML
renderers null-coerce. Other string cases (bool/int/array/object)
still round-trip verbatim everywhere.
Folds in #63's bare-string-for-union-schemas fix and refactors _coerce_arg_value to use vLLM's current shared helpers (extract_types_from_schema + coerce_to_schema_type from vllm/tool_parsers/utils.py, landed in vLLM #43025 on 2026-05-19), replacing this branch's earlier single-type ladder. Why: #52 was originally written against pre-refactor vLLM and would have regressed #63 (anyOf schemas routed to "object" → json.loads on bare strings flagged INVALID_JSON). vLLM's new shape recursively flattens anyOf/oneOf/allOf into a type set and walks a priority- ordered ladder (null > integer > number > boolean > object > array > string) where string is the always-succeeding terminal — which absorbs #63's case as a side effect. Behavior changes relative to the pre-merge branch: - Union schemas (anyOf/oneOf/allOf) recursively flatten — Union[str,X] now accepts bare strings without flagging (the #63 win). - No top-level "null" short-circuit. Null coercion only fires when "null" is in the schema's type set (Optional[X] → anyOf [X, null] or type: ["X", "null"]). String-typed "null" stays a string. - "yes" / non-boolean text for bool-typed param returns raw text + INVALID flag (was: False + INVALID). - Number branch demotes whole floats to int regardless of source shape (1e3 → 1, 1.0 → 1) — matches vLLM's val.is_integer() rule. - No ast.literal_eval anywhere — vLLM doesn't use it. Python-literal dicts ('k':1) no longer parse for object params. - "binary" dropped as a bool alias (not in vLLM's _TYPE_ALIASES). Renderers-specific deviation kept: used_fallback flag returned alongside the value, propagated to ToolCallParseStatus.INVALID_JSON for the verifier / RL-loss signal. vLLM has no such signal. Tests updated for new behavior; full suite green (1775+99=1874 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 8bd06f6. Configure here.
Three bugbot findings from PR #52: - HIGH: ``int(val)`` in the number branch of ``_coerce_arg_value`` was outside the try block, so ``float("nan")`` (raises ValueError on ``int()``) and ``float("inf")`` (raises OverflowError) would abort the whole parse. Move ``int(val)`` back inside the try and widen the catch to include ``OverflowError`` — minor improvement over vLLM, which crashes on ``inf``. - LOW: ``_TOOL_CALL_BLOCK_RE`` was defined at module level but never used (carryover from the vLLM regex listing). Removed. - MEDIUM: document the ``UNCLOSED_BLOCK`` > ``INVALID_JSON`` status precedence in ``_parse_xml_function_blocks``. Structural failure dominates coercion failure when a block is both truncated and has uncoercible args; the verifier already discounts unclosed blocks, so combining the two would be noise. Intent now in a comment so future readers don't flip it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Closes #47. Replaces renderers' XML-style tool-call arg coercion with a byte-for-byte port of vLLM's shared
extract_types_from_schema+coerce_to_schema_typehelpers fromvllm/tool_parsers/utils.py. The previousjson.loads-centric coercion flagged<parameter=x>True</parameter>asINVALID_JSONfor boolean params becausejson.loads("True")fails — even though both SGLang'sQwen3CoderDetectorand vLLM'sQwen3CoderToolParseraccept Pythonic literals via case-folded comparison. Models freely emitTrue/Falseat inference because the reference parsers normalize them, but the same outputs came back malformed through renderers.The fix
Renderers now mirrors vLLM's current shape (refactored upstream in vLLM #43025, 2026-05-19):
_extract_types_from_schemarecursively flattens the JSON-Schema fragment to a set of types — handles top-leveltype(string or list), infers types fromenumvalues, and recurses throughanyOf/oneOf/allOf. Defaults to["string"]when no type info is available._coerce_arg_valuewalks a priority-ordered ladder, returning the first successful coercion:null"null"⇒Noneinteger(+ aliasesint,uint,long, …)int(text), else continuenumber(+ aliasesfloat,double, …)float(text), demote to int when whole, else continueboolean(+ aliasbool)text.lower().strip() in {"true","1"}⇒True;{"false","0"}⇒False; else continueobject(+ aliasesdict)json.loads(text), else continuearray(+ aliasesarr,list,sequence)json.loads(text), else continuestring(+ aliasesstr,text,enum, …)When the schema doesn't include
stringand no typed branch accepts the value, a terminaljson.loadsis tried before falling back to raw text +INVALID_JSON.Renderers-specific deviation
_coerce_arg_valuereturns(value, used_fallback). The flag isTrueiff every typed branch declined and the terminaljson.loadsalso failed — i.e. the value couldn't be expressed as anything the schema permits. The parser propagates that toToolCallParseStatus.INVALID_JSONfor the verifier / RL-loss signal. vLLM has no such signal (its callers don't need one).Other structural changes (kept from the original branch)
parse_qwen35refactored to use shared_parse_xml_function_blocks, mirroring vLLM's three regex patterns (closed + unclosed alternations,<parameter>lookahead recovery).<tool_call>markers but with<function=…>blocks are still surfaced (vLLMqwen3coder_tool_parser.py:269-271).:316-319).chatcmpl-tool-<16hex>format.:327-331).Behavior changes vs renderers' prior coercion
anyOf/oneOf/allOf) recursively flatten —Union[str, X]with bare-string wire values now stays OK, subsuming the fix from fix(parsing): accept bare-string args for union schemas with string #63 as a side effect."null"only coerces toNonewhen"null"is in the schema's type set (Optional[X]⇒anyOf [X, null], ortype: ["X", "null"]). Pure string-typed"null"stays the string"null". This drops the deliberate XML-ambiguity carve-out that earlier commits on this branch had carried.<parameter=x>yes</parameter>for abool-typed param now returns the raw text"yes"+INVALID_JSON, notFalse+INVALID_JSON. Matches current vLLM (the bool branch declines and the priority loop continues to the terminal).1e3⇒int 1000,1.0⇒int 1, regardless of source-string shape. Pure vLLM rule (val.is_integer()), not the prior"." in textheuristic.ast.literal_evalanywhere — vLLM doesn't use it. Python-literal dicts like{'k': 1}for an object param now flagINVALID_JSON."binary"dropped as abooleanalias (not in vLLM's_TYPE_ALIASES).Reference parser sources
extract_types_from_schema/coerce_to_schema_type:vllm/tool_parsers/utils.py:453,526Qwen3CoderToolParser._convert_param_value:vllm/tool_parsers/qwen3coder_tool_parser.py:123(routes through the shared helpers)Qwen3CoderDetector:python/sglang/srt/function_call/qwen3_coder_detector.py(functionally close; vLLM is the strict reference)Test plan
pytest tests/test_arg_coercion.py— 51 unit tests covering every branch of the new ladder, including theTrue/Falseregression through fullparse_qwen35round-trip and explicit cases for the behavior changes above.pytest tests/test_tool_arg_type_preservation.py— 48/48 cases pass across all five XML renderers + two JSON-format controls. String-typed"null"now round-trips as"null"(wasNoneon the prior branch); theOptional[str]shape was never exercised here.🤖 Generated with Claude Code
Note
Medium Risk
Parsing behavior changes across Qwen3.5 and shared
_coerce_arg_value(used by GLM/MiniMax/Laguna too): null coercion, union schemas, and untyped params now differ from the oldjson.loads-first logic—intentional but can shift verifier/RL signals for edge-case completions.Overview
Replaces XML tool-call argument coercion with vLLM’s schema-driven type ladder (
null→integer→number→boolean→object→array→string), including recursiveanyOf/oneOf/allOfflattening. That fixes cases like<parameter=…>True</parameter>on boolean fields (no longerINVALID_JSON) and keeps string-typed"null"as a string unlessnullis in the schema.Qwen3.5 parsing is refactored to shared
_parse_xml_function_blocksusing vLLM’s tolerant regexes (missing</parameter>, unclosed<function>, backoff when<tool_call>is absent). Content before tools is no longer stripped; calls getchatcmpl-tool-<16hex>ids; extraction errors return full text with no partial tool calls.Qwen35ToolParserdelegates to the same helper (schema-less path = verbatim strings).Adds
tests/test_arg_coercion.pyand updates type-preservation test docs for the new null/string rules.Reviewed by Cursor Bugbot for commit af8d6e1. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Fix XML tool-call argument coercion to match vLLM's type-ladder semantics
_coerce_arg_valuein parsing.py to mirror vLLM's priority-ordered type ladder:null > integer > number > boolean > object > array > string, with case-insensitive boolean parsing and null coercion only when declared in the schema._parse_xml_function_blocks, a tolerant parser that accepts missing</parameter>and unclosed</function>tags, strips a single leading/trailing newline from parameter bodies, and assigns vLLM-stylechatcmpl-tool-<16 hex>ids via the new_make_tool_call_idhelper.parse_qwen35to preserve content whitespace, recover tool calls from bare<function=...>blocks when<tool_call>wrappers are absent, and fall back to returning raw content on extraction errors._parse_xml_tool_callsandQwen35ToolParser.extractto delegate to_parse_xml_function_blocks, producing explicitUNCLOSED_BLOCKorMALFORMED_STRUCTUREdiagnostics instead of silently dropping malformed calls.integerare demoted toint.Macroscope summarized af8d6e1.