Skip to content

fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52

Open
hallerite wants to merge 7 commits into
mainfrom
fix/qwen35-arg-coercion-vllm-parity
Open

fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion#52
hallerite wants to merge 7 commits into
mainfrom
fix/qwen35-arg-coercion-vllm-parity

Conversation

@hallerite
Copy link
Copy Markdown
Member

@hallerite hallerite commented May 18, 2026

Summary

Closes #47. Replaces renderers' XML-style tool-call arg coercion with a byte-for-byte port of vLLM's shared extract_types_from_schema + coerce_to_schema_type helpers from vllm/tool_parsers/utils.py. The previous json.loads-centric coercion flagged <parameter=x>True</parameter> as INVALID_JSON for boolean params because json.loads("True") fails — even though both SGLang's Qwen3CoderDetector and vLLM's Qwen3CoderToolParser accept Pythonic literals via case-folded comparison. Models freely emit True / False at inference because the reference parsers normalize them, but the same outputs came back malformed through renderers.

The fix

Renderers now mirrors vLLM's current shape (refactored upstream in vLLM #43025, 2026-05-19):

  1. _extract_types_from_schema recursively flattens the JSON-Schema fragment to a set of types — handles top-level type (string or list), infers types from enum values, and recurses through anyOf / oneOf / allOf. Defaults to ["string"] when no type info is available.
  2. _coerce_arg_value walks a priority-ordered ladder, returning the first successful coercion:
Priority Type Coercion
1 null case-insensitive "null"None
2 integer (+ aliases int, uint, long, …) int(text), else continue
3 number (+ aliases float, double, …) float(text), demote to int when whole, else continue
4 boolean (+ alias bool) text.lower().strip() in {"true","1"}True; {"false","0"}False; else continue
5 object (+ aliases dict) json.loads(text), else continue
6 array (+ aliases arr, list, sequence) json.loads(text), else continue
7 string (+ aliases str, text, enum, …) return verbatim — always-succeeding terminal

When the schema doesn't include string and no typed branch accepts the value, a terminal json.loads is tried before falling back to raw text + INVALID_JSON.

Renderers-specific deviation

_coerce_arg_value returns (value, used_fallback). The flag is True iff every typed branch declined and the terminal json.loads also failed — i.e. the value couldn't be expressed as anything the schema permits. The parser propagates that to ToolCallParseStatus.INVALID_JSON for the verifier / RL-loss signal. vLLM has no such signal (its callers don't need one).

Other structural changes (kept from the original branch)

  • parse_qwen35 refactored to use shared _parse_xml_function_blocks, mirroring vLLM's three regex patterns (closed + unclosed alternations, <parameter> lookahead recovery).
  • Back-off path: completions without <tool_call> markers but with <function=…> blocks are still surfaced (vLLM qwen3coder_tool_parser.py:269-271).
  • Content text no longer stripped (vLLM :316-319).
  • Tool-call ids in vLLM's chatcmpl-tool-<16hex> format.
  • Exception catch-all: any unhandled extraction error surfaces full text as content with no tool calls (vLLM :327-331).

Behavior changes vs renderers' prior coercion

  • Union schemas (anyOf / oneOf / allOf) recursively flattenUnion[str, X] with bare-string wire values now stays OK, subsuming the fix from fix(parsing): accept bare-string args for union schemas with string #63 as a side effect.
  • No top-level null short-circuit. "null" only coerces to None when "null" is in the schema's type set (Optional[X]anyOf [X, null], or type: ["X", "null"]). Pure string-typed "null" stays the string "null". This drops the deliberate XML-ambiguity carve-out that earlier commits on this branch had carried.
  • Bool degeneracy: <parameter=x>yes</parameter> for a bool-typed param now returns the raw text "yes" + INVALID_JSON, not False + INVALID_JSON. Matches current vLLM (the bool branch declines and the priority loop continues to the terminal).
  • Number demotion: 1e3int 1000, 1.0int 1, regardless of source-string shape. Pure vLLM rule (val.is_integer()), not the prior "." in text heuristic.
  • No ast.literal_eval anywhere — vLLM doesn't use it. Python-literal dicts like {'k': 1} for an object param now flag INVALID_JSON.
  • "binary" dropped as a boolean alias (not in vLLM's _TYPE_ALIASES).

Reference parser sources

Test plan

  • pytest tests/test_arg_coercion.py — 51 unit tests covering every branch of the new ladder, including the True / False regression through full parse_qwen35 round-trip and explicit cases for the behavior changes above.
  • pytest tests/test_tool_arg_type_preservation.py — 48/48 cases pass across all five XML renderers + two JSON-format controls. String-typed "null" now round-trips as "null" (was None on the prior branch); the Optional[str] shape was never exercised here.
  • Full suite: 1874 passed, 53 skipped, 1 xfailed.

🤖 Generated with Claude Code


Note

Medium Risk
Parsing behavior changes across Qwen3.5 and shared _coerce_arg_value (used by GLM/MiniMax/Laguna too): null coercion, union schemas, and untyped params now differ from the old json.loads-first logic—intentional but can shift verifier/RL signals for edge-case completions.

Overview
Replaces XML tool-call argument coercion with vLLM’s schema-driven type ladder (nullintegernumberbooleanobjectarraystring), including recursive anyOf/oneOf/allOf flattening. That fixes cases like <parameter=…>True</parameter> on boolean fields (no longer INVALID_JSON) and keeps string-typed "null" as a string unless null is in the schema.

Qwen3.5 parsing is refactored to shared _parse_xml_function_blocks using vLLM’s tolerant regexes (missing </parameter>, unclosed <function>, backoff when <tool_call> is absent). Content before tools is no longer stripped; calls get chatcmpl-tool-<16hex> ids; extraction errors return full text with no partial tool calls. Qwen35ToolParser delegates to the same helper (schema-less path = verbatim strings).

Adds tests/test_arg_coercion.py and updates type-preservation test docs for the new null/string rules.

Reviewed by Cursor Bugbot for commit af8d6e1. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Fix XML tool-call argument coercion to match vLLM's type-ladder semantics

  • Rewrites _coerce_arg_value in parsing.py to mirror vLLM's priority-ordered type ladder: null > integer > number > boolean > object > array > string, with case-insensitive boolean parsing and null coercion only when declared in the schema.
  • Adds _parse_xml_function_blocks, a tolerant parser that accepts missing </parameter> and unclosed </function> tags, strips a single leading/trailing newline from parameter bodies, and assigns vLLM-style chatcmpl-tool-<16 hex> ids via the new _make_tool_call_id helper.
  • Updates parse_qwen35 to preserve content whitespace, recover tool calls from bare <function=...> blocks when <tool_call> wrappers are absent, and fall back to returning raw content on extraction errors.
  • Refactors _parse_xml_tool_calls and Qwen35ToolParser.extract to delegate to _parse_xml_function_blocks, producing explicit UNCLOSED_BLOCK or MALFORMED_STRUCTURE diagnostics instead of silently dropping malformed calls.
  • Behavioral Change: untyped parameters are now returned as raw strings rather than being JSON-coerced; whole floats declared as integer are demoted to int.

Macroscope summarized af8d6e1.

Renderers' Qwen3.5/GLM/MiniMax/Laguna XML parsers flagged
``<parameter=x>True</parameter>`` as INVALID_JSON for boolean params
because ``json.loads("True")`` fails. Both SGLang's
``Qwen3CoderDetector`` and vLLM's ``Qwen3CoderToolParser`` accept it via
``param_value.lower() == "true"`` — so Pythonic literals the model
freely emits round-trip cleanly at inference but came back malformed
through renderers.

Replaces the string-or-``json.loads`` dispatch with the full
``_convert_param_value`` ladder shared by both reference parsers:
case-insensitive ``null``, ``int()`` / ``float()`` for numeric families
with int demotion, case-folded bool, ``json.loads`` → ``ast.literal_eval``
for objects/arrays/anyOf, ``ast.literal_eval`` catch-all. INVALID_JSON
remains as the verifier / RL-loss signal for values that had to
degenerate (e.g. ``yes`` for bool, ``abc`` for int).

One deliberate deviation from vLLM/SGLang: declared ``string`` types
preserve ``"null"`` verbatim instead of coercing to Python ``None``, so
the existing ``test_tool_arg_type_preservation`` string-verbatim
contract still holds.

Fixes #47.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror vLLM's Qwen3CoderToolParser end to end (chosen over qwen3_xml
because qwen3_coder is a strict superset for offline value coercion;
qwen3_xml's edge over qwen3_coder is its streaming state machine,
which does not apply here yet — module comment notes the TODO to
switch when streaming lands). Closes the three remaining structural
deltas:

- Adopt vLLM's three tolerant regex patterns (tool_call / function /
  parameter) so unclosed </parameter>, </function>, and </tool_call>
  tags are recovered instead of silently dropping their content.
- Add the qwen3coder_tool_parser.py:269-271 back-off: when output has
  no <tool_call> markers, scan the whole completion for raw
  <function=...></function> blocks and recover the call. Content text
  is whatever remains after the function blocks are stripped.
- Strip exactly one leading and one trailing newline from parameter
  values (qwen3coder_tool_parser.py:246-250); interior whitespace
  round-trips verbatim — previous .strip() was too aggressive.
- Fix the no-schema branch of _coerce_arg_value to match vLLM
  (qwen3coder_tool_parser.py:128-137): return the raw text verbatim
  after the case-insensitive null short-circuit instead of attempting
  json.loads.

Qwen3.6 needs no separate parser changes — Qwen36Renderer inherits
parse_response from Qwen35Renderer and shares the same parse_qwen35
plumbing, so the vLLM-parity ladder + structural tolerance applies
to both 3.5 (Python repr literals) and 3.6 (JSON literals) in one
shot.

Qwen35ToolParser in parsers.py (DefaultRenderer + tool_parser=
'qwen3.5' path) is updated to share the new _parse_xml_function_blocks
helper so both code paths produce identical output.

Tests: 7 new structural-parity tests pin the regex tolerance, no-tag
back-off, and whitespace stripping against the vLLM reference shape.
Existing test_no_schema_falls_back_to_json_loads is renamed and
inverted to match the new (vLLM-faithful) behavior.
Comment thread renderers/parsing.py Outdated
@macroscopeapp
Copy link
Copy Markdown

macroscopeapp Bot commented May 20, 2026

Approvability

Verdict: Needs human review

Significant refactor of tool-call argument parsing with new type coercion logic, schema handling, and changed parse status determination. The runtime behavior changes and unresolved review comment about status precedence warrant human review.

No code changes detected at af8d6e1. Prior analysis still applies.

You can customize Macroscope's approvability policy. Learn more.

…xception path

Close the four remaining behavioral deltas vs vLLM's
Qwen3CoderToolParser non-streaming extract path:

- Content text is now the raw slice (no .strip()). vLLM
  qwen3coder_tool_parser.py:316-319 explicitly leaves the commented-out
  .rstrip() in place; surrounding whitespace before/after <tool_call>
  round-trips verbatim.
- Backoff content (no <tool_call> markers) is the text up to the first
  <function= position only, matching vLLM line 317-319. Removed
  _strip_function_blocks helper that concatenated leftovers between
  function blocks — different from vLLM.
- Each successfully-parsed tool call is assigned an id in vLLM's
  default format: chatcmpl-tool-<16 hex chars> via
  uuid.uuid4().int & MASK_64_BITS, matching vLLM's make_tool_call_id /
  random_uuid pair. Malformed entries (no function name) intentionally
  stay id-less since vLLM would have dropped them entirely.
- Tool-call extraction is wrapped in try/except: on any unhandled
  error, surface tool_calls=[] and content=full_decoded_text — the
  vLLM qwen3coder_tool_parser.py:327-331 catch-all behavior. Reasoning
  extraction stays outside the wrapper since vLLM doesn't do reasoning
  in this parser.

Tests: 4 new cases pin content-no-strip, backoff content slice,
chatcmpl-tool-<hex> id format, and the exception fallback.
Comment thread renderers/parsing.py Outdated
hallerite and others added 2 commits May 20, 2026 13:43
…byte

vLLM's _convert_param_value (qwen3coder_tool_parser.py:125) runs the
case-insensitive "null" short-circuit BEFORE any type check —
including the string family. That means an XML wire value of
<parameter=x>null</parameter> against a {"type": "string"} schema
collapses to Python None, not the string "null".

Previously this repo carried a deliberate deviation to preserve the
string verbatim across the four XML renderers (Qwen3.5/3.6, GLM-5,
MiniMax-M2.5, Laguna-XS.2). The motivation was a tidier round-trip
contract: "string params always round-trip as strings." But the XML
wire format genuinely cannot distinguish the string "null" from a
JSON null, and matching vLLM is more important than matching our
sibling renderers — downstream consumers (verifier, RL-loss) align
with vLLM's interpretation, not ours.

Test updates:
- test_arg_coercion::test_null_is_preserved_verbatim_for_string_type
  is renamed to test_null_short_circuit_applies_even_for_string_type
  and inverted: a string-typed "null" now returns None.
- test_tool_arg_type_preservation::test_string_arg_preserves_type
  branches on _JSON_FORMAT_MODELS (Qwen3 Hermes, Kimi K2 section JSON)
  vs XML renderers for the string-null case only. JSON renderers
  preserve "null" verbatim because the wire bytes quote strings; XML
  renderers null-coerce. Other string cases (bool/int/array/object)
  still round-trip verbatim everywhere.
Folds in #63's bare-string-for-union-schemas fix and refactors
_coerce_arg_value to use vLLM's current shared helpers
(extract_types_from_schema + coerce_to_schema_type from
vllm/tool_parsers/utils.py, landed in vLLM #43025 on 2026-05-19),
replacing this branch's earlier single-type ladder.

Why: #52 was originally written against pre-refactor vLLM and would
have regressed #63 (anyOf schemas routed to "object" → json.loads
on bare strings flagged INVALID_JSON). vLLM's new shape recursively
flattens anyOf/oneOf/allOf into a type set and walks a priority-
ordered ladder (null > integer > number > boolean > object > array
> string) where string is the always-succeeding terminal — which
absorbs #63's case as a side effect.

Behavior changes relative to the pre-merge branch:
- Union schemas (anyOf/oneOf/allOf) recursively flatten — Union[str,X]
  now accepts bare strings without flagging (the #63 win).
- No top-level "null" short-circuit. Null coercion only fires when
  "null" is in the schema's type set (Optional[X] → anyOf [X, null]
  or type: ["X", "null"]). String-typed "null" stays a string.
- "yes" / non-boolean text for bool-typed param returns raw text +
  INVALID flag (was: False + INVALID).
- Number branch demotes whole floats to int regardless of source
  shape (1e3 → 1, 1.0 → 1) — matches vLLM's val.is_integer() rule.
- No ast.literal_eval anywhere — vLLM doesn't use it. Python-literal
  dicts ('k':1) no longer parse for object params.
- "binary" dropped as a bool alias (not in vLLM's _TYPE_ALIASES).

Renderers-specific deviation kept: used_fallback flag returned
alongside the value, propagated to ToolCallParseStatus.INVALID_JSON
for the verifier / RL-loss signal. vLLM has no such signal.

Tests updated for new behavior; full suite green (1775+99=1874 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 8bd06f6. Configure here.

Comment thread renderers/parsing.py Outdated
Comment thread renderers/parsing.py
Three bugbot findings from PR #52:

- HIGH: ``int(val)`` in the number branch of ``_coerce_arg_value``
  was outside the try block, so ``float("nan")`` (raises ValueError
  on ``int()``) and ``float("inf")`` (raises OverflowError) would
  abort the whole parse. Move ``int(val)`` back inside the try and
  widen the catch to include ``OverflowError`` — minor improvement
  over vLLM, which crashes on ``inf``.
- LOW: ``_TOOL_CALL_BLOCK_RE`` was defined at module level but
  never used (carryover from the vLLM regex listing). Removed.
- MEDIUM: document the ``UNCLOSED_BLOCK`` > ``INVALID_JSON``
  status precedence in ``_parse_xml_function_blocks``. Structural
  failure dominates coercion failure when a block is both truncated
  and has uncoercible args; the verifier already discounts unclosed
  blocks, so combining the two would be noise. Intent now in a
  comment so future readers don't flip it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Qwen3.5 tool-call parser is stricter than SGLang qwen3_coder on boolean args

1 participant