fix(parser): avoid stack overflow on large tag expressions (#817) by Kirylka · Pull Request #820 · NaturalIntelligence/fast-xml-parser

Kirylka · 2026-04-16T17:45:31Z

Problem

tagExpWithClosingIndex (introduced in 5.5.11) accumulates every scanned char code into a chars array and returns the result via String.fromCharCode(...chars). When a tag expression is large enough, the spread exceeds V8's per-call argument budget and surfaces as:

RangeError: Maximum call stack size exceeded
    at fast-xml-parser/lib/fxp.cjs:1:xxxxx
    at st (...)
    at parseXml (...)
    at parse (...)

Shallow 4-frame trace (not recursion-driven). Bisect pins the regression to 5.5.11; every release since (5.5.11 / 5.5.12 / 5.6.0) still reproduces it. Affects real-world XML like 12MB Adobe AE .aepx files and 46MB XHTML ESEF reports (#817).

Fix

Two-phase approach inside tagExpWithClosingIndex:

Single forward scan locates the close-delimiter position and tracks quote boundaries. It only records whether any \t was seen inside a quoted attribute value — no per-char string building.
Fast path (no tab inside a quoted attr): xmlData.substring(i, close).replace(/\t/g, ' '). This is the common case for real-world XML — zero per-char work.
Rare path (tab inside a quoted attr): walk the extracted substring once, preserving tabs inside quotes and normalising tabs outside quotes to spaces. Matches 5.5.10 semantics exactly.

Keeps the 5.5.11 perf improvements (charCodeAt comparisons, precomputed close codes, cached len).

Verification

Correctness

All 310 existing specs pass
Two new regression tests in spec/large_spec.js:
- 200k-char attribute value — fails on unpatched 5.5.11+, passes after
- Tab preserved inside quoted attribute — guards the 5.5.10 semantic
A 24-case byte-equality battery (edge cases: tabs inside/outside/adjacent quotes, single vs double quotes, surrogate pairs, long attrs, PI with tabs, entities, boolean attrs, etc.) — every case produces output byte-identical to 5.5.10
Verified against the 12MB .aepx from Error: RangeError: Maximum call stack size exceeded #817 — output SHA-256 identical to 5.5.10

Performance (Node 22, median over many iterations)

input	pure 5.5.10	this PR	delta
small XML	0.01ms	0.01ms	~0
medium XML (~8K)	0.50ms	0.41ms	−18%
wide attr (~5K chars)	0.05ms	0.03ms	−37%
very wide attr (~200k chars)	2.88ms	1.27ms	−56%
12MB `.aepx` (#817 repro)	324ms	289ms	−11%

Closes #817

…telligence#817) `tagExpWithClosingIndex` (introduced in 5.5.11) accumulated each scanned char code into a `chars` array and returned the result via `String.fromCharCode(...chars)`. For a tag expression large enough, the spread exceeds V8's per-call argument-count limit and manifests as `RangeError: Maximum call stack size exceeded` (with a shallow 4-frame stack), breaking parsing of large real-world XML files (e.g. Adobe AE .aepx, large XHTML reports). Replace the spread with `xmlData.substring(i, index).replace(/\t/g, ' ')`, which preserves the existing semantics (tab -> space, otherwise verbatim up to the closing char) without the per-char accumulation. This is safe because `closingChar` is `>` or `?>`, neither of which collides with `"` / `'` attribute delimiters, so the boundary tracking below still governs when to stop. Adds a regression test using a 200000-char attribute value.

The previous fix used `xmlData.substring(i, index).replace(/\t/g, ' ')`, which unconditionally replaced every tab character in the tag expression — including those inside quoted attribute values. 5.5.10 and 5.5.11 both only converted tabs to spaces when outside quoted attrs. Switch to building the tag expression with `tagExp += xmlData[index]` in the scan loop (same shape as 5.5.10) and explicitly handle `\t` only outside the attribute boundary. Still avoids the `String.fromCharCode(...chars)` spread that caused NaturalIntelligence#817. Adds a regression test for tab preservation inside quoted attr values.

The previous commit restored 5.5.10 tab semantics via `tagExp += xmlData[index]` in the scan loop. That worked correctly but regressed performance by ~18% on a 12MB XML input vs 5.5.10 due to cons-string overhead. Switch to a two-phase approach: 1. Single forward scan records the close-delimiter position and a flag for whether any `\t` was seen inside a quoted attribute value. Nothing is written during this pass. 2. Fast path (no tabs in quotes): `xmlData.substring(i, close).replace(/\t/g, ' ')`. This is the common case for real-world XML and avoids per-char work entirely. 3. Rare path (tabs inside quotes): walk the already-extracted substring once more, preserving tabs inside quotes and converting tabs outside quotes to spaces. Benchmarks (Node 22, median of many iterations): input pure 5.5.10 this PR delta small XML 0.01ms 0.01ms ~0 medium XML (~8K) 0.50ms 0.41ms -18% wide attr (~5K) 0.05ms 0.03ms -37% very wide attr (~200k) 2.88ms 1.27ms -56% 12MB aepx (NaturalIntelligence#817 repro) 324ms 289ms -11% Output is byte-identical to 5.5.10 on every tested input (verified against a 24-case battery + the 12MB .aepx SHA-256).

amitguptagwl · 2026-04-17T04:01:08Z

Thanks for your effort and time. Since the PR contains browser bundle which are not accepted in PR. I've updated desire method. Please check current code on github. This would be released tomorrow.

Kirylka · 2026-04-17T08:25:13Z

Quick heads-up on the commit that landed on master (e174168): it's close to my first pass (substring + bulk replace(/\t/g, ' ')) and does fix the stack overflow, but it drops two things that were in later commits on this branch:

Tab preservation inside quoted attribute values. v5.5.10 preserved \t literally when it appeared inside "..." — only tabs outside quoted attrs were converted. The committed version replaces every tab in the tag expression, so <item a="x\ty"/> now comes back as "x y" instead of "x\ty". Arguably more XML 1.0 §3.3.3 compliant for CDATA attrs, but a silent behavior change for anyone upgrading from 5.5.10.
Regression tests. This failure mode (a perf refactor introducing the spread bomb in the tag-expression reader) only came in during 5.5.11 itself — without tests the same shape of bug could slip back on the next pass.

If useful, the two small specs on this branch cover both cases (the 200k-char tag-expression test guarding #817, and the tab-in-quoted-attr semantic). Happy to open a separate tiny PR with just those, or you can lift them directly.

Either way, thanks for the quick turnaround.

amitguptagwl · 2026-04-17T13:55:51Z

I noticed that if we build an array in tagExpWithClosingIndex like this [tagname, true, booleanAttr, true, attr, val] then it will simplify the logic of boolean attributes and raw attributes methods too. But this will need some time. So I'l reverting my changes to release rest of the changes and then we can take this change.

Kirylka mentioned this pull request Apr 16, 2026

Error: RangeError: Maximum call stack size exceeded #817

Open

2 tasks

Kirylka added 2 commits April 16, 2026 20:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(parser): avoid stack overflow on large tag expressions (#817)#820

fix(parser): avoid stack overflow on large tag expressions (#817)#820
Kirylka wants to merge 3 commits intoNaturalIntelligence:masterfrom
Kirylka:fix/issue-817-stack-overflow-long-tag

Kirylka commented Apr 16, 2026 •

edited

Loading

Uh oh!

amitguptagwl commented Apr 17, 2026 •

edited

Loading

Uh oh!

Kirylka commented Apr 17, 2026

Uh oh!

amitguptagwl commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Kirylka commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Verification

Correctness

Performance (Node 22, median over many iterations)

Uh oh!

amitguptagwl commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kirylka commented Apr 17, 2026

Uh oh!

amitguptagwl commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Kirylka commented Apr 16, 2026 •

edited

Loading

amitguptagwl commented Apr 17, 2026 •

edited

Loading