
fix: capture <REQ-IF> opening tag without O(N²) tree mutation#217

Merged
stanislaw merged 1 commit into strictdoc-project:main from fNBU:fix/reqif-tag-capture-quadratic on Jun 11, 2026

@fNBU fNBU commented Jun 9, 2026

Contributor

ReqIFParser._parse_reqif records the original <REQ-IF …> opening tag (its namespace declarations and attributes) for faithful round-trip unparsing. It did this by deleting every child
of the parsed root and serializing the now-empty root. Removing the CORE-CONTENT child forced libxml2 to reconcile namespaces across the entire detached spec-object subtree — O(N²) in
the number of spec objects, accounting for roughly two-thirds of total parse time on large files (tens of minutes at 100k objects).

Fix

Serialize a shallow copy of the root (same tag, nsmap, and attrib, no children) instead of mutating the parsed tree. lxml has no API to serialize an element's opening tag alone, so a
childless copy is the linear way to isolate it. This is also non-destructive — it removes the prior side effect of emptying the caller's tree.
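A minimal sketch of the shallow-copy capture with lxml follows. It assumes the root element is already parsed. `capture_opening_tag` is a hypothetical name, not the function in reqif/parser.py. Slicing the self-closing serialization into a plain opening tag is also an assumption about how the normalization could be done. The sketch builds a childless element from the root's tag, nsmap and attributes, serializes only that element, and leaves the parsed tree untouched.

```python
from lxml import etree


def capture_opening_tag(root: etree._Element) -> str:
    """Return the <REQ-IF ...> opening tag of `root` without mutating it.

    Hypothetical helper: builds a childless element with the same tag,
    namespace map and attributes, then serializes only that element.
    """
    shallow = etree.Element(root.tag, attrib=dict(root.attrib), nsmap=root.nsmap)
    # A childless element without text serializes in self-closing form.
    serialized = etree.tostring(shallow, encoding="unicode")
    # Assumed normalization: turn <REQ-IF .../> into <REQ-IF ...>.
    if serialized.endswith("/>"):
        serialized = serialized[:-2] + ">"
    return serialized


if __name__ == "__main__":
    source = (
        b'<REQ-IF xmlns="http://www.omg.org/spec/ReqIF/20110401/reqif.xsd" xml:lang="en">\n'
        b"  <THE-HEADER/>\n"
        b"  <CORE-CONTENT/>\n"
        b"</REQ-IF>"
    )
    root = etree.fromstring(source)
    print(capture_opening_tag(root))
    print(len(root))  # the parsed root keeps its children
```

Copying the attributes and the nsmap is enough to reproduce the namespace declarations on the opening tag. No part of the spec-object subtree is detached, so libxml2 never has to reconcile namespaces inside it.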

Behavior

  • Byte-identical opening tag for pretty-printed (real-world) inputs; verified against the Polarion and SDoc round-trip fixtures.
  • For inputs with no whitespace after the opening tag, the captured tag is now <REQ-IF ...> instead of the self-closing <REQ-IF .../>. This is the correct form for an opening-tag capture, and
    the case does not occur in pretty-printed files.
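The two bullets can be checked with the sketch below. It compares a hypothetical reconstruction of the old approach against the shallow-copy capture on a pretty-printed input and on a compact input. Both function names and the "keep everything up to the first '>'" extraction step are assumptions. The old approach runs on a deep copy so the input tree is left intact. A pretty-printed root keeps its leading whitespace text after its children are removed, so the emptied root still serializes with an explicit start tag. A compact root has no text, so it collapses to the self-closing form.

```python
import copy

from lxml import etree


def opening_tag_by_emptying(root):
    """Hypothetical reconstruction of the old approach, run on a deep copy.

    Deletes every child, serializes the emptied root and keeps everything
    up to and including the first '>'.
    """
    emptied = copy.deepcopy(root)
    for child in list(emptied):
        emptied.remove(child)
    serialized = etree.tostring(emptied, encoding="unicode")
    return serialized[: serialized.index(">") + 1]


def opening_tag_by_shallow_copy(root):
    """Shallow-copy capture: childless element with the same tag, nsmap, attributes."""
    shallow = etree.Element(root.tag, attrib=dict(root.attrib), nsmap=root.nsmap)
    serialized = etree.tostring(shallow, encoding="unicode")
    return serialized[:-2] + ">" if serialized.endswith("/>") else serialized


PRETTY = b'<REQ-IF xmlns="urn:reqif">\n  <THE-HEADER/>\n</REQ-IF>'
COMPACT = b'<REQ-IF xmlns="urn:reqif"><THE-HEADER/></REQ-IF>'

for name, source in (("pretty", PRETTY), ("compact", COMPACT)):
    root = etree.fromstring(source)
    print(name, opening_tag_by_emptying(root), opening_tag_by_shallow_copy(root))
```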

Performance

Full parse_from_string is now linear: ~0.17 s at 20k objects and ~0.35 s at 40k, versus the old capture block alone taking ~14 s and ~84 s.
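A rough harness for checking the linear-scaling claim on the capture path is sketched below. It is not ReqIFParser.parse_from_string. It times lxml parsing plus a shallow-copy capture on a synthetic, heavily simplified ReqIF-like document. The document layout and the helper names `synthetic_reqif` and `time_parse_and_capture` are assumptions. Absolute timings will therefore differ from the figures above. Doubling the object count should roughly double the elapsed time if the path is linear.

```python
import time

from lxml import etree

REQIF_NS = "http://www.omg.org/spec/ReqIF/20110401/reqif.xsd"


def opening_tag_by_shallow_copy(root):
    """Serialize a childless copy of the root as an opening tag."""
    shallow = etree.Element(root.tag, attrib=dict(root.attrib), nsmap=root.nsmap)
    serialized = etree.tostring(shallow, encoding="unicode")
    return serialized[:-2] + ">" if serialized.endswith("/>") else serialized


def synthetic_reqif(n_objects):
    """Build a hypothetical, simplified ReqIF-like document with n spec objects."""
    objects = "".join(
        f'    <SPEC-OBJECT IDENTIFIER="SO-{i}"><VALUES/></SPEC-OBJECT>\n'
        for i in range(n_objects)
    )
    return (
        f'<REQ-IF xmlns="{REQIF_NS}" xmlns:xhtml="http://www.w3.org/1999/xhtml">\n'
        "  <CORE-CONTENT><REQ-IF-CONTENT><SPEC-OBJECTS>\n"
        f"{objects}"
        "  </SPEC-OBJECTS></REQ-IF-CONTENT></CORE-CONTENT>\n"
        "</REQ-IF>\n"
    ).encode("utf-8")


def time_parse_and_capture(n_objects):
    """Time parsing plus opening-tag capture; returns (seconds, captured tag)."""
    source = synthetic_reqif(n_objects)
    start = time.perf_counter()
    root = etree.fromstring(source)
    tag = opening_tag_by_shallow_copy(root)
    return time.perf_counter() - start, tag


for n in (20_000, 40_000):
    elapsed, _ = time_parse_and_capture(n)
    print(f"{n} objects: {elapsed:.3f} s")
```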

Tests

Unit tests: 49/49 pass. Integration tests: 78/79 pass; the one failure is environmental (libtidy is missing) and unrelated to this change.

@fNBU fNBU force-pushed the fix/reqif-tag-capture-quadratic branch from 17b9383 to dd2e10a on June 9, 2026 21:15
Comment thread reqif/parser.py

@stanislaw stanislaw left a comment

Contributor


It is a great change, I just left a small comment.

_parse_reqif recorded the original <REQ-IF ...> opening tag by removing
every child of the parsed root and serializing the emptied root. Removing
the CORE-CONTENT child forced libxml2 to reconcile namespaces across the
whole detached spec-object subtree, making this O(N²) in the number of
spec objects (~two-thirds of total parse time on large files).

Serialize a shallow copy of the root (same tag, nsmap, attributes, no
children) instead. This is non-destructive and linear; output is
byte-identical for pretty-printed inputs.
@fNBU fNBU force-pushed the fix/reqif-tag-capture-quadratic branch from dd2e10a to 1afb4f9 on June 11, 2026 06:42
@fNBU

fNBU commented Jun 11, 2026

Contributor Author

Fixed. This is an important fix for us.

@stanislaw stanislaw merged commit 5c0c9a5 into strictdoc-project:main Jun 11, 2026
9 checks passed

3 participants