Improve HTML table handling during markdown conversion #433

@arabold

Description

Large documentation tables can make the current HTML-to-Markdown conversion path CPU-bound. We reproduced this with WezTerm's nerdfonts.html?q= page: the page contains a single table with roughly 10,752 rows and 21,504 cells. Plain Turndown and non-GFM conversion complete quickly, but @joplin/turndown-plugin-gfm@1.0.67 does not finish within two minutes for that table.

This issue tracks a broader table-handling improvement separate from the immediate hang mitigation.

Possible direction:

  • Add a DOM preprocessing/cleanup step specifically for tables before markdown conversion.
  • Remove unnecessary styling/formatting wrappers and layout-only divs while retaining inline semantic formatting where useful.
  • Decide per table whether Markdown or minimized HTML is the better representation.
  • Prefer minimized HTML tables when preserving table structure, background colors, column/row styling, code examples, or other complex cell content matters.
  • Prefer Markdown tables for smaller/simple tabular content where GFM output is compact and reliable.
  • Consider row-based splitting or chunking for very large tables so full content can be retained without forcing a single huge GFM table conversion.
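A minimal sketch of how the per-table decision and row-based chunking could fit together. Everything here is hypothetical: the `TableSummary` shape, the function names, and both thresholds are illustrative assumptions, not measured values or existing project code. A real implementation would derive the summary from the DOM during the preprocessing step.

```typescript
// Hypothetical summary of one table, assumed to be computed during DOM preprocessing.
interface TableSummary {
  rowCount: number;
  cellCount: number;
  // Cells containing code blocks, lists, nested tables, or other block content.
  hasComplexCells: boolean;
  // Background colors or row/column styling that carries meaning.
  hasMeaningfulStyling: boolean;
}

type TableRepresentation = "markdown" | "html" | "chunked-html";

// Illustrative thresholds only (assumptions, to be tuned against real pages).
const MAX_MARKDOWN_ROWS = 200;
const MAX_ROWS_PER_CHUNK = 500;

// Pick a representation for a single table.
function chooseRepresentation(table: TableSummary): TableRepresentation {
  // Very large tables are split by rows so no single conversion sees all of them.
  if (table.rowCount > MAX_ROWS_PER_CHUNK) return "chunked-html";
  // Structure or styling matters: keep minimized HTML.
  if (table.hasComplexCells || table.hasMeaningfulStyling) return "html";
  // Simple but too long for a compact GFM table.
  if (table.rowCount > MAX_MARKDOWN_ROWS) return "html";
  return "markdown";
}

// Split rows into consecutive groups of at most `chunkSize` rows, preserving order.
function chunkRows<T>(rows: readonly T[], chunkSize: number): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let start = 0; start < rows.length; start += chunkSize) {
    chunks.push(rows.slice(start, start + chunkSize));
  }
  return chunks;
}
```

With these example thresholds, a table the size of the WezTerm one (about 10,752 rows) would be classified as `"chunked-html"` and split into 22 chunks. Each chunk would presumably repeat the header row so it stays readable on its own, which this sketch does not show.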

The immediate production fix should remain smaller: detect oversized tables and avoid the pathological Joplin GFM conversion path.
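A rough sketch of what such a guard might look like, assuming the check runs on the HTML string of a single table before conversion. The function names and the `MAX_GFM_TABLE_ROWS` limit are hypothetical, and the regex count is only a cheap approximation of the row count, not a parser.

```typescript
// Assumed limit; the right value would need to be measured against real pages.
const MAX_GFM_TABLE_ROWS = 1000;

// Approximate the number of rows by counting opening <tr> tags.
// Matches "<tr>" and "<tr ...>" but not other tags such as "<track>".
function countTableRows(tableHtml: string): number {
  const matches = tableHtml.match(/<tr[\s>]/gi);
  return matches ? matches.length : 0;
}

// True when the table is large enough that the GFM table conversion should be skipped.
function isOversizedTable(tableHtml: string, maxRows: number = MAX_GFM_TABLE_ROWS): boolean {
  return countTableRows(tableHtml) > maxRows;
}
```

A caller could use `isOversizedTable` to route oversized tables to a converter configured without GFM table handling, or to keep them as HTML. Because the count is text-based, rows of nested tables are included in the total, which only makes the guard more conservative.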

Labels: enhancement (New feature or request)
