Openpdf-core rendering integration in openpdf-renderer by andreasrosdal · Pull Request #1566 · LibrePDF/OpenPDF

andreasrosdal · 2026-05-18T09:33:23Z

Summary

Continues the work to use openpdf-core (PdfReader + PdfContentParser) as the rendering engine in openpdf-renderer. The first commit on this branch extended the basic operator subset; the second commit closes the gap on real-world PDF features — XObjects (forms + images), inline-image safety, and a more robust parse loop.

Commit 1 — expand operator coverage

CMYK colors (k, K) and color-space-aware fills/strokes (cs, CS, sc, SC, scn, SCN) for DeviceGray / DeviceRGB / DeviceCMYK.
Clipping (W, W*) with proper save/restore through q/Q.
Line styling (J, j, M, d, i) plumbed into the BasicStroke.
Extended graphics state (gs) honoring CA/ca alpha and LW/ML/LC/LJ.
Text rise (Ts).
Marked content / compatibility (BMC, BDC, EMC, MP, DP, BX, EX) parsed as no-ops so content inside them still renders.
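
As an illustration of the line-styling bullet above, the J, j, M and d operands could be plumbed into a Java2D `BasicStroke` roughly as follows. This is a minimal sketch, not the PR's code: the class `PdfStrokeSketch` and its method names are hypothetical, and the fallback choices (clamping the miter limit to 1, treating an unusable dash array as a solid line) are assumptions made so that the `BasicStroke` constructor never throws.

```java
import java.awt.BasicStroke;

/** Hypothetical helper, not taken from the PR: maps PDF line-style operands onto a BasicStroke. */
final class PdfStrokeSketch {

    /** PDF line cap (J): 0 = butt, 1 = round, 2 = projecting square. */
    static int toAwtCap(int pdfCap) {
        switch (pdfCap) {
            case 1:  return BasicStroke.CAP_ROUND;
            case 2:  return BasicStroke.CAP_SQUARE;
            default: return BasicStroke.CAP_BUTT;
        }
    }

    /** PDF line join (j): 0 = miter, 1 = round, 2 = bevel. */
    static int toAwtJoin(int pdfJoin) {
        switch (pdfJoin) {
            case 1:  return BasicStroke.JOIN_ROUND;
            case 2:  return BasicStroke.JOIN_BEVEL;
            default: return BasicStroke.JOIN_MITER;
        }
    }

    /** True only if BasicStroke would accept the array: non-empty, no negatives, not all zero. */
    static boolean isUsableDash(float[] dash) {
        if (dash == null || dash.length == 0) {
            return false;
        }
        boolean anyPositive = false;
        for (float d : dash) {
            if (d < 0f) {
                return false;
            }
            if (d > 0f) {
                anyPositive = true;
            }
        }
        return anyPositive;
    }

    /** Builds a stroke from w, J, j, M and d operands; falls back to a solid line for unusable dashes. */
    static BasicStroke toStroke(float width, int pdfCap, int pdfJoin,
                                float miterLimit, float[] dash, float dashPhase) {
        float w = Math.max(0f, width);
        float miter = Math.max(1f, miterLimit); // BasicStroke rejects miter limits below 1 for miter joins
        int cap = toAwtCap(pdfCap);
        int join = toAwtJoin(pdfJoin);
        if (!isUsableDash(dash)) {
            return new BasicStroke(w, cap, join, miter);
        }
        return new BasicStroke(w, cap, join, miter, dash, Math.max(0f, dashPhase));
    }
}
```

A caller would then pass the result to `Graphics2D.setStroke(...)` before stroking the current path.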

New convenience entry points on OpenPdfCoreRenderer:

renderPage(int, Graphics2D, int, int) — draws directly onto a caller-supplied Graphics2D (Swing, printer, SVG-backed graphics) without allocating a BufferedImage; saves and restores the caller's transform and clip.
renderAllPages(float) — convenience that returns one BufferedImage per page in document order.
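
The "saves and restores the caller's transform and clip" contract of the Graphics2D entry point can be pictured with the sketch below. It does not call the renderer itself; `Graphics2DStateSketch` and `drawPreservingState` are hypothetical names, and the y-axis flip is only an assumption about how a PDF page (origin at the bottom-left) might be mapped onto a Java2D surface.

```java
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.util.function.Consumer;

/** Hypothetical illustration of drawing onto a caller-supplied Graphics2D without leaking state. */
final class Graphics2DStateSketch {

    static void drawPreservingState(Graphics2D g, int width, int height, Consumer<Graphics2D> painter) {
        // Capture the caller's state before touching anything.
        AffineTransform savedTransform = g.getTransform();
        Shape savedClip = g.getClip();
        try {
            g.clipRect(0, 0, width, height);
            // Assumption: flip the y axis so PDF-style coordinates grow upwards.
            g.translate(0, height);
            g.scale(1, -1);
            painter.accept(g);
        } finally {
            // Restore the transform first: the saved clip is expressed in the caller's user space.
            g.setTransform(savedTransform);
            g.setClip(savedClip);
        }
    }
}
```

Working on a copy obtained from `g.create()` and disposing it afterwards is an alternative that gives the same isolation.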

Commit 2 — XObject support and inline-image safety

Form XObjects (Do with /Subtype /Form) render recursively, applying the form's own /Matrix and /BBox under the current CTM with full state save/restore so form content can't leak out.
Image XObjects (Do with /Subtype /Image) decode via:
- ImageIO for JPEG (DCTDecode) and JPEG 2000 (JPXDecode, where the runtime supports it).
- A manual raster builder for uncompressed / Flate-decoded 8-bit DeviceGray, DeviceRGB and DeviceCMYK streams. CMYK is approximated to sRGB on the fly since Java2D can't natively draw a CMYK raster.
- Image XObjects honor the current fill alpha (ca from gs) and the CTM, drawing into the standard (0,0)-(1,1) unit square.
Inline images (BI/ID/EI) are pre-stripped from the content stream before PdfContentParser sees them. PdfContentParser has no inline-image handling and the raw image bytes after ID would otherwise derail tokenization for the rest of the page.
The content-stream parse loop now treats parser-level failures (malformed dictionary, unterminated array, ...) as "stop early" rather than aborting the whole renderer, matching how operator-level errors were already handled.
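
For the "CMYK is approximated to sRGB on the fly" step, the PR text does not say which formula is used. The sketch below assumes the common naive conversion R = (1 - C)(1 - K), with no ICC profile involved; `CmykToRgbSketch` and its methods are hypothetical names.

```java
import java.awt.image.BufferedImage;

/** Hypothetical sketch of a naive CMYK to RGB conversion for 8-bit samples. */
final class CmykToRgbSketch {

    /** Packs one CMYK sample (each 0..255, 255 = full ink) as 0xRRGGBB using R = (1 - C)(1 - K). */
    static int cmykToRgb(int c, int m, int y, int k) {
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return (r << 16) | (g << 8) | b;
    }

    /** Converts interleaved CMYK bytes (4 per pixel, row by row) into an RGB image. */
    static BufferedImage toImage(byte[] cmyk, int width, int height) {
        if (cmyk.length < width * height * 4) {
            throw new IllegalArgumentException("CMYK buffer too short");
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int i = 0, p = 0; i < width * height; i++, p += 4) {
            int rgb = cmykToRgb(cmyk[p] & 0xFF, cmyk[p + 1] & 0xFF, cmyk[p + 2] & 0xFF, cmyk[p + 3] & 0xFF);
            img.setRGB(i % width, i / width, rgb);
        }
        return img;
    }
}
```

The naive formula tends to give more saturated colors than a profile-based conversion would, which is consistent with ICC color management being listed as a remaining gap elsewhere in this PR.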

Tests

OpenPdfCorePageRendererOperatorsTest (8 tests) builds synthetic PDFs with PdfContentByte/PdfTemplate and renders them back, verifying:
- CMYK fills, dashed strokes, W-clipped fills, marked-content sequences, text rise.
- JPEG Image XObject — embed a red JPEG, check the rendered page contains red pixels.
- Form XObject — stamp a PdfTemplate with an orange fill, check the form content reaches the rasterizer.
- Inline image safety — hand-rolled stream with a BI/ID/EI block followed by a red rectangle, check the trailing rectangle still renders after the inline image is stripped.
OpenPdfCoreRendererTest (16 tests) covers the new renderPage(int, Graphics2D, int, int) and renderAllPages(float) overloads, including argument validation and Graphics2D state restoration.
Whole openpdf-renderer module test suite: 84 tests, 0 failures, 0 errors.
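
Several of the tests above assert that a rendered page "contains red pixels". A check of that kind could look like the following sketch; `PixelProbeSketch.containsColor` is a hypothetical helper rather than the one used in the test classes, and the per-channel tolerance is an assumption meant to absorb JPEG artifacts and antialiasing.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

/** Hypothetical test helper: scans a rendered page for a pixel close to a target color. */
final class PixelProbeSketch {

    static boolean containsColor(BufferedImage img, Color target, int tolerance) {
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int dr = Math.abs(((rgb >> 16) & 0xFF) - target.getRed());
                int dg = Math.abs(((rgb >> 8) & 0xFF) - target.getGreen());
                int db = Math.abs((rgb & 0xFF) - target.getBlue());
                if (dr <= tolerance && dg <= tolerance && db <= tolerance) {
                    return true;
                }
            }
        }
        return false;
    }
}
```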

README's operator table updated to reflect the broader coverage; new code examples for Graphics2D and batch rendering.

My name: Andreas Røsdal

Second pass at using openpdf-core as the rendering engine in openpdf-renderer. Extends the Java2D rasterizer driven by PdfContentParser with the operators most commonly missing on real-world PDFs:
- CMYK colors (k, K) and color-space-aware fills/strokes (cs, CS, sc, SC, scn, SCN) for DeviceGray / DeviceRGB / DeviceCMYK.
- Clipping (W, W*) with proper save/restore through q/Q.
- Line styling (J, j, M, d, i) plumbed into the BasicStroke.
- Extended graphics state (gs) honoring CA/ca alpha and LW/ML/LC/LJ.
- Text rise (Ts).
- Marked content / compatibility operators (BMC, BDC, EMC, MP, DP, BX, EX) parsed as no-ops so content inside them still renders.

Adds two new conveniences on OpenPdfCoreRenderer:
- renderPage(int, Graphics2D, int, int) draws directly onto a caller-supplied Graphics2D without allocating a BufferedImage, and saves/restores the caller's transform and clip.
- renderAllPages(float) returns one BufferedImage per page.

Adds OpenPdfCorePageRendererOperatorsTest that builds synthetic PDFs with PdfContentByte and renders them back to verify CMYK fills, dashed strokes, clipping, marked content and text rise all drive the renderer end-to-end. README updated to reflect the broader operator table.

Completes the openpdf-core-driven Java2D renderer by handling the operators most commonly missing on real-world PDFs:
- Do: Form XObjects render recursively, applying the form's own /Matrix and /BBox under the current CTM with state save/restore. Image XObjects decode via:
  * ImageIO for DCTDecode (JPEG) and JPXDecode (JPEG 2000, when supported by the runtime),
  * a manual raster builder for uncompressed / Flate-decoded 8-bit DeviceGray, DeviceRGB and DeviceCMYK streams (CMYK is approximated to sRGB on the fly, since Java2D can't natively draw a CMYK raster).

  Image XObjects honor the current fill alpha (ca from ExtGState) and the CTM, drawing into the standard (0,0)-(1,1) unit square.
- Inline images (BI/ID/EI) are now pre-stripped from the content stream before PdfContentParser sees them; the parser had no inline-image handling and the raw image bytes after ID would otherwise derail tokenization for the rest of the page.
- The content-stream parse loop now treats parser-level failures (malformed dictionaries, unterminated arrays) as "stop early" rather than aborting the whole renderer, matching how operator-level errors were already handled.

Tests added to OpenPdfCorePageRendererOperatorsTest:
- rendersJpegImageXObject builds a red JPEG, embeds it via PdfContentByte.addImage, and checks the page contains red pixels.
- rendersFormXObjectViaNestedContentStream stamps a PdfTemplate with an orange fill and checks the form's content reaches the rasterizer.
- inlineImagesDoNotBreakPageRendering writes a hand-rolled stream with a BI/ID/EI block followed by a red rectangle and checks the trailing rectangle still renders.

README updated; module test suite: 84 tests, 0 failures.
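
The Form XObject handling described in this commit (apply /Matrix, clip to /BBox, isolate state) maps naturally onto Java2D. The sketch below is an assumption about how that could be arranged, not the PR's implementation; `FormXObjectSketch.drawForm` is a hypothetical name, and the `Consumer` stands in for the recursive content-stream interpretation. A PDF matrix `[a b c d e f]` has the same element order as the flat array accepted by `AffineTransform(double[])`.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.util.function.Consumer;

/** Hypothetical sketch: draws form content under its own /Matrix and /BBox without leaking state. */
final class FormXObjectSketch {

    static void drawForm(Graphics2D g, double[] matrix, Rectangle2D bbox, Consumer<Graphics2D> content) {
        // A scoped copy of the graphics state plays the role of an implicit q ... Q pair.
        Graphics2D scoped = (Graphics2D) g.create();
        try {
            // Concatenate the form matrix with the current transform (form space -> user space).
            scoped.transform(new AffineTransform(matrix));
            // Clip to the bounding box, which is expressed in form space.
            scoped.clip(bbox);
            content.accept(scoped);
        } finally {
            scoped.dispose();
        }
    }
}
```

Because all changes happen on the scoped copy, the caller's transform, clip, paint and stroke are untouched once the form has been drawn.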

Addresses the checkstyle 'single-line Javadoc comment should be multi-line' rule on the new openpdf-core renderer code. Affects ten one-line Javadocs across OpenPdfCorePageRenderer and one in OpenPdfCoreRenderer; behavior unchanged.

codacy-production · 2026-05-18T09:39:30Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 414 complexity · 12 duplication


Splits the two over-branchy helpers Codacy flagged into smaller focused methods, and stops reassigning a method parameter:
- applyExtGState(String) was a flat list of seven null checks driving an NPath of 2048. Split into resolveExtGStateDict, applyExtGStateAlpha and applyExtGStateLineStyle.
- imageComponents(PdfObject) was a chain of PdfName.equals checks on freshly-allocated PdfNames (NPath 3136). Now uses static Set<PdfName> lookups (DEVICE_GRAY_NAMES / DEVICE_RGB_NAMES / DEVICE_CMYK_NAMES) with named PdfName constants, split across componentsForNamedColorSpace, componentsForArrayColorSpace and iccBasedComponents.
- imageComponents no longer reassigns its csObj parameter; uses a local `direct` reference instead.

Also wraps the long XObject row in openpdf-renderer/README.md that was exceeding the 120-column limit (was 207). No behavior change; module test suite still 84/84 green.
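
The set-based lookup that replaces the chain of equality checks can be sketched as follows. To stay self-contained, this version uses plain strings instead of `PdfName`; the class name, the abbreviated entries and the `-1` result for unknown names are assumptions for illustration, not the PR's actual constants or return convention.

```java
import java.util.Set;

/** Hypothetical sketch of a set-based color-space lookup, using Strings in place of PdfName. */
final class ColorSpaceComponentsSketch {

    private static final Set<String> DEVICE_GRAY_NAMES = Set.of("DeviceGray", "G");
    private static final Set<String> DEVICE_RGB_NAMES = Set.of("DeviceRGB", "RGB");
    private static final Set<String> DEVICE_CMYK_NAMES = Set.of("DeviceCMYK", "CMYK");

    /** Returns the component count for a device color space name, or -1 if it is not recognized. */
    static int componentsForNamedColorSpace(String name) {
        if (name == null) {
            return -1; // Set.of(...) sets throw NullPointerException on contains(null)
        }
        if (DEVICE_GRAY_NAMES.contains(name)) {
            return 1;
        }
        if (DEVICE_RGB_NAMES.contains(name)) {
            return 3;
        }
        if (DEVICE_CMYK_NAMES.contains(name)) {
            return 4;
        }
        return -1;
    }
}
```

Each branch is a single hash lookup on a preallocated constant, so no name objects are allocated per call and the method's path count stays small.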

Biggest correctness gap on real-world PDFs has been text rendering: mapFont() picked a generic Java2D family (Serif/Sans/Mono) from the PostScript font name, so PDFs that embedded their own subsetted fonts drew with the wrong glyph shapes (and missed glyphs whenever the name heuristic chose a family that didn't cover the Unicode chars). This commit closes that gap for the dominant case (embedded TrueType / FontFile2):
- mapFont() now first calls embeddedFontFor(...), which pulls the CMapAwareDocumentFont's FontDescriptor via openpdf-core, finds the embedded font program stream (FontFile2 / FontFile3 / FontFile in that preference order), and loads it with Font.createFont. The resulting AWT Font is cached by FontDescriptor identity so the same font program isn't re-parsed for every Tj/TJ call.
- When no font program is embedded, or parsing fails, falls back to the previous name-heuristic path (now mapFontByName(...)).
- Failures are cached (as a null Font) so we don't retry every glyph.

Test:
- rendersTextUsingEmbeddedTrueTypeFont embeds LiberationSans-Regular (shipped with openpdf-core for font-fallback) in a freshly built PDF, renders the page back and verifies dark pixels appear in the text region. The embedded program is required: no name-based AWT family would match "LiberationSans".

README's "Status" section updated and a candid "Honest limitations & roadmap" subsection added. It calls out the remaining gaps in priority order (Type 1 / CFF fonts, Type 3 fonts, ICC color management, patterns and shadings, inline images, soft masks, indexed/Separation/DeviceN, encryption) so future contributors know which gap to grab next.

Module test suite: 85 tests, 0 failures.
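
The caching behavior described here (keyed by object identity, with failures remembered as null) can be illustrated generically. `IdentityCacheSketch` is a hypothetical class, not the PR's code; in the renderer the key would be the FontDescriptor and the loader would presumably wrap `Font.createFont(Font.TRUETYPE_FONT, inputStream)`.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: an identity-keyed cache that also remembers failed loads as null. */
final class IdentityCacheSketch<K, V> {

    private final Map<K, V> cache = new IdentityHashMap<>();
    private final Function<K, V> loader;

    IdentityCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it at most once per key; null means "load failed". */
    V get(K key) {
        // containsKey distinguishes "never tried" from "tried and cached a null failure".
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        V value;
        try {
            value = loader.apply(key);
        } catch (RuntimeException e) {
            value = null; // remember the failure so the loader is not retried
        }
        cache.put(key, value);
        return value;
    }
}
```

A caller that receives null would fall back to the name-based mapping, as the commit message describes.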

Codacy markdownlint flagged the bullet list under "XObject coverage:" for missing a leading blank line (lists should be surrounded by blank lines). Single-line fix.

Implements the inline-image roadmap item: instead of pre-stripping inline image blocks, the preprocessor now promotes each one into a synthetic Image XObject and substitutes a `/__inline_image__N Do` invocation into the content stream. The rest of the renderer treats it exactly like a regular Image XObject and reuses the existing buildGrayImage / buildRgbImage / buildCmykImage / ImageIO decode paths.

Two framing strategies are used so the parser doesn't get confused by binary data:
- For DCT / DCTDecode / JPXDecode filters, find the JPEG end-of-image marker (FFD9) instead of scanning for "EI" bounded by whitespace, since JPEG payloads routinely contain byte sequences that look like EI by accident.
- For other filters (including no filter and FlateDecode), keep the whitespace-bounded EI heuristic but stop trimming "trailing whitespace" greedily: image bytes can legitimately be 0x00 or 0x0A and the spec guarantees exactly one whitespace byte before EI.

Abbreviated dict keys (/W, /H, /BPC, /CS, /F) and full names (/Width, /Height, ...) are both accepted; abbreviated colorspace values (/G, /RGB, /CMYK) and full names map to component counts.

Tests:
- inlineImageRendersAtCtmLocation builds a 2x2 DeviceGray inline image with a [black, white; white, black] checker, scales it 120x via a cm, and asserts the rendered page contains dark pixels in the right region.
- jpegInlineImageDecodes uses PdfContentByte.addImage(image, ..., true) to embed a green JPEG as an inline image, then asserts the rendered page contains green pixels.

README's status section now says inline images render, and the limitations list no longer mentions them.

Also addresses Codacy's "unnecessary fully qualified name" warning on java.util.List / java.util.Set usage. The class now imports List, Set, Arrays, ByteArrayOutputStream, StandardCharsets and Rectangle2D directly instead of inlining the FQNs; 7 call sites simplified.

Module test suite: 86 tests, 0 failures.
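
The two framing strategies could be sketched like this. Both methods are hypothetical illustrations (class and method names are not from the PR) operating on the raw content-stream bytes, with `from` pointing at the first image byte after `ID` and its separator. The JPEG scan is deliberately naive and simply takes the first FFD9 it sees.

```java
/** Hypothetical sketch of locating the end of an inline image's data. */
final class InlineImageFramingSketch {

    /** PDF whitespace: NUL, TAB, LF, FF, CR and SPACE. */
    static boolean isPdfWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /**
     * Finds an "EI" that is preceded by whitespace and followed by whitespace or end of data.
     * Returns the index of the 'E', or -1. The image data then ends at (result - 1): exactly one
     * separator byte is dropped, so payload bytes that happen to be 0x00 or 0x0A are kept.
     */
    static int findEiOperator(byte[] data, int from) {
        for (int i = Math.max(from, 1); i + 1 < data.length; i++) {
            boolean isEi = data[i] == 'E' && data[i + 1] == 'I';
            if (isEi
                    && isPdfWhitespace(data[i - 1] & 0xFF)
                    && (i + 2 == data.length || isPdfWhitespace(data[i + 2] & 0xFF))) {
                return i;
            }
        }
        return -1;
    }

    /** Finds the first FF D9 marker and returns the index just past it, or -1. */
    static int findJpegEnd(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + 1 < data.length; i++) {
            if ((data[i] & 0xFF) == 0xFF && (data[i + 1] & 0xFF) == 0xD9) {
                return i + 2;
            }
        }
        return -1;
    }
}
```

For example, with the bytes `ID␠EIx\n\nEI\n` the first `EI` is rejected because it is followed by `x`, and the payload `EIx\n` keeps its own trailing line feed because only the single separator byte before the real `EI` is dropped.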

Highest-ROI item left on the renderer roadmap: every PNG-to-PDF conversion produces images with `[/Indexed /DeviceRGB hival lookup]` color spaces, and the renderer was silently skipping them (decodeRawRaster falls through to null for non-Device colorspaces). This commit adds the decode path.
- decodeImage now recognizes `[/Indexed base hival lookup]` (with CS_INDEXED constant) and routes to a new decodeIndexedImage.
- decodeIndexedImage reads 8-bit indices from the (already Flate-decoded) stream, expands each pixel through the lookup table into the base color space's component bytes, then reuses the existing buildGrayImage / buildRgbImage / buildCmykImage helpers. The base color space's component count is determined via the existing imageComponents().
- readIndexedLookup handles both forms the spec allows: a PdfString containing the palette bytes, or a PRStream whose decoded content is the palette.
- Sub-byte bit depths (1/2/4-bit indices) are explicitly rejected for now; 8-bit is the dominant case for PNG-derived images.

Test:
- rendersIndexedColorImageXObject builds a 32x32 BufferedImage with an IndexColorModel (top half = magenta, bottom = cyan), embeds it via Image.getInstance(BufferedImage), and asserts both palette colors appear in the rendered page. openpdf-core's Image.getInstance preserves the IndexColorModel as `[/Indexed /DeviceRGB ...]`, so this exercises the new decode path end-to-end.

README updated: Indexed moved from "limitations" to the supported Image XObject formats; only sub-byte-packed indexed images remain called out as unsupported.

Module test suite: 87 tests, 0 failures.
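
The palette expansion step amounts to one array copy per pixel. The sketch below shows the idea for 8-bit indices; `IndexedPaletteSketch.expand` is a hypothetical method, and clamping out-of-range indices to `hival` is a choice made in this sketch rather than something the commit message states.

```java
/** Hypothetical sketch: expands 8-bit palette indices into base color space component bytes. */
final class IndexedPaletteSketch {

    /**
     * @param indices    one byte per pixel (8 bits per component)
     * @param lookup     palette bytes, (hival + 1) entries of `components` bytes each
     * @param components component count of the base color space (1, 3 or 4)
     * @param hival      highest valid palette index
     */
    static byte[] expand(byte[] indices, byte[] lookup, int components, int hival) {
        if (lookup.length < (hival + 1) * components) {
            throw new IllegalArgumentException("lookup table too short");
        }
        byte[] out = new byte[indices.length * components];
        for (int i = 0; i < indices.length; i++) {
            // Assumption: out-of-range indices are clamped to the last palette entry.
            int index = Math.min(indices[i] & 0xFF, hival);
            System.arraycopy(lookup, index * components, out, i * components, components);
        }
        return out;
    }
}
```

The expanded buffer has the same layout as an ordinary 8-bit raster in the base color space, which is why it can be handed to the existing gray, RGB or CMYK image builders.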

Codacy flagged decodeIndexedImage with NPath 385 (threshold 200). Splits the method along its natural seams without changing behavior:
- decodeIndexedImage now just wraps the try/catch around decodeIndexedImageOrThrow.
- decodeIndexedImageOrThrow handles validation + orchestration.
- readBitsPerComponent extracts the /BitsPerComponent read.
- expandIndexedPalette is the per-pixel arraycopy loop.
- buildImageForBaseComponents is the switch on component count.

No behavior change; module test suite still 87/87 green.

PDFs that draw tables (PdfPTable, hand-rolled re/m/l/S grids, ...) lean hard on three pieces of stroke handling that this renderer was getting wrong or skipping:
- Zero-width hairline strokes (PDF §8.4.3.2). `0 w` means "the thinnest line the device can render", i.e. one device pixel. The previous `Math.max(lineWidth, 0.001f)` collapsed those hairlines to invisibility once the page CTM scaled them. Now strokePath() computes an effective width of `1 / max(|sx|, |sy|)` from the current transform so a `0 w` stroke renders as a one-device-pixel line at any DPI.
- ExtGState line styling beyond LW/ML/LC/LJ. The dash array `/D` and the stroke-adjust flag `/SA` are now read out of gs dictionaries; `/D` feeds the existing dash-pattern path, `/SA` is tracked through q/Q.
- Crisp axis-aligned borders. KEY_STROKE_CONTROL is now set to VALUE_STROKE_NORMALIZE so 0.5pt borders snap to integer device pixels instead of smearing into two rows of antialiased grey.

Adds two regression tests: a full PdfPTable render (background fills, red 2pt header border, body-row text) and a `0 w` hairline render that asserts the stroke is actually visible after CTM scaling.

https://claude.ai/code/session_01Bobvbg8Ccp2g9S5DRFsnNb
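
The hairline rule can be written down in a few lines. In this sketch `HairlineSketch.effectiveWidth` is a hypothetical method, and sx and sy are assumed to be the transform's diagonal scale entries (`getScaleX()` / `getScaleY()`), which is only exact for transforms without rotation or shear.

```java
import java.awt.geom.AffineTransform;

/** Hypothetical sketch: picks a user-space width that maps to about one device pixel for "0 w". */
final class HairlineSketch {

    static float effectiveWidth(float lineWidth, AffineTransform ctm) {
        if (lineWidth > 0f) {
            return lineWidth; // ordinary strokes keep their declared width
        }
        // Assumption: the scale factors are the diagonal entries of the transform.
        double scale = Math.max(Math.abs(ctm.getScaleX()), Math.abs(ctm.getScaleY()));
        // One device pixel expressed in user-space units; fall back to 1 for a degenerate transform.
        return scale > 0 ? (float) (1.0 / scale) : 1f;
    }
}
```

With a uniform 4x page scale, a `0 w` stroke gets a user-space width of 0.25, which the transform then scales back up to one device pixel.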

The existing PdfPTable test only exercised single-word cell values ("Col A", "r0c0"). This adds a regression test that pushes harder on the text-in-table path: multi-line wrapped descriptions, a Phrase composed of multiple Chunks with different fonts and colors (regular, bold, italic, RED), varied horizontal alignments, a colored colspan cell with vertical centering, and a larger header font.

The four assertions cover the parts that are easy to silently break:
- white-on-blue header glyphs (header row text under cell background),
- a red Chunk inside an otherwise-black Phrase (per-Chunk fill color),
- a blue colspan-cell Phrase (text under multi-column layout),
- a multi-line wrapped cell producing several distinct glyph rows.

https://claude.ai/code/session_01Bobvbg8Ccp2g9S5DRFsnNb

sonarqubecloud · 2026-05-18T12:38:01Z

Quality Gate passed

Issues
32 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


claude added 3 commits May 18, 2026 09:07

Expand single-line Javadocs to multi-line

029c7da


claude added 3 commits May 18, 2026 09:43

Add blank line before XObject-coverage list in README

2db7c3f


andreasrosdal changed the title ~~Complete openpdf-core rendering integration in openpdf-renderer~~ Openpdf-core rendering integration in openpdf-renderer May 18, 2026

claude added 5 commits May 18, 2026 10:10

andreasrosdal wants to merge 11 commits into LibrePDF:master from andreasrosdal:claude/openpdf-core-integration-Fc9rv
