add PDF stream writer #4968

@milahu

currently pymupdf cannot stream-write a PDF,
that is, write the output file to disk incrementally while the document is being built

so i have to buffer the whole output document in RAM
and can only write the complete document to disk at the end

# pseudo code: get_page_image() is a placeholder that returns
# one grayscale page image as a 2D uint8 np.array

import pymupdf

input_doc = pymupdf.open("input.pdf")
output_doc = pymupdf.open()  # empty PDF

# buffer to RAM: every inserted image is kept in memory until save()
for page_idx in range(input_doc.page_count):
    rect = input_doc[page_idx].rect
    # give the output page the same size as the input page
    out_page = output_doc.new_page(width=rect.width, height=rect.height)
    colorspace = pymupdf.csGRAY
    page_image = get_page_image(page_idx)  # np.array
    h, w = page_image.shape[:2]
    pix = pymupdf.Pixmap(colorspace, w, h, page_image.tobytes(), False)
    out_page.insert_image(rect, pixmap=pix)

# write to disk
output_doc.save("output.pdf")
output_doc.close()

problem:
this fails if the output document is bigger than RAM

solution:
stream-write the output document to disk

for this, pymupdf.open would need to accept the path of an output file
that does not exist yet (currently this fails),
and maybe a flag to explicitly enable output streaming

# proposed API, write_stream does not exist yet
output_doc = pymupdf.open("output.pdf", write_stream=True)

then i can remove the output_doc.save call,
each page is written to disk as soon as it is complete,
and the output file is finalized in output_doc.close()

proof of concept:
i have implemented a minimal streaming PDF writer in
rahimnathwani/binarize-pdf#2
as expected, this uses significantly less memory
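
the idea behind such a streaming writer can be sketched in pure python. this is only an illustration under assumptions, not the code from the linked pull request and not a pymupdf API: the names StreamingPdfWriter, add_gray_page and build_demo_pdf are hypothetical, and the sketch only handles one 8-bit grayscale image per page. each object is written to the file as soon as it is complete, only the byte offsets are kept in memory, and close() writes the page tree, the catalog and the xref table.

```python
import io
import zlib


class StreamingPdfWriter:
    """Hypothetical minimal writer: one 8-bit grayscale image per page."""

    def __init__(self, fileobj):
        self.f = fileobj
        self.pos = 0          # number of bytes written so far
        self.offsets = {}     # object number -> byte offset in the file
        self.page_ids = []    # object numbers of the page objects
        self.next_id = 3      # 1 = catalog, 2 = page tree, both written in close()
        self._write(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")

    def _write(self, data):
        self.f.write(data)
        self.pos += len(data)

    def _add_object(self, body, num=None):
        if num is None:
            num = self.next_id
            self.next_id += 1
        self.offsets[num] = self.pos  # remember the offset for the xref table
        self._write(b"%d 0 obj\n" % num + body + b"\nendobj\n")
        return num

    @staticmethod
    def _stream(entries, data):
        head = b"<< " + entries + b" /Length %d >>\nstream\n" % len(data)
        return head + data + b"\nendstream"

    def add_gray_page(self, width, height, img_w, img_h, samples):
        """Write one page right away. width and height are integer points."""
        assert len(samples) == img_w * img_h
        img = self._add_object(self._stream(
            b"/Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode"
            % (img_w, img_h), zlib.compress(samples)))
        # scale the unit-square image to fill the whole page
        content = self._add_object(self._stream(
            b"", b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (width, height)))
        page = self._add_object(
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
            % (width, height, img, content))
        self.page_ids.append(page)

    def close(self):
        """Finalize the file: page tree, catalog, xref table, trailer."""
        kids = b" ".join(b"%d 0 R" % n for n in self.page_ids)
        self._add_object(
            b"<< /Type /Pages /Kids [" + kids + b"] /Count %d >>"
            % len(self.page_ids), num=2)
        self._add_object(b"<< /Type /Catalog /Pages 2 0 R >>", num=1)
        xref_pos = self.pos
        self._write(b"xref\n0 %d\n" % self.next_id)
        self._write(b"0000000000 65535 f \n")  # every xref entry is 20 bytes
        for num in range(1, self.next_id):
            self._write(b"%010d 00000 n \n" % self.offsets[num])
        self._write(
            b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (self.next_id, xref_pos))


def build_demo_pdf(n_pages):
    buf = io.BytesIO()  # stands in for open("output.pdf", "wb")
    writer = StreamingPdfWriter(buf)
    for _ in range(n_pages):
        # 2x2 checkerboard image on a 100x200 pt page
        writer.add_gray_page(100, 200, 2, 2, bytes([0, 255, 255, 0]))
    writer.close()
    return buf.getvalue()
```

with a real file object instead of io.BytesIO, the peak memory use of this sketch is roughly one page image plus a small offset table, independent of the number of pages.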
