currently pymupdf does not support stream-writing a PDF to disk
so i have to buffer the whole output document in RAM
and can only write it to disk once it is complete
# pseudo code
import pymupdf
input_doc = pymupdf.open("input.pdf")
output_doc = pymupdf.open()  # empty PDF
colorspace = pymupdf.csGRAY
# buffer to RAM: every page stays in memory until save()
for page_idx in range(100):
    out_page = output_doc.new_page(width=100, height=200)
    page_image = get_page_image(page_idx)  # placeholder helper, returns np.array
    rect = input_doc[page_idx].rect
    h, w = page_image.shape[:2]
    pix = pymupdf.Pixmap(colorspace, w, h, page_image.tobytes(), False)  # False = no alpha
    out_page.insert_image(rect, pixmap=pix)
# write to disk
output_doc.save("output.pdf")
output_doc.close()
problem:
this fails if the output document is bigger than RAM
solution:
stream-write the output document to disk
i would need to pass the output file path to pymupdf.open
(which currently fails if the output file does not exist)
and maybe explicitly enable output-streaming
output_doc = pymupdf.open("output.pdf", write_stream=True)  # proposed parameter, does not exist yet
so then i can remove the output_doc.save call
and the output file is finalized in output_doc.close()
proof of concept:
i have implemented a minimal streaming PDF writer in
rahimnathwani/binarize-pdf#2
as expected, this uses significantly less memory
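to illustrate the idea, here is a minimal sketch of a streaming PDF writer using only the standard library. this is not the code from the linked PR and not a pymupdf API: the class name `StreamingPdfWriter` and its methods are hypothetical. it assumes one 8-bit grayscale image per page. each page is written to the file as soon as it arrives, and only the object offsets and page ids are kept in memory until `close()` writes the page tree, the xref table, and the trailer.

```python
import io
import zlib


class StreamingPdfWriter:
    """Hypothetical sketch: writes one grayscale image per page, page by page."""

    def __init__(self, fileobj):
        self.f = fileobj
        self.pos = 0        # bytes written so far (tracked manually, no seek/tell needed)
        self.offsets = {}   # object number -> byte offset, needed for the xref table
        self.page_ids = []  # object numbers of the page objects
        self.next_id = 3    # 1 = catalog and 2 = page tree are written in close()
        self._write(b"%PDF-1.4\n")

    def _write(self, data):
        self.f.write(data)
        self.pos += len(data)

    def _add_object(self, body, obj_id=None):
        if obj_id is None:
            obj_id = self.next_id
            self.next_id += 1
        self.offsets[obj_id] = self.pos
        self._write(f"{obj_id} 0 obj\n".encode("ascii") + body + b"\nendobj\n")
        return obj_id

    @staticmethod
    def _stream(entries, data):
        head = f"<< {entries} /Length {len(data)} >>\nstream\n".encode("ascii")
        return head + data + b"\nendstream"

    def add_page(self, width, height, gray, img_w, img_h):
        """Write one page now; gray holds img_w * img_h bytes of 8-bit samples."""
        img = self._add_object(self._stream(
            f"/Type /XObject /Subtype /Image /Width {img_w} /Height {img_h} "
            "/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode",
            zlib.compress(gray)))
        # scale the unit-square image to fill the whole page
        ops = f"q {width:.2f} 0 0 {height:.2f} 0 0 cm /Im0 Do Q".encode("ascii")
        content = self._add_object(self._stream("", ops))
        page = self._add_object(
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width:.2f} {height:.2f}] "
            f"/Resources << /XObject << /Im0 {img} 0 R >> >> "
            f"/Contents {content} 0 R >>".encode("ascii"))
        self.page_ids.append(page)

    def close(self):
        """Finalize the file: page tree, catalog, xref table, trailer."""
        kids = " ".join(f"{p} 0 R" for p in self.page_ids)
        self._add_object(
            f"<< /Type /Pages /Kids [{kids}] /Count {len(self.page_ids)} >>"
            .encode("ascii"), obj_id=2)
        self._add_object(b"<< /Type /Catalog /Pages 2 0 R >>", obj_id=1)
        xref_pos = self.pos
        self._write(f"xref\n0 {self.next_id}\n".encode("ascii"))
        self._write(b"0000000000 65535 f \n")
        for obj_id in range(1, self.next_id):
            self._write(f"{self.offsets[obj_id]:010d} 00000 n \n".encode("ascii"))
        self._write(
            f"trailer\n<< /Size {self.next_id} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n".encode("ascii"))


# demo with an in-memory buffer; in real use pass open("output.pdf", "wb") instead
buf = io.BytesIO()
writer = StreamingPdfWriter(buf)
for _ in range(3):
    writer.add_page(100, 200, bytes(4 * 8), img_w=4, img_h=8)
writer.close()
```

with a real file object, peak memory stays at roughly one page image plus the offset table, because nothing written earlier is kept around. the tradeoff is that pages cannot be modified after `add_page` returns.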