deflate: configurable window (encoder max_distance + decoder window_size) #84
Merged
…size

Two knobs for interoperating with decoders that use a sliding window smaller than deflate's full 32 KiB, and for decoding on memory-constrained systems.

- `EncoderConfig::max_distance` (default `WINDOW_SIZE`, clamped 1..=32768) caps the LZ77 match distance the encoder will emit. The match finder's distance bound becomes `min(max_distance, WINDOW_SIZE, position)`. Set it to 4096 to produce a stream a 4 KiB-window inflater accepts; for example, qemu/qcow2 decompresses compressed clusters with `inflateInit2(-12)` and fails with Z_DATA_ERROR on any back-reference farther than 4 KiB.
- `DecoderConfig::window_size` (default `WINDOW_SIZE`, clamped 1..=32768) sizes the decoder's circular history buffer (now a `Box<[u8]>` of `win_cap` rather than a fixed 32 KiB array) and is the maximum legal back-reference distance: the valid-byte counter is capped at `win_cap`, and a distance beyond it is rejected with `Error::InvalidDistance` (mirroring zlib's small-window Z_DATA_ERROR). This both shrinks decoder memory and lets a caller verify that an encoded stream stays within a given window.

Internal deflate `EncoderConfig` constructions (zlib/gzip/factory) use `..Default::default()`. zlib/gzip keep the full 32 KiB window.

New tests cover the cap end to end: a `max_distance: 4096` stream decodes under a 4 KiB-window decoder, while the full-window encoding of the same far-repeat data is rejected by that decoder, yet still decodes under the full window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
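The encoder-side bound in the commit message, `min(max_distance, WINDOW_SIZE, position)`, can be sketched as below. This is only an illustration, not the code from this PR: the function names `clamp_max_distance` and `distance_bound` are hypothetical, and `WINDOW_SIZE` is assumed to be 32768 as the stated clamp range suggests.

```rust
/// Deflate's full window (32 KiB); assumed to match the crate's `WINDOW_SIZE`.
const WINDOW_SIZE: usize = 32 * 1024;

/// Hypothetical helper: clamp a requested cap into the range 1..=32768.
fn clamp_max_distance(requested: usize) -> usize {
    requested.clamp(1, WINDOW_SIZE)
}

/// Hypothetical helper: the farthest back-reference a match finder may
/// consider at `position` (the number of bytes already consumed).
fn distance_bound(max_distance: usize, position: usize) -> usize {
    max_distance.min(WINDOW_SIZE).min(position)
}
```

Under these assumptions, a cap of 4096 leaves early positions unaffected (`distance_bound(4096, 100)` is 100, since only 100 bytes of history exist) and limits later ones (`distance_bound(4096, 8192)` is 4096), which is what keeps a far repeat roughly 8 KiB back from being emitted as a match.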
Follow-up: addressed the |
Motivation

qemu/qcow2 decompresses compressed clusters with a 4 KiB deflate window (`inflateInit2(-12)`), so any back-reference farther than 4096 bytes makes `qemu-img` fail with `Z_DATA_ERROR`. compcol's deflate encoder was hardwired to the full 32 KiB window. This adds the two knobs needed to target small-window decoders, and to decode on memory-constrained systems.

Knobs
- `deflate::EncoderConfig::max_distance` (default `WINDOW_SIZE`, clamped `1..=32768`): caps the LZ77 back-reference distance the encoder emits; the match finder's bound becomes `min(max_distance, WINDOW_SIZE, position)`. Set `4096` to produce a stream a 4 KiB-window inflater accepts.
- `deflate::DecoderConfig::window_size` (default `WINDOW_SIZE`, clamped `1..=32768`): sizes the decoder's circular history buffer, now a `Box<[u8]>` of `win_cap` instead of a fixed 32 KiB array, so a small window uses proportionally less memory. It is also the maximum legal back-reference distance: distances beyond it are rejected with `Error::InvalidDistance`, mirroring zlib's small-window `Z_DATA_ERROR`. This doubles as a way to prove an encoded stream stays within a given window.

Compatibility
Internal `EncoderConfig` constructions (zlib/gzip/factory) are updated to `..Default::default()`; zlib/gzip keep the full window. (Minor: deflate's public `EncoderConfig`/`DecoderConfig` gained a field; construct with `..Default::default()` if you used a struct literal.)

Validation
New tests exercise the pairing end-to-end:
- `max_distance: 4096` round-trips, and suppresses a far (~8 KiB) match (capped output is larger than the full-window encoding).
- The full-window encoding of the same data is rejected by a 4 KiB-window decoder with `InvalidDistance`, and the full-window decoder still reads it.

All 38 deflate tests pass; zlib (32) and gzip (37) unaffected; full `--all-features` suite green; fmt / clippy `-D warnings` / docs clean.

🤖 Generated with Claude Code
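To illustrate the decoder-side behaviour described above (a heap-allocated circular history whose valid-byte count is capped at the window size, with farther distances rejected), here is a minimal, self-contained sketch. It is not the crate's decoder: the `Window` type, its methods, the `try_far_match` helper, and the local `Error` enum are all hypothetical stand-ins, and only the general mechanism is taken from the description.

```rust
/// Deflate's full window (32 KiB); assumed to match the crate's `WINDOW_SIZE`.
const WINDOW_SIZE: usize = 32 * 1024;

/// Illustrative stand-in for the PR's `Error::InvalidDistance`.
#[derive(Debug, PartialEq)]
enum Error {
    InvalidDistance,
}

/// Hypothetical circular history buffer (not the crate's actual type).
struct Window {
    buf: Box<[u8]>, // `win_cap` bytes on the heap
    pos: usize,     // index of the next write
    valid: usize,   // bytes of usable history, never more than `win_cap`
}

impl Window {
    fn new(window_size: usize) -> Self {
        let win_cap = window_size.clamp(1, WINDOW_SIZE);
        Window { buf: vec![0u8; win_cap].into_boxed_slice(), pos: 0, valid: 0 }
    }

    /// Record one decoded byte in the history.
    fn push(&mut self, byte: u8) {
        let cap = self.buf.len();
        self.buf[self.pos] = byte;
        self.pos = (self.pos + 1) % cap;
        self.valid = (self.valid + 1).min(cap);
    }

    /// Resolve a back-reference, appending the copied bytes to `out`.
    /// Because `valid` is capped at `win_cap`, one comparison rejects both
    /// "before start of stream" and "outside the window" distances.
    fn copy_match(&mut self, distance: usize, len: usize, out: &mut Vec<u8>) -> Result<(), Error> {
        if distance == 0 || distance > self.valid {
            return Err(Error::InvalidDistance);
        }
        let cap = self.buf.len();
        for _ in 0..len {
            // Recomputed per byte so overlapping copies (distance < len) work.
            let byte = self.buf[(self.pos + cap - distance) % cap];
            self.push(byte);
            out.push(byte);
        }
        Ok(())
    }
}

/// Feed `history` literal bytes, then attempt one 3-byte back-reference.
fn try_far_match(window_size: usize, history: usize, distance: usize) -> Result<(), Error> {
    let mut window = Window::new(window_size);
    for i in 0..history {
        window.push(i as u8);
    }
    let mut out = Vec::new();
    window.copy_match(distance, 3, &mut out)
}
```

In this sketch, `try_far_match(4096, 10_000, 8192)` returns `Err(Error::InvalidDistance)`, while the same back-reference with `window_size` set to `WINDOW_SIZE` succeeds, mirroring the pairing the tests describe.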