deflate: configurable window (encoder max_distance + decoder window_size) #84

Merged: MagicalTux merged 1 commit into master from deflate-window-config on Jun 3, 2026

@MagicalTux

Motivation

qemu/qcow2 decompresses compressed clusters with a 4 KiB deflate window (inflateInit2(-12)), so any back-reference farther than 4096 bytes makes qemu-img fail with Z_DATA_ERROR. compcol's deflate encoder was hardwired to the full 32 KiB window. This adds the two knobs needed to target small-window decoders — and to decode on memory-constrained systems.

Knobs

deflate::EncoderConfig::max_distance (default WINDOW_SIZE, clamped 1..=32768)
Caps the LZ77 back-reference distance the encoder emits: the match finder's bound becomes min(max_distance, WINDOW_SIZE, position). Set it to 4096 to produce a stream that a 4 KiB-window inflater accepts.
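
The bound described above can be sketched as follows. This is a minimal illustration, not compcol's actual match finder: the function names `clamp_max_distance` and `distance_bound` are hypothetical, and only `WINDOW_SIZE`, the 1..=32768 clamp, and the `min(...)` expression come from the description.

```rust
/// Full deflate window of 32 KiB (the largest distance RFC 1951 can encode).
const WINDOW_SIZE: usize = 32 * 1024;

/// Clamp a requested cap to the documented range 1..=32768.
/// (Hypothetical helper; where the real crate clamps is an assumption.)
fn clamp_max_distance(requested: usize) -> usize {
    requested.clamp(1, WINDOW_SIZE)
}

/// Farthest back-reference the match finder may consider at `position`,
/// i.e. min(max_distance, WINDOW_SIZE, position).
/// `position` is the number of input bytes already consumed, so a match
/// can never reach back past the start of the input.
fn distance_bound(max_distance: usize, position: usize) -> usize {
    clamp_max_distance(max_distance)
        .min(WINDOW_SIZE)
        .min(position)
}
```

With `max_distance` set to 4096, a match candidate 8 KiB back is simply never considered, which is why the capped stream can come out larger than the full-window encoding.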

deflate::DecoderConfig::window_size (default WINDOW_SIZE, clamped 1..=32768)
Sizes the decoder's circular history buffer, which is now a Box<[u8]> of win_cap bytes instead of a fixed 32 KiB array, so a small window uses proportionally less memory. It is also the maximum legal back-reference distance: distances beyond it are rejected with Error::InvalidDistance, mirroring zlib's small-window Z_DATA_ERROR. This doubles as a way to prove that an encoded stream stays within a given window.
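
A rough sketch of how a heap-allocated circular window can enforce that limit is shown below. The `Window` type, its fields, and its methods are hypothetical stand-ins written for illustration; only the `Box<[u8]>` buffer, the `win_cap` size, the capped valid-byte count, and `Error::InvalidDistance` are taken from the description.

```rust
#[derive(Debug, PartialEq)]
enum Error {
    InvalidDistance,
}

/// Hypothetical circular history buffer sized by `window_size`.
struct Window {
    buf: Box<[u8]>, // circular history; its length plays the role of win_cap
    pos: usize,     // index where the next byte will be written
    valid: usize,   // bytes of history available, capped at win_cap
}

impl Window {
    fn new(window_size: usize) -> Self {
        // Clamp to the documented range 1..=32768.
        let win_cap = window_size.clamp(1, 32 * 1024);
        Window {
            buf: vec![0u8; win_cap].into_boxed_slice(),
            pos: 0,
            valid: 0,
        }
    }

    fn capacity(&self) -> usize {
        self.buf.len()
    }

    /// Record one decoded byte in the history.
    fn push(&mut self, byte: u8) {
        let cap = self.buf.len();
        self.buf[self.pos] = byte;
        self.pos = (self.pos + 1) % cap;
        self.valid = (self.valid + 1).min(cap);
    }

    /// Resolve a (distance, length) back-reference, appending to `out`.
    /// A distance beyond the available history is rejected.
    fn copy_match(
        &mut self,
        distance: usize,
        length: usize,
        out: &mut Vec<u8>,
    ) -> Result<(), Error> {
        if distance == 0 || distance > self.valid {
            return Err(Error::InvalidDistance);
        }
        let cap = self.buf.len();
        // Copy byte by byte so overlapping matches (length > distance) work.
        for _ in 0..length {
            let src = (self.pos + cap - distance) % cap;
            let byte = self.buf[src];
            out.push(byte);
            self.push(byte);
        }
        Ok(())
    }
}
```

Because `valid` never exceeds the buffer length, a decoder built with a 4 KiB window rejects any distance above 4096 even after more than 4096 bytes have been decoded.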

Compatibility

  • Defaults are unchanged (full 32 KiB window), so existing behavior is identical.
  • Internal deflate EncoderConfig constructions (zlib/gzip/factory) updated to ..Default::default(); zlib/gzip keep the full window. (Minor: deflate's public EncoderConfig/DecoderConfig gained a field — construct with ..Default::default() if you used a struct literal.)

Validation

New tests exercise the pairing end-to-end:

  • max_distance: 4096 round-trips and suppresses a far (~8 KiB) match, so the capped output is larger than the full-window encoding.
  • A 4 KiB-window decoder reads the capped stream and rejects the full-window encoding of the same data with InvalidDistance; the full-window decoder still reads that encoding.

All 38 deflate tests pass; zlib (32) and gzip (37) are unaffected; the full --all-features suite is green; fmt, clippy -D warnings, and docs are clean.

🤖 Generated with Claude Code

…size

Two knobs for interoperating with decoders that use a sliding window smaller
than deflate's full 32 KiB, and for decoding on memory-constrained systems.

- `EncoderConfig::max_distance` (default `WINDOW_SIZE`, clamped 1..=32768)
  caps the LZ77 match distance the encoder will emit. The match finder's
  distance bound becomes `min(max_distance, WINDOW_SIZE, position)`. Set it to
  4096 to produce a stream a 4 KiB-window inflater accepts — e.g. qemu/qcow2
  decompresses compressed clusters with `inflateInit2(-12)` and fails with
  Z_DATA_ERROR on any back-reference farther than 4 KiB.

- `DecoderConfig::window_size` (default `WINDOW_SIZE`, clamped 1..=32768) sizes
  the decoder's circular history buffer (now a `Box<[u8]>` of `win_cap` rather
  than a fixed 32 KiB array) and is the maximum legal back-reference distance:
  the valid-byte counter is capped at `win_cap`, and a distance beyond it is
  rejected with `Error::InvalidDistance` (mirroring zlib's small-window
  Z_DATA_ERROR). This both shrinks decoder memory and lets a caller verify
  that an encoded stream stays within a given window.

Internal deflate `EncoderConfig` constructions (zlib/gzip/factory) use
`..Default::default()`. zlib/gzip keep the full 32 KiB window. New tests cover
the cap end to end: a `max_distance: 4096` stream decodes under a 4 KiB-window
decoder, while the full-window encoding of the same far-repeat data is
rejected by that decoder — and still decodes under the full window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MagicalTux MagicalTux merged commit 41b652a into master Jun 3, 2026
41 checks passed
@MagicalTux MagicalTux mentioned this pull request Jun 3, 2026
@MagicalTux

Follow-up: addressed the cargo-semver-checks constructible_struct_adds_field finding by marking deflate::EncoderConfig and deflate::DecoderConfig #[non_exhaustive] and adding with_* builders (EncoderConfig::default().with_level(9).with_max_distance(4096), DecoderConfig::default().with_window_size(4096)). One breaking bump now; future tuning knobs on these structs are non-breaking. Same-crate callers (zlib/gzip/factory) are unaffected; the external deflate tests use the builder form.
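
For readers unfamiliar with the pattern, the sketch below shows roughly what a `#[non_exhaustive]` config with `with_*` builders looks like. It is an assumption-laden illustration, not the crate's source: the field types, the default level of 6, and clamping inside the builder are all guesses; only the struct name, the `with_level` and `with_max_distance` method names, and the `WINDOW_SIZE` default for `max_distance` come from the comment above.

```rust
const WINDOW_SIZE: usize = 32 * 1024;

/// Downstream crates cannot build this with a struct literal, so adding
/// a field later is not a breaking change for them.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct EncoderConfig {
    pub level: u32,          // assumed type
    pub max_distance: usize, // assumed type
}

impl Default for EncoderConfig {
    fn default() -> Self {
        EncoderConfig {
            level: 6, // assumed default, chosen only for this sketch
            max_distance: WINDOW_SIZE,
        }
    }
}

impl EncoderConfig {
    /// Builder-style setter: consumes and returns the config.
    pub fn with_level(mut self, level: u32) -> Self {
        self.level = level;
        self
    }

    /// Builder-style setter; clamping here is an assumption of the sketch.
    pub fn with_max_distance(mut self, max_distance: usize) -> Self {
        self.max_distance = max_distance.clamp(1, WINDOW_SIZE);
        self
    }
}
```

A caller outside the defining crate would then write `EncoderConfig::default().with_level(9).with_max_distance(4096)`, as in the comment above, instead of a struct literal.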
