Security hardening: fix decompression bombs, OOB panic, and overflow in decoders by MagicalTux · Pull Request #88 · KarpelesLab/compcol

MagicalTux · 2026-06-10T02:14:38Z

Parallel security audit of all decoder modules (decoder input treated as attacker-controlled) found 9 issues across 8 modules. Each was fixed in an isolated worktree and validated; all fixes are in disjoint files. Full cargo test --all-features passes.

Fixes

High severity

zstd (sequences.rs, decoder.rs, literals.rs): execute_sequences had no per-block output cap — an RLE-mode block could expand history to multiple GB before any output drained (bypassing LimitedDecoder). Now meters literal + match bytes against Block_Maximum_Size = min(window_size, 128 KiB) and returns Corrupt on overflow. Literals Regenerated_Size cap tightened from 128 MiB to 128 KiB.
bzip2 (decoder.rs): RLE-2 zero-run cap was per-run, not cumulative — a crafted block could grow mtf_indices to hundreds of GB. Added a cumulative bound at level * 100_000 (the real block size).
lzx (decoder.rs): the verbatim/aligned match-copy fast path could take an out-of-bounds window slice (reachable panic) when chunk + distance > window_size. Added a chunk + dist <= win_size guard routing such copies to the correct byte-by-byte path.

Medium severity

rar5 (decoder.rs): decode_distance slot-63 path overflowed u32 (debug panic / release wrap) on attacker-controlled bits. Now accumulates in u64 and rejects out-of-range distances with InvalidDistance. Regression test added.
lzah (mod.rs): uncapped Vec::with_capacity(expected_len) from the out-of-band length → OOM. Capped to 1 MiB, matching the LHA siblings; the decode loop still enforces the true bound.

Low severity

lz5 (mod.rs): up-front reserve(payload_len) of the declared block size (up to 256 MiB) before reading payload. Bounded the floor to 64 KiB; buffers grow incrementally.
arsenic (pipeline.rs): uncapped one-shot output buffer across unbounded blocks. Added a 256 MiB cap, matching sit13.
gzip (mod.rs): interop bug — a member with both FEXTRA and FHCRC mis-parsed (stale aux_idx), feeding the 2 header-CRC bytes into deflate and rejecting a valid stream. Now resets aux_idx on entry to the FHCRC phase. Regression tests added.

Audited clean

Core infra (bits/huffman/io/factory/limit/checksum), deflate/deflate64/zlib decoders, the LZMA family (lzma/lzma2/xz), brotli, the fast-LZ codecs (lz4/snappy/lzo/lzw), ppmd, sit13, rar1/rar2/rar3 + lzham, and the legacy/filter codecs (lha/lzss/lzs/zip_/arc_/rle/packbits/bcj/bcj2/delta) — high-risk spots verified bounded.

🤖 Generated with Claude Code

execute_sequences appended literal/match bytes into an unbounded history Vec with no cap against the per-block maximum decoded size. A malicious Compressed_Block using RLE_Mode FSE tables (e.g. match-length RLE symbol 52 -> ml_base=65539, no state bits consumed) emits ~65 KiB per cheap sequence, so a ~128 KiB input block could expand history to multiple GiB before any output is drained -- bypassing LimitedDecoder, which only meters drained bytes. Per RFC 8478 the decoded output of a single Compressed_Block is bounded by Block_Maximum_Size = min(Window_Size, 128 KiB). execute_sequences now takes that bound and tracks a running per-block output counter over both literal-run bytes and match-length bytes (plus trailing literals), returning Error::Corrupt as soon as a sequence would push the block over the bound. decode_compressed_block computes the bound from the frame's window_size (falling back to 128 KiB when no Window_Descriptor was present). The self-overlap match-copy logic is unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The Raw/RLE and Compressed literals paths validated Regenerated_Size only against MAX_WINDOW_SIZE (128 MiB), but the per-block literal Regenerated_Size is spec-capped at Block_Maximum_Size (<= 128 KiB). literals.resize(regen_size, byte) could thus eagerly allocate ~1 MiB from a few input bytes. Reject sizes above a 128 KiB ceiling (MAX_BLOCK_REGEN_SIZE) before allocating, matching the per-block bound. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ession bomb The RLE-2 zero-run decoder in decode_block_payload only capped a SINGLE zero-run (~8.1M bytes) but never bounded the CUMULATIVE length of mtf_indices. Because zero_run is flushed and reset to 0 after each non-zero symbol, the per-run guard reset too — a malicious bzip2 block (symbol count bounded only by num_selectors*50 ≈ 900_100) could flush a fresh ~8.1M-byte zero-run after every symbol and grow the intermediate mtf_indices buffer to hundreds of GB. Since this buffer is allocated before any output is produced, crate::limit::LimitedDecoder (which only observes output) cannot stop it: an attacker-controlled, small bzip2 input triggers an OOM. Fix: a single bzip2 block decodes to at most the declared block size (level * 100_000, level 1..=9). Bound mtf_indices.len() + zero_run by that size (plus 1024 bytes of slack, matching the prior per-run headroom — tight, not a multiple) on every RUNA/RUNB accumulation, every literal push, and the final flush, returning Error::Corrupt on overflow. Mirrors arsenic's `block.len() + count > block_size` guard. Legitimate streams up to the declared block size still decode (existing round-trip tests, incl. level 9, pass). Addresses HIGH finding: bzip2 decoder RLE-2 cumulative-length decompression-bomb OOM. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… path to fix OOB panic The verbatim/aligned match-copy fast path in HuffPhase::EmittingMatch split the circular window with split_at_mut and indexed the lower half with a[dst_start..dst_start + chunk]. In the wrap case (src > window_pos) the lower half has length src, so the slice is only in bounds when dst_start + chunk <= src, i.e. chunk + dist <= win_size. The two existing guards (src + chunk <= win_size, window_pos + chunk <= win_size) do not imply this. An attacker-controlled in-range match with distance near the window size (e.g. win_size=32768, window_pos=1000, dist~32700, len>=100) passed both guards yet ran the slice off the end -> OOB index -> panic. Add chunk + dist <= win_size to the fast-path condition so wrapping / near-window copies fall through to the existing, correct byte-by-byte copy loop. This is a panic-avoidance correctness guard, not an input rejection: legitimate matches still decode unchanged. Refs security finding (HIGH): src/lzx/decoder.rs EmittingMatch OOB slice. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…low panic In decode_distance, an attacker-controlled rar5 stream selecting DC slot 63 yields base distance 0xC0000001; the 26 high extra bits add up to 0x3FFFFFF0 (-> 0xFFFFFFF1) and the LDC low value (0..=15) then pushes the running u32 total to 0x1_0000_0000. This overflowed the u32 accumulator: an "attempt to add with overflow" panic in debug builds (DoS) and a silent wrap in release builds. Accumulate the distance in u64 so every intermediate is exact, then reject any value above u32::MAX with Error::InvalidDistance before narrowing back to the u32 that emit_match consumes (which still applies the precise > window_size bound). Huffman/slot/extra-bit logic is unchanged; all legitimate in-range distances behave exactly as before. Adds a regression test driving slot 63 with maximal high/low bits, which panics on the unfixed code (dev profile has overflow-checks on) and now returns InvalidDistance. Fixes MEDIUM finding: rar5 decode_distance arithmetic overflow. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

decode_payload reserved Vec capacity for the full attacker-controlled out-of-band uncompressed length (DecoderConfig::with_len), so a crafted entry declaring a multi-GiB length with a tiny payload forced a huge eager allocation. Cap the initial reservation at 1 MiB, mirroring the sibling LHA codecs (static_huff/lzhuf); the while-loop already enforces the true bound so correctness is unchanged. Co-Authored-By: Claude <noreply@anthropic.com>

The block decoder reserved the full attacker-declared block size (up to 256 MiB, validated only against max_block_raw, not bytes remaining), so an ~11-byte input could force a 256 MiB eager allocation. Reserve a bounded 64 KiB floor instead and let extend_from_slice grow the buffer as real bytes arrive (mirrors lz4's frame-decoder reserve cap). The raw and compressed block paths both grow incrementally, so output is unaffected. Validation rejecting payload_len > max_block_raw is unchanged. Co-Authored-By: Claude <noreply@anthropic.com>

decode_stream accumulated the entire stream into a single Vec with no output cap, while the block loop (while end_flag == 0) is unbounded and the final-RLE layer can expand ~51x per block. A small crafted stream could drive output to unbounded size. Add a 256 MiB DEFAULT_OUTPUT_CAP (matching the sibling sit13 decoder) checked inside the emit loop and return Error::Corrupt once exceeded. Co-Authored-By: Claude <noreply@anthropic.com>

The HeaderCrc phase skips the 2-byte FHCRC field using the shared aux_idx counter, but ExtraLen leaves aux_idx == 2 and it was only reset once after the fixed header. When a member set both FEXTRA and FHCRC, HeaderCrc was entered with aux_idx already 2, the skip loop consumed 0 bytes, and the 2 header-CRC bytes were fed into the deflate decoder, producing a spurious decode error on a valid member. Reset aux_idx to 0 on every transition into HeaderCrc (ExtraData/Name/Comment arms; the FixedHeader path already reset it). Adds FHCRC, FEXTRA+FHCRC, and all-fields+FHCRC regression tests. Co-Authored-By: Claude <noreply@anthropic.com>

MagicalTux and others added 9 commits June 10, 2026 11:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security hardening: fix decompression bombs, OOB panic, and overflow in decoders#88

Security hardening: fix decompression bombs, OOB panic, and overflow in decoders#88
MagicalTux wants to merge 9 commits into
masterfrom
security-fixes-2026-06-10

MagicalTux commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MagicalTux commented Jun 10, 2026

Fixes

Audited clean

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant