Security hardening: fix decompression bombs, OOB panic, and overflow in decoders#88
Open
MagicalTux wants to merge 9 commits into
Open
Security hardening: fix decompression bombs, OOB panic, and overflow in decoders#88MagicalTux wants to merge 9 commits into
MagicalTux wants to merge 9 commits into
Conversation
execute_sequences appended literal/match bytes into an unbounded history Vec with no cap against the per-block maximum decoded size. A malicious Compressed_Block using RLE_Mode FSE tables (e.g. match-length RLE symbol 52 -> ml_base=65539, no state bits consumed) emits ~65 KiB per cheap sequence, so a ~128 KiB input block could expand history to multiple GiB before any output is drained -- bypassing LimitedDecoder, which only meters drained bytes. Per RFC 8478 the decoded output of a single Compressed_Block is bounded by Block_Maximum_Size = min(Window_Size, 128 KiB). execute_sequences now takes that bound and tracks a running per-block output counter over both literal-run bytes and match-length bytes (plus trailing literals), returning Error::Corrupt as soon as a sequence would push the block over the bound. decode_compressed_block computes the bound from the frame's window_size (falling back to 128 KiB when no Window_Descriptor was present). The self-overlap match-copy logic is unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Raw/RLE and Compressed literals paths validated Regenerated_Size only against MAX_WINDOW_SIZE (128 MiB), but the per-block literal Regenerated_Size is spec-capped at Block_Maximum_Size (<= 128 KiB). literals.resize(regen_size, byte) could thus eagerly allocate ~1 MiB from a few input bytes. Reject sizes above a 128 KiB ceiling (MAX_BLOCK_REGEN_SIZE) before allocating, matching the per-block bound. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ession bomb The RLE-2 zero-run decoder in decode_block_payload only capped a SINGLE zero-run (~8.1M bytes) but never bounded the CUMULATIVE length of mtf_indices. Because zero_run is flushed and reset to 0 after each non-zero symbol, the per-run guard reset too — a malicious bzip2 block (symbol count bounded only by num_selectors*50 ≈ 900_100) could flush a fresh ~8.1M-byte zero-run after every symbol and grow the intermediate mtf_indices buffer to hundreds of GB. Since this buffer is allocated before any output is produced, crate::limit::LimitedDecoder (which only observes output) cannot stop it: an attacker-controlled, small bzip2 input triggers an OOM. Fix: a single bzip2 block decodes to at most the declared block size (level * 100_000, level 1..=9). Bound mtf_indices.len() + zero_run by that size (plus 1024 bytes of slack, matching the prior per-run headroom — tight, not a multiple) on every RUNA/RUNB accumulation, every literal push, and the final flush, returning Error::Corrupt on overflow. Mirrors arsenic's `block.len() + count > block_size` guard. Legitimate streams up to the declared block size still decode (existing round-trip tests, incl. level 9, pass). Addresses HIGH finding: bzip2 decoder RLE-2 cumulative-length decompression-bomb OOM. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… path to fix OOB panic The verbatim/aligned match-copy fast path in HuffPhase::EmittingMatch split the circular window with split_at_mut and indexed the lower half with a[dst_start..dst_start + chunk]. In the wrap case (src > window_pos) the lower half has length src, so the slice is only in bounds when dst_start + chunk <= src, i.e. chunk + dist <= win_size. The two existing guards (src + chunk <= win_size, window_pos + chunk <= win_size) do not imply this. An attacker-controlled in-range match with distance near the window size (e.g. win_size=32768, window_pos=1000, dist~32700, len>=100) passed both guards yet ran the slice off the end -> OOB index -> panic. Add chunk + dist <= win_size to the fast-path condition so wrapping / near-window copies fall through to the existing, correct byte-by-byte copy loop. This is a panic-avoidance correctness guard, not an input rejection: legitimate matches still decode unchanged. Refs security finding (HIGH): src/lzx/decoder.rs EmittingMatch OOB slice. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…low panic In decode_distance, an attacker-controlled rar5 stream selecting DC slot 63 yields base distance 0xC0000001; the 26 high extra bits add up to 0x3FFFFFF0 (-> 0xFFFFFFF1) and the LDC low value (0..=15) then pushes the running u32 total to 0x1_0000_0000. This overflowed the u32 accumulator: an "attempt to add with overflow" panic in debug builds (DoS) and a silent wrap in release builds. Accumulate the distance in u64 so every intermediate is exact, then reject any value above u32::MAX with Error::InvalidDistance before narrowing back to the u32 that emit_match consumes (which still applies the precise > window_size bound). Huffman/slot/extra-bit logic is unchanged; all legitimate in-range distances behave exactly as before. Adds a regression test driving slot 63 with maximal high/low bits, which panics on the unfixed code (dev profile has overflow-checks on) and now returns InvalidDistance. Fixes MEDIUM finding: rar5 decode_distance arithmetic overflow. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
decode_payload reserved Vec capacity for the full attacker-controlled out-of-band uncompressed length (DecoderConfig::with_len), so a crafted entry declaring a multi-GiB length with a tiny payload forced a huge eager allocation. Cap the initial reservation at 1 MiB, mirroring the sibling LHA codecs (static_huff/lzhuf); the while-loop already enforces the true bound so correctness is unchanged. Co-Authored-By: Claude <noreply@anthropic.com>
The block decoder reserved the full attacker-declared block size (up to 256 MiB, validated only against max_block_raw, not bytes remaining), so an ~11-byte input could force a 256 MiB eager allocation. Reserve a bounded 64 KiB floor instead and let extend_from_slice grow the buffer as real bytes arrive (mirrors lz4's frame-decoder reserve cap). The raw and compressed block paths both grow incrementally, so output is unaffected. Validation rejecting payload_len > max_block_raw is unchanged. Co-Authored-By: Claude <noreply@anthropic.com>
decode_stream accumulated the entire stream into a single Vec with no output cap, while the block loop (while end_flag == 0) is unbounded and the final-RLE layer can expand ~51x per block. A small crafted stream could drive output to unbounded size. Add a 256 MiB DEFAULT_OUTPUT_CAP (matching the sibling sit13 decoder) checked inside the emit loop and return Error::Corrupt once exceeded. Co-Authored-By: Claude <noreply@anthropic.com>
The HeaderCrc phase skips the 2-byte FHCRC field using the shared aux_idx counter, but ExtraLen leaves aux_idx == 2 and it was only reset once after the fixed header. When a member set both FEXTRA and FHCRC, HeaderCrc was entered with aux_idx already 2, the skip loop consumed 0 bytes, and the 2 header-CRC bytes were fed into the deflate decoder, producing a spurious decode error on a valid member. Reset aux_idx to 0 on every transition into HeaderCrc (ExtraData/Name/Comment arms; the FixedHeader path already reset it). Adds FHCRC, FEXTRA+FHCRC, and all-fields+FHCRC regression tests. Co-Authored-By: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Parallel security audit of all decoder modules (decoder input treated as attacker-controlled) found 9 issues across 8 modules. Each was fixed in an isolated worktree and validated; all fixes are in disjoint files. Full
cargo test --all-featurespasses.Fixes
High severity
sequences.rs,decoder.rs,literals.rs):execute_sequenceshad no per-block output cap — an RLE-mode block could expandhistoryto multiple GB before any output drained (bypassingLimitedDecoder). Now meters literal + match bytes againstBlock_Maximum_Size = min(window_size, 128 KiB)and returnsCorrupton overflow. LiteralsRegenerated_Sizecap tightened from 128 MiB to 128 KiB.decoder.rs): RLE-2 zero-run cap was per-run, not cumulative — a crafted block could growmtf_indicesto hundreds of GB. Added a cumulative bound atlevel * 100_000(the real block size).decoder.rs): the verbatim/aligned match-copy fast path could take an out-of-bounds window slice (reachable panic) whenchunk + distance > window_size. Added achunk + dist <= win_sizeguard routing such copies to the correct byte-by-byte path.Medium severity
decoder.rs):decode_distanceslot-63 path overflowedu32(debug panic / release wrap) on attacker-controlled bits. Now accumulates inu64and rejects out-of-range distances withInvalidDistance. Regression test added.mod.rs): uncappedVec::with_capacity(expected_len)from the out-of-band length → OOM. Capped to 1 MiB, matching the LHA siblings; the decode loop still enforces the true bound.Low severity
mod.rs): up-frontreserve(payload_len)of the declared block size (up to 256 MiB) before reading payload. Bounded the floor to 64 KiB; buffers grow incrementally.pipeline.rs): uncapped one-shot output buffer across unbounded blocks. Added a 256 MiB cap, matching sit13.mod.rs): interop bug — a member with both FEXTRA and FHCRC mis-parsed (staleaux_idx), feeding the 2 header-CRC bytes into deflate and rejecting a valid stream. Now resetsaux_idxon entry to the FHCRC phase. Regression tests added.Audited clean
Core infra (bits/huffman/io/factory/limit/checksum), deflate/deflate64/zlib decoders, the LZMA family (lzma/lzma2/xz), brotli, the fast-LZ codecs (lz4/snappy/lzo/lzw), ppmd, sit13, rar1/rar2/rar3 + lzham, and the legacy/filter codecs (lha/lzss/lzs/zip_/arc_/rle/packbits/bcj/bcj2/delta) — high-risk spots verified bounded.
🤖 Generated with Claude Code