Skip to content

Harden against malicious PDFs: fix 3 unrecovered crashes, raise crypt/scanner coverage#6

Open
fank wants to merge 5 commits into
mainfrom
claude/harden-untrusted-input
Open

Harden against malicious PDFs: fix 3 unrecovered crashes, raise crypt/scanner coverage#6
fank wants to merge 5 commits into
mainfrom
claude/harden-untrusted-input

Conversation

@fank

@fank fank commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Three unrecovered process crashes reachable from a malicious PDF — through Open() or the public content-stream Scanner — plus the test coverage that surfaced them.

Crashes fixed

  1. Hostile /Encrypt /Length → slice panic (internal/crypt). computeRC4Key sliced a 16-byte MD5 digest by /Length ÷ 8 with no bound, so /Length 256 (or a negative value) panicked during Open — before password validation, with no recover anywhere. Reject key lengths outside [1, 16].
  2. Content-stream nesting → stack overflow (contentstream). readArray/readDict recursed with no depth limit; the scanner is a separate parser the object parser's existing cap never covered. Bound nesting at 1000.
  3. Empty inline image (BI ID EI) → slice panic (contentstream). imgEnd = imgStart - 1 produced a reversed src[imgStart:imgStart-1] slice. Clamp to an empty image.

Coverage & cleanup

  • internal/crypt 0% → ~85%: RC4 / AES-128 / AES-256 decrypt round-trips, V2/V4/V5 (incl. the iterated R6 hash) key derivation, and end-to-end RC4 stream decryption through Open.
  • content-stream scanner → ~69%.
  • Removed dead encryption glue (encryptCtx.decryptString plus an unused field, parameter, and import).

Each fix ships a red→green reproduction test. go test -race ./... passes and go vet is clean.

fank added 5 commits June 19, 2026 11:20
computeRC4Key sliced a 16-byte MD5 digest by /Length/8 without a bounds
check. /Length is attacker-controlled, so a value above 128 bits (keyLen
> 16) or a negative /Length sliced out of range and panicked during Open
— before password validation, with no recover anywhere — crashing the
process on a malicious encrypted PDF. Reject key lengths outside
[1, md5.Size].

Reproduced through Open (TestEncryptHostileKeyLengthNoPanic) and at the
unit level (TestNewRejectsHostileKeyLength).
internal/crypt had no test file (0% coverage). Cover the decrypt paths
(RC4, AES-128, AES-256, Identity) with round-trips, malformed-AES
robustness, V2/V4/V5 key derivation including the iterated R6 hash, and
V4 crypt-filter method mapping. Add an end-to-end RC4 stream decryption
through Open to exercise the encrypt glue. Raises internal/crypt to ~85%.
encryptCtx.decryptString was never wired into object resolution, the
password field was never set or read, decryptStream's dict parameter
was unused, and the errors import survived only via a no-op keep-alive.
Drop them; the crypt.Handler API (incl. DecryptString) stays intact.
Two unrecovered crashes reachable from a hostile decoded content stream
through the public Scanner API:

- readArray/readDict recursed with no depth limit, so deeply nested
  arrays or dicts overflowed the goroutine stack. Bound nesting at 1000,
  mirroring the object parser's maxParseDepth.
- An inline image with no data between ID and EI (e.g. "BI ID EI")
  computed imgEnd = imgStart-1 and sliced src[imgStart:imgStart-1],
  panicking. Clamp to an empty image.

Tests: TestDeeplyNestedArrayRejected, TestDeeplyNestedDictRejected,
TestModeratelyNestedArrayResolves, TestInlineImageEmptyNoPanic.
This repo names each test file after the source it tests (parse_test.go,
xref_test.go, scanner_test.go, …). The crypt/scanner tests had been split
by concern, leaving names with no matching source file. Fold them in:

  crypt_decrypt_test.go                    -> crypt_test.go
  internal/crypt/decrypt_test.go           -> internal/crypt/crypt_test.go
  internal/crypt/key_test.go               -> internal/crypt/crypt_test.go
  contentstream/scanner_robustness_test.go -> contentstream/scanner_test.go

Pure file reorganization; no test logic changes.
@fank fank marked this pull request as ready for review June 19, 2026 21:57
@fank fank requested a review from pgundlach June 19, 2026 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant