This repository was archived by the owner on May 1, 2026. It is now read-only.

core: wipe per-thread CSPRNG state at thread exit #11

Merged
justinjoy merged 3 commits into main from feature/issue-4-thread-exit-wipe on Apr 30, 2026

Conversation

@justinjoy

Contributor

Closes #4.

Summary

The per-thread CSPRNG state in libksuid/rand_tls.c lives in _Thread_local storage. When a thread exits, the OS eventually reclaims the TLS block, but until then the 64-byte ChaCha20 state and the 64-byte keystream window stay in process memory and still hold live cryptographic material.

This PR adds an automatic thread-exit wipe on platforms with a thread-exit hook (glibc 2.18+ via __cxa_thread_atexit_impl, MUSL >= 1.2.0, libc++abi on macOS, FLS on Windows) and documents the residue policy in the public header for platforms without one (Alpine MUSL <1.2.0, bionic, uClibc).

Series — three atomic commits

| Commit | Purpose |
| --- | --- |
| b199d47 | feat: ksuid_random_thread_state_wipe internal hook + re-entry guard + public residue-policy doc + DSE gate floor 4 → 5 |
| dd8e24a | core: Per-platform automatic registration (__cxa_thread_atexit_impl on POSIX, FlsAlloc on Windows) + meson cc.links() probe + summary() line |
| bf191b1 | test: Sentinel + atomic-counter regression test that drives the wipe path deterministically and asserts the destructor fires on thread exit |

Pipeline that ran

Per the global GitHub-issue resolution workflow rule:

  1. Architect study — 3-commit split, __cxa_thread_atexit_impl + Windows FLS as the two automatic-wipe paths, documented residue elsewhere, sentinel-byte test design with KSUID_TESTING-gated hooks.
  2. Critic study — 10-item risk register: symbol privacy (R1), constructor-order race (R2), dso_handle (R3), Windows FLS NTAPI (R4), MUSL <1.2.0 silent residue (R5), macOS detection (R6), DllMain trap (R7), re-entrancy (R8), DSE-resistant test (R9), CI matrix gaps (R10).
  3. Synthesize — adopted 3-commit split with R1, R2, R3, R4, R5, R7, R8, R9 mitigations folded in. R6 + R10 (Alpine MUSL CI lane) tracked as low-priority follow-ups.
  4. Implementer round 1 — committed.
  5. Reviewer round 1 — PASS (16/16 contractual items, 13/13 tests, clang-tidy 0 findings, gst-indent clean, meson dist round-trip clean).
  6. Architect meta round 1 — SIGN-OFF.
  7. Critic meta round 1 — SIGN-OFF (all blocking risks addressed; remaining gaps are CI-matrix-observable, not pre-merge blockers).

What gates this PR

  • gst-indent + clang-tidy 22 (lint phase)
  • Build/test on Ubuntu GCC + Clang, macOS Clang, Windows MSVC
  • DESTDIR install verification
  • ASan + UBSan on Linux + macOS
  • meson dist round-trip on Ubuntu
  • wipe-fallback job (KSUID_FORCE_VOLATILE_FALLBACK=1 build, asserts no explicit_bzero@plt leak)
  • Updated: auto-build disasm gate floor moved from 4 to 5 (one new wipe call site for ksuid_random_thread_state_wipe)
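The wipe-fallback gate above is easier to picture with a small shim. The following is a minimal sketch, assuming the fallback is a byte-wise store through a volatile pointer; `demo_explicit_bzero`, `demo_all_zero`, and the `DEMO_HAVE_EXPLICIT_BZERO` macro are hypothetical names, and the real `ksuid_explicit_bzero` backend selection is not shown in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical shim in the spirit of ksuid_explicit_bzero.
 * DEMO_HAVE_EXPLICIT_BZERO is an invented stand-in for whatever the
 * build system probes; it is left undefined here, so the volatile
 * fallback is the branch that actually compiles.
 */
static void demo_explicit_bzero(void *p, size_t n)
{
    assert(p != NULL || n == 0);
#if defined(DEMO_HAVE_EXPLICIT_BZERO) && !defined(KSUID_FORCE_VOLATILE_FALLBACK)
    explicit_bzero(p, n);   /* libc call: would appear as explicit_bzero@plt */
#else
    /* Stores through a volatile lvalue may not be removed by
     * dead-store elimination, so the zeroing survives optimisation. */
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n-- > 0)
        *vp++ = 0;
#endif
}

/* Returns 1 when every byte of buf is zero, 0 otherwise. */
static int demo_all_zero(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

With `KSUID_FORCE_VOLATILE_FALLBACK` defined, a build shaped like this never references the libc symbol, which is the property the job's no-`explicit_bzero@plt` assertion appears to check.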

Side benefits

Test plan

  • lint phase green
  • build matrix green on all four OS+compiler lanes (especially Windows MSVC for the FLS path and macOS Clang for the libc++abi __cxa_thread_atexit resolution)
  • meson summary reports both wipe backend AND thread-exit wipe lines
  • DESTDIR install verification still green (no public-API delta; the wipe hook is private)
  • sanitizers green
  • meson dist round-trip green
  • wipe-fallback job still green (the new destructor wipe routes through ksuid_explicit_bzero like every other site, so the no-explicit_bzero@plt invariant holds)
  • auto-build disasm gate green at the new floor of 5

Out of scope (follow-ups)

  • Alpine MUSL CI lane to witness the documented-residue path. The summary line emits the right backend on every platform, but no current CI lane builds against MUSL <1.2.0 to confirm the fallback branch compiles cleanly. Both meta-reviewers flagged this as a low-priority follow-up rather than a blocker.
  • The macOS lane will be the first run that resolves __cxa_thread_atexit_impl from libc++abi without an explicit -lc++abi link line. If CI fails there, the fix would be to add cc.find_library('c++abi') in meson.build.

Closes #4 (commit 1 of 3 in the series).

Lands the wipe entry point + the public-header residue policy that
documents what callers can rely on. Commit 2 adds the per-platform
automatic registration (__cxa_thread_atexit_impl on glibc / libc++abi
/ MUSL >= 1.2.0, FlsAlloc on Windows). Commit 3 adds a runtime
regression test that drives the hook and asserts the TLS state was
zeroed.

Surface added (private):

  libksuid/rand.h
    - ksuid_random_thread_state_wipe() declared next to the existing
      ksuid_random_force_reseed test hook.

  libksuid/rand_tls.c
    - ksuid_random_thread_state_wipe() implemented: wipes the entire
      ksuid_tls_rng_t via ksuid_explicit_bzero, including the seeded
      flag so the next draw goes through the full reseed path.
    - _Thread_local bool ksuid_tls_in_destructor_ re-entry guard
      (Critic R8): if the wipe ever calls back into ksuid_random_bytes
      -- e.g. a future debug log line in this file -- the guarded
      ksuid_random_bytes returns -1 instead of reseeding into a slot
      that is mid-teardown. The guard is held only across the
      ksuid_explicit_bzero call, which is tiny and cannot itself
      re-enter.
    - The TODO(#4) banner from issue #2 commit 2 is replaced with a
      "Thread-exit residue policy" comment block that summarises the
      situation across platforms and points at the commit 2 registration.
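The wipe entry point and its re-entry guard described above could look roughly like this. It is a sketch under assumptions: the struct layout, every `demo_*` name, and the placeholder draw function are invented for illustration (the real `ksuid_tls_rng_t` is not shown in the PR), and the "seed" here is not cryptographic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real ksuid_tls_rng_t is not shown in the PR. */
typedef struct {
    uint8_t state[64];      /* ChaCha20 state */
    uint8_t keystream[64];  /* buffered keystream window */
    bool    seeded;         /* wiped too, so the next draw reseeds */
} demo_tls_rng_t;

static _Thread_local demo_tls_rng_t demo_rng;
static _Thread_local bool demo_in_destructor;   /* re-entry guard */

static void demo_bzero(void *p, size_t n)
{
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n-- > 0)
        *vp++ = 0;
}

/* Placeholder draw: NOT a CSPRNG. It only models the seeded flag
 * and the guard check. */
static int demo_random_bytes(uint8_t *out, size_t n)
{
    assert(out != NULL || n == 0);
    if (demo_in_destructor)
        return -1;          /* refuse to reseed a slot that is mid-teardown */
    if (!demo_rng.seeded) {
        for (size_t i = 0; i < sizeof demo_rng.state; i++)
            demo_rng.state[i] = (uint8_t)(i + 1);   /* fake "seed" */
        demo_rng.seeded = true;
    }
    for (size_t i = 0; i < n; i++)
        out[i] = demo_rng.state[i % sizeof demo_rng.state];
    return 0;
}

static void demo_thread_state_wipe(void)
{
    demo_in_destructor = true;      /* guard spans only the wipe itself */
    demo_bzero(&demo_rng, sizeof demo_rng);
    demo_in_destructor = false;
}
```

Because the whole struct is zeroed, the `seeded` flag is cleared as a side effect, so a later draw on the same thread takes the full seed path again.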

Surface added (public docs only, no ABI delta):

  libksuid/ksuid.h
    - "Thread-exit residue" paragraph above ksuid_new / ksuid_set_rand
      stating the contract: glibc / libc++abi / MUSL>=1.2.0 / Windows
      FLS get automatic wipe at thread exit; other platforms rely on
      the bounded reseed cadence (1 MiB / 1 hour / fork) to keep the
      residue window small.

CI gate update:

  .github/workflows/ci-pr.yml
    - The auto-build disasm grep gate's floor moves from 4 to 5. The
      five surviving wipe call sites are: kn-on-RNG-failure, kn-after-
      seed-copy, consumed-keystream-in-loop (all rand_tls.c), x[16]
      in chacha20.c, and the new
      ksuid_random_thread_state_wipe -> ksuid_explicit_bzero call.
      Observed locally on glibc 2.43 / GCC 15.2.1: 7 surviving calls
      (the static-inline shim is partially inlined, partially kept
      out-of-line, so each source-level call shows up as one or
      both forms).

Verified locally:
  - 13/13 tests pass
  - clang-tidy 22 reports zero findings
  - gst-indent leaves the working tree unchanged
  - Auto-build: 7 surviving wipe calls (>= 5 floor, GATE PASS)
  - Both KSUID_FORCE_VOLATILE_FALLBACK and default builds still
    pass test_wipe.

The wipe is reachable only through the test harness in this commit;
commit 2 wires the platform-specific automatic registration.
…s FLS

Closes #4 (commit 2 of 3 in the series).

Wires the wipe entry point that landed in commit 1 to fire
automatically when the owning thread exits. Two platform paths,
gated by meson cc.links() / host_machine.system() probes:

  1. KSUID_HAVE_CXA_THREAD_ATEXIT_IMPL
     glibc 2.18+, MUSL 1.2.0+, libc++abi (macOS, FreeBSD modern).
     Uses __cxa_thread_atexit_impl(callback, arg, dso_handle) with
     &__dso_handle as the third argument so a dlclose(libksuid.so)
     cleanly tears down its registrations. Probed via cc.links()
     against an explicit prototype (the symbol has no public
     declaration, so cc.has_function would lie either way).

  2. KSUID_HAVE_FLS  (Windows / Cygwin)
     FlsAlloc + InitOnceExecuteOnce. Slot value is a non-NULL
     "this thread participated" sentinel; the real state stays in
     _Thread_local storage and is reachable from the same thread
     during FLS teardown (before the runtime tears down TLS).
     Callback uses NTAPI calling convention (Critic R4) to avoid
     stack corruption on x86_32 MSVC.

  3. else  (uClibc, bionic, MUSL < 1.2.0, ...)
     Falls through to the documented residue policy from commit 1.
     ksuid_random_force_reseed remains the manual mitigation
     callers must use before joining a worker thread.
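The three paths above can be sketched as one compile-time dispatch. This is illustrative only: the `DEMO_*` macros and `demo_*` functions are hypothetical, the preprocessor checks are a crude stand-in for the meson cc.links() / host_machine.system() probes, and the 128-byte array stands in for the real RNG state.

```c
#include <assert.h>
#include <string.h>     /* size_t; on glibc this also defines __GLIBC__ */

/* Stand-in for the build-system probes (assumption, not the real logic). */
#if defined(_WIN32)
#  define DEMO_HAVE_FLS 1
#elif defined(__GLIBC__) && \
      (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 18))
#  define DEMO_HAVE_CXA_THREAD_ATEXIT_IMPL 1
#endif

static _Thread_local unsigned char demo_state[128];

static void demo_wipe(void)
{
    volatile unsigned char *vp = demo_state;
    for (size_t i = 0; i < sizeof demo_state; i++)
        vp[i] = 0;
}

#if defined(DEMO_HAVE_CXA_THREAD_ATEXIT_IMPL)
/* No public header declares these, so the prototypes are spelled out. */
extern int __cxa_thread_atexit_impl(void (*fn)(void *), void *arg, void *dso);
extern void *__dso_handle;

static void demo_on_thread_exit(void *arg) { (void)arg; demo_wipe(); }

/* Returns 0 when a hook is registered for the calling thread. */
static int demo_register_thread_exit(void)
{
    int rc = __cxa_thread_atexit_impl(demo_on_thread_exit, NULL,
                                      &__dso_handle);
    return rc == 0 ? 0 : -1;
}
#elif defined(DEMO_HAVE_FLS)
#include <windows.h>
static INIT_ONCE demo_once = INIT_ONCE_STATIC_INIT;
static DWORD demo_fls_index = FLS_OUT_OF_INDEXES;

/* NTAPI calling convention, as the commit message requires. */
static VOID NTAPI demo_fls_callback(PVOID data)
{
    if (data != NULL)       /* non-NULL means this thread participated */
        demo_wipe();
}

static BOOL CALLBACK demo_fls_init(PINIT_ONCE once, PVOID param, PVOID *ctx)
{
    (void)once; (void)param; (void)ctx;
    demo_fls_index = FlsAlloc(demo_fls_callback);
    return demo_fls_index != FLS_OUT_OF_INDEXES;
}

static int demo_register_thread_exit(void)
{
    if (!InitOnceExecuteOnce(&demo_once, demo_fls_init, NULL, NULL))
        return -1;
    return FlsSetValue(demo_fls_index, (PVOID)(ULONG_PTR)1) ? 0 : -1;
}
#else
/* Documented-residue path: returns 1, meaning no automatic wipe. */
static int demo_register_thread_exit(void)
{
    (void)demo_wipe;        /* callers must wipe or reseed manually */
    return 1;
}
#endif
```

Following the R2 note below, a seed routine built on this would call `demo_register_thread_exit()` before marking the state as seeded, so a seeded state never exists without its hook.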

Critic risk register addressed:

  R1 cc.links() probe with explicit prototype, not cc.has_function
     -- the symbol is private so we cannot trust header-based
     detection.
  R2 ksuid_tls_register_thread_exit() is called from inside
     ksuid_tls_rng_seed BEFORE r->seeded = true, so a thread that
     exits between seed and registration cannot leave a half-wired
     state. The destructor itself is idempotent and harmless on
     unseeded state because ksuid_explicit_bzero on already-zeroed
     memory is a no-op.
  R3 dso_handle: extern void *__dso_handle is declared inside
     rand_tls.c and passed by address.
  R4 Windows FLS callback uses VOID NTAPI signature.
  R5 MUSL < 1.2.0 / bionic / uClibc: meson summary line emits
     "thread-exit wipe: documented residue (no automatic wipe)" so
     CI logs make the platform's lifecycle behaviour auditable.
  R7 FlsAlloc, NOT DllMain -- works for both static and DLL link
     modes because the FLS slot is owned by the process.
  R8 Re-entry guard already lives on the wipe entry point itself
     (commit 1).

Surface added: none (purely internal). meson.build gains a second
summary line ("thread-exit wipe: ...") and one cc.links() probe.

Verified locally on Linux glibc 2.43 / GCC 15.2.1:
  - meson summary reports "thread-exit wipe: __cxa_thread_atexit_impl"
  - 13/13 tests pass
  - clang-tidy 22 reports zero findings
  - gst-indent leaves the working tree unchanged
  - objdump shows 1 call to __cxa_thread_atexit_impl@plt and 7
    calls to {explicit_bzero, ksuid_explicit_bzero} (>= 5 floor)
  - KSUID_FORCE_VOLATILE_FALLBACK build still passes test_wipe.

Commit 3 adds a runtime regression test that drives the wipe via
a KSUID_TESTING-gated test hook and asserts the state was zeroed.

Closes #4 (commit 3 of 3 in the series).

Adds a runtime regression test that drives the thread-exit wipe path
deterministically via three KSUID_TESTING-gated test hooks and an
always-defined atomic counter. The library always defines the
counter and the helpers; rand.h gates the prototypes behind
KSUID_TESTING so production callers cannot reach them.

Hooks added (libksuid/rand.h, KSUID_TESTING-gated extern):

  ksuid_thread_exit_wipes_observed
    _Atomic int incremented inside ksuid_random_thread_state_wipe.
    Cost in production: one relaxed atomic increment per wipe
    (~5 ns on x86_64), dominated by the explicit_bzero call that
    follows.

  ksuid_random_thread_state_set_sentinel_for_testing()
    Fills the calling thread's TLS RNG state with a 0xa5 sentinel
    pattern. Preserves the destructor_registered flag so a
    previously-installed thread-exit hook still fires on this
    thread.

  ksuid_random_thread_state_peek_for_testing(buf, n)
  ksuid_random_thread_state_size_for_testing()
    Copy the calling thread's TLS state bytes for inspection. The
    sentinel test uses these to confirm the 0xa5 pattern landed
    before exiting the thread.
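As a rough single-threaded model of the hooks listed above, the sketch below pairs a relaxed atomic counter with sentinel, peek, and size helpers. All `demo_*` names and the struct layout are hypothetical; only the 0xa5 pattern, the 128-byte size, and the preserved destructor_registered flag come from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state: 64-byte ChaCha20 state + 64-byte keystream. */
typedef struct {
    unsigned char bytes[128];
    bool destructor_registered;
} demo_tls_t;

static _Thread_local demo_tls_t demo_tls;
static atomic_int demo_wipes_observed;      /* process-wide counter */

static void demo_state_wipe(void)
{
    /* One relaxed increment per wipe, then the wipe itself. */
    atomic_fetch_add_explicit(&demo_wipes_observed, 1, memory_order_relaxed);
    volatile unsigned char *vp = (volatile unsigned char *)&demo_tls;
    for (size_t i = 0; i < sizeof demo_tls; i++)
        vp[i] = 0;
}

/* Fills only the key material, so destructor_registered is preserved. */
static void demo_set_sentinel_for_testing(void)
{
    memset(demo_tls.bytes, 0xa5, sizeof demo_tls.bytes);
}

static size_t demo_state_size_for_testing(void)
{
    return sizeof demo_tls.bytes;
}

/* Copies up to n state bytes into buf; returns the number copied. */
static size_t demo_peek_for_testing(unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    size_t k = n < sizeof demo_tls.bytes ? n : sizeof demo_tls.bytes;
    memcpy(buf, demo_tls.bytes, k);
    return k;
}
```

A test can then set the sentinel, peek to confirm it landed, trigger the wipe, and compare the counter before and after.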

Test (tests/test_rand_tls.c, KSUID_TESTING-gated):

  test_thread_exit_wipes_tls_state spawns a thread that:
    1. draws 1 random byte (triggers the seed path that registers
       the platform thread-exit destructor on this thread);
    2. overwrites the live TLS state with the 0xa5 sentinel;
    3. peeks to confirm at least 128 sentinel bytes are in place;
    4. exits.
  The main thread joins the worker with thrd_join, then asserts that the
  global ksuid_thread_exit_wipes_observed counter advanced by exactly 1,
  proving the platform-registered destructor fired during teardown.
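The shape of this test can be sketched portably. Note the substitution: the sketch below uses a POSIX `pthread_key_create` destructor and `pthread_join` instead of the platform hook and `thrd_join` that the real test exercises, purely so it runs without the library. Every `demo_*` name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

static atomic_int demo_exit_wipes;          /* ticks once per destructor run */
static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit for every thread that set a non-NULL key value. */
static void demo_destructor(void *p)
{
    assert(p != NULL);
    volatile unsigned char *vp = p;
    for (size_t i = 0; i < 128; i++)
        vp[i] = 0;
    atomic_fetch_add_explicit(&demo_exit_wipes, 1, memory_order_relaxed);
}

static void demo_make_key(void)
{
    /* Error handling omitted in this sketch. */
    (void)pthread_key_create(&demo_key, demo_destructor);
}

static void *demo_worker(void *arg)
{
    static _Thread_local unsigned char state[128];
    (void)arg;
    pthread_once(&demo_key_once, demo_make_key);
    pthread_setspecific(demo_key, state);   /* step 1: register the hook */
    memset(state, 0xa5, sizeof state);      /* step 2: plant the sentinel */
    for (size_t i = 0; i < sizeof state; i++)
        if (state[i] != 0xa5)               /* step 3: confirm it landed */
            return (void *)1;
    return NULL;                            /* step 4: exit */
}

/* Returns how many destructor runs one worker caused, or -1 on error. */
static int demo_run_thread_exit_test(void)
{
    pthread_t t;
    void *rc = NULL;
    int before = atomic_load(&demo_exit_wipes);
    if (pthread_create(&t, NULL, demo_worker, NULL) != 0)
        return -1;
    if (pthread_join(t, &rc) != 0 || rc != NULL)
        return -1;
    return atomic_load(&demo_exit_wipes) - before;
}
```

The counter is the observable here because the worker's TLS block is gone once the join returns, so the main thread cannot inspect the wiped bytes directly.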

The KSUID_TESTING flag is set per-test in tests/meson.build only
when meson detected a real thread-exit hook
(KSUID_HAVE_CXA_THREAD_ATEXIT_IMPL or KSUID_HAVE_FLS). On the
documented-residue lane the test is compiled out at preprocess time,
because there is no destructor to assert against.

The library itself defines KSUID_TESTING locally in rand_tls.c BEFORE
its rand.h include, so the for_testing prototypes are visible at the
matching definitions. Without this the helpers would trigger
-Wmissing-prototypes: hiding the prototypes from production callers
means the definition site has to opt back in.
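The gating arrangement can be shown in a single file. This is a sketch only: the section markers stand in for the rand.h / rand_tls.c split, and `demo_rng_layout_t` and `demo_rng_state_size_for_testing` are hypothetical names.

```c
/* ---- definition site (think rand_tls.c): opt in BEFORE the header ---- */
#define KSUID_TESTING 1

/* ---- header section (think rand.h) ---- */
#include <assert.h>
#include <stddef.h>

#ifdef KSUID_TESTING
/* Visible only to translation units that define KSUID_TESTING. */
size_t demo_rng_state_size_for_testing(void);
#endif

/* ---- back in the definition site ---- */
typedef struct {
    unsigned char state[64];      /* ChaCha20 state */
    unsigned char keystream[64];  /* keystream window */
} demo_rng_layout_t;

static_assert(sizeof(demo_rng_layout_t) == 128,
              "sentinel test expects 128 bytes of key material");

/* The prototype above precedes this definition, so a build with
 * -Wmissing-prototypes stays quiet. */
size_t demo_rng_state_size_for_testing(void)
{
    return sizeof(demo_rng_layout_t);
}
```

A production caller that includes the header without defining `KSUID_TESTING` never sees the prototype, even though the symbol is still compiled into the library.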

Verified locally on Linux glibc 2.43 / GCC 15.2.1:
  - 13/13 tests pass on the default build, including the new
    test_thread_exit_wipes_tls_state.
  - 13/13 tests pass on the KSUID_FORCE_VOLATILE_FALLBACK build.
  - clang-tidy 22 reports zero findings.
  - gst-indent leaves the working tree unchanged.
  - objdump shows 7 surviving wipe calls (>= 5 floor) plus 1 call
    to __cxa_thread_atexit_impl@plt, exactly the expected
    registration count for a single-translation-unit consumer.
@justinjoy merged commit be4cde0 into main on Apr 30, 2026
11 checks passed
@justinjoy deleted the feature/issue-4-thread-exit-wipe branch on April 30, 2026 07:07

Development

Successfully merging this pull request may close these issues.

Wipe per-thread CSPRNG state at thread exit (residue policy)

1 participant