This repository was archived by the owner on May 1, 2026. It is now read-only.
core: wipe per-thread CSPRNG state at thread exit #11
Merged
Closes #4 (commit 1 of 3 in the series). Lands the wipe entry point + the public-header residue policy that documents what callers can rely on. Commit 2 adds the per-platform automatic registration (__cxa_thread_atexit_impl on glibc / libc++abi / MUSL >= 1.2.0, FlsAlloc on Windows). Commit 3 adds a runtime regression test that drives the hook and asserts the TLS state was zeroed.

Surface added (private):

libksuid/rand.h
- ksuid_random_thread_state_wipe() declared next to the existing ksuid_random_force_reseed test hook.

libksuid/rand_tls.c
- ksuid_random_thread_state_wipe() implemented: wipes the entire ksuid_tls_rng_t via ksuid_explicit_bzero, including the seeded flag so the next draw goes through the full reseed path.
- _Thread_local bool ksuid_tls_in_destructor_ re-entry guard (Critic R8): if the wipe ever calls back into ksuid_random_bytes -- e.g. a future debug log line in this file -- the guarded ksuid_random_bytes returns -1 instead of reseeding into a slot that is mid-teardown. The guard is held only across the ksuid_explicit_bzero call, which is tiny and cannot itself re-enter.
- The TODO(#4) banner from issue #2 commit 2 is replaced with a "Thread-exit residue policy" comment block that summarises the situation across platforms and points at the commit 2 registration.

Surface added (public docs only, no ABI delta):

libksuid/ksuid.h
- "Thread-exit residue" paragraph above ksuid_new / ksuid_set_rand stating the contract: glibc / libc++abi / MUSL >= 1.2.0 / Windows FLS get automatic wipe at thread exit; other platforms rely on the bounded reseed cadence (1 MiB / 1 hour / fork) to keep the residue window small.

CI gate update:

.github/workflows/ci-pr.yml
- The auto-build disasm grep gate's floor moves from 4 to 5. The five surviving wipe call sites are: kn-on-RNG-failure, kn-after-seed-copy, consumed-keystream-in-loop (all rand_tls.c), x[16] in chacha20.c, and the new ksuid_random_thread_state_wipe -> ksuid_explicit_bzero call. Observed locally on glibc 2.43 / GCC 15.2.1: 7 surviving calls (the static-inline shim is partially inlined, partially kept out-of-line, so each source-level call shows up as one or both forms).

Verified locally:
- 13/13 tests pass
- clang-tidy 22 reports zero findings
- gst-indent leaves the working tree unchanged
- Auto-build: 7 surviving wipe calls (>= 5 floor, GATE PASS)
- Both KSUID_FORCE_VOLATILE_FALLBACK and default builds still pass test_wipe.

The wipe is reachable only through the test harness in this commit; commit 2 wires the platform-specific automatic registration.
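For readers who want the shape of the wipe entry point and its re-entry guard, here is a minimal sketch. It is not the code from rand_tls.c: every `sketch_` name, the struct layout, and the fixed-pattern "seed" are assumptions made for illustration, and the volatile-store loop stands in for ksuid_explicit_bzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ksuid_tls_rng_t; the real layout is not shown in this PR text. */
typedef struct {
    uint8_t state[64];      /* cipher state */
    uint8_t keystream[64];  /* buffered output window */
    size_t  used;           /* keystream bytes already handed out */
    bool    seeded;
} sketch_rng_t;

static _Thread_local sketch_rng_t sketch_rng;
static _Thread_local bool sketch_in_destructor;

/* Volatile byte stores, so the compiler may not drop the wipe as a dead store. */
static void sketch_explicit_bzero(void *p, size_t n)
{
    volatile unsigned char *v = (volatile unsigned char *)p;
    while (n-- > 0)
        *v++ = 0;
}

/* Placeholder seeding, NOT cryptographic: fills the state with fixed patterns. */
static void sketch_seed(sketch_rng_t *r)
{
    memset(r->state, 0x5c, sizeof r->state);
    memset(r->keystream, 0x3a, sizeof r->keystream);
    r->used = 0;
    r->seeded = true;
}

/* Returns 0 on success, -1 when called while the wipe is in progress. */
int sketch_random_bytes(uint8_t *out, size_t n)
{
    if (sketch_in_destructor)
        return -1;                      /* never reseed a slot that is mid-teardown */
    if (!sketch_rng.seeded)
        sketch_seed(&sketch_rng);       /* a wiped slot takes the full reseed path */
    assert(n <= sizeof sketch_rng.keystream);
    memcpy(out, sketch_rng.keystream, n);   /* placeholder "draw" */
    return 0;
}

/* Wipes the whole per-thread struct, including the seeded flag. */
void sketch_thread_state_wipe(void)
{
    sketch_in_destructor = true;        /* guard held only across the wipe */
    sketch_explicit_bzero(&sketch_rng, sizeof sketch_rng);
    sketch_in_destructor = false;
}

/* Inspection helper: true if every byte of the calling thread's state is zero. */
bool sketch_state_is_zero(void)
{
    const unsigned char *p = (const unsigned char *)&sketch_rng;
    for (size_t i = 0; i < sizeof sketch_rng; i++)
        if (p[i] != 0)
            return false;
    return true;
}
```

Because the wipe clears the seeded flag along with the key material, the first draw after a wipe reseeds instead of emitting output derived from an all-zero state.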
…s FLS

Closes #4 (commit 2 of 3 in the series). Wires the wipe entry point that landed in commit 1 to fire automatically when the owning thread exits. Two platform paths, gated by meson cc.links() / host_machine.system() probes:

1. KSUID_HAVE_CXA_THREAD_ATEXIT_IMPL: glibc 2.18+, MUSL 1.2.0+, libc++abi (macOS, FreeBSD modern). Uses __cxa_thread_atexit_impl(callback, arg, dso_handle) with &__dso_handle as the third argument so a dlclose(libksuid.so) cleanly tears down its registrations. Probed via cc.links() against an explicit prototype (the symbol has no public declaration, so cc.has_function would lie either way).
2. KSUID_HAVE_FLS (Windows / Cygwin): FlsAlloc + InitOnceExecuteOnce. Slot value is a non-NULL "this thread participated" sentinel; the real state stays in _Thread_local storage and is reachable from the same thread during FLS teardown (before the runtime tears down TLS). Callback uses NTAPI calling convention (Critic R4) to avoid stack corruption on x86_32 MSVC.
3. else (uClibc, bionic, MUSL < 1.2.0, ...): Falls through to the documented residue policy from commit 1. ksuid_random_force_reseed remains the manual mitigation callers must use before joining a worker thread.

Critic risk register addressed:
- R1: cc.links() probe with explicit prototype, not cc.has_function -- the symbol is private so we cannot trust header-based detection.
- R2: ksuid_tls_register_thread_exit() is called from inside ksuid_tls_rng_seed BEFORE r->seeded = true, so a thread that exits between seed and registration cannot leave a half-wired state. The destructor itself is idempotent and bails on unseeded state because ksuid_explicit_bzero on already-zeroed memory is a no-op.
- R3: dso_handle: extern void *__dso_handle is declared inside rand_tls.c and passed by address.
- R4: Windows FLS callback uses VOID NTAPI signature.
- R5: MUSL < 1.2.0 / bionic / uClibc: meson summary line emits "thread-exit wipe: documented residue (no automatic wipe)" so CI logs make the platform's lifecycle behaviour auditable.
- R7: FlsAlloc, NOT DllMain -- works for both static and DLL link modes because the FLS slot is owned by the process.
- R8: Re-entry guard already lives on the wipe entry point itself (commit 1).

Surface added: none (purely internal). meson.build gains a second summary line ("thread-exit wipe: ...") and one cc.links() probe.

Verified locally on Linux glibc 2.43 / GCC 15.2.1:
- meson summary reports "thread-exit wipe: __cxa_thread_atexit_impl"
- 13/13 tests pass
- clang-tidy 22 reports zero findings
- gst-indent leaves the working tree unchanged
- objdump shows 1 call to __cxa_thread_atexit_impl@plt and 7 calls to {explicit_bzero, ksuid_explicit_bzero} (>= 5 floor)
- KSUID_FORCE_VOLATILE_FALLBACK build still passes test_wipe.

Commit 3 adds a runtime regression test that drives the wipe via a KSUID_TESTING-gated test hook and asserts the state was zeroed.
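The first platform path can be sketched roughly as follows, assuming a glibc-based Linux target where `__cxa_thread_atexit_impl` links and the toolchain's startup objects provide `__dso_handle`. The hand-written prototype mirrors what the commit message describes; every `sketch_` name, the 128-byte state array, and the wipe counter are illustrative assumptions, not the library's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* No public header declares this symbol, so the prototype is written out by hand. */
extern int __cxa_thread_atexit_impl(void (*fn)(void *), void *arg, void *dso_handle);
extern void *__dso_handle;

static _Thread_local unsigned char sketch_state[128];   /* hypothetical TLS state */
static _Thread_local bool sketch_registered;
static atomic_int sketch_wipes_seen;

/* Runs on the exiting thread, so that thread's TLS is still addressable here. */
static void sketch_on_thread_exit(void *arg)
{
    assert(arg == (void *)sketch_state);
    volatile unsigned char *v = (volatile unsigned char *)arg;
    for (size_t i = 0; i < sizeof sketch_state; i++)
        v[i] = 0;
    atomic_fetch_add_explicit(&sketch_wipes_seen, 1, memory_order_relaxed);
}

/* Registers the wipe once per thread; returns 0 on success, -1 on failure. */
static int sketch_register_thread_exit(void)
{
    if (sketch_registered)
        return 0;
    if (__cxa_thread_atexit_impl(sketch_on_thread_exit, sketch_state, &__dso_handle) != 0)
        return -1;
    sketch_registered = true;
    return 0;
}

static void *sketch_worker(void *rc_out)
{
    sketch_state[0] = 0xa5;                         /* pretend the state is live */
    *(int *)rc_out = sketch_register_thread_exit();
    return NULL;                                    /* callback fires during thread exit */
}

/* Spawns one worker and returns how many wipes fired for it, or -1 on error. */
int sketch_run_worker_and_count_wipes(void)
{
    pthread_t t;
    int rc = -1;
    int before = atomic_load(&sketch_wipes_seen);
    if (pthread_create(&t, NULL, sketch_worker, &rc) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0 || rc != 0)
        return -1;
    return atomic_load(&sketch_wipes_seen) - before;
}
```

On the assumed target, `sketch_run_worker_and_count_wipes()` should report one wipe per worker. Older glibc releases may need `-pthread` on the link line.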
Closes #4 (commit 3 of 3 in the series). Adds a runtime regression test that drives the thread-exit wipe path deterministically via three KSUID_TESTING-gated test hooks and an always-defined atomic counter. The library always defines the counter and the helpers; rand.h gates the prototypes behind KSUID_TESTING so production callers cannot reach them.

Hooks added (libksuid/rand.h, KSUID_TESTING-gated extern):
- ksuid_thread_exit_wipes_observed: _Atomic int incremented inside ksuid_random_thread_state_wipe. Cost in production: one relaxed atomic increment per wipe (~5 ns on x86_64), dominated by the explicit_bzero call that follows.
- ksuid_random_thread_state_set_sentinel_for_testing(): Fills the calling thread's TLS RNG state with a 0xa5 sentinel pattern. Preserves the destructor_registered flag so a previously-installed thread-exit hook still fires on this thread.
- ksuid_random_thread_state_peek_for_testing(buf, n), ksuid_random_thread_state_size_for_testing(): Copy the calling thread's TLS state bytes for inspection. The sentinel test uses these to confirm the 0xa5 pattern landed before exiting the thread.

Test (tests/test_rand_tls.c, KSUID_TESTING-gated):

test_thread_exit_wipes_tls_state spawns a thread that:
1. draws 1 random byte (triggers the seed path that registers the platform thread-exit destructor on this thread);
2. overwrites the live TLS state with the 0xa5 sentinel;
3. peeks to confirm at least 128 sentinel bytes are in place;
4. exits.

The main thread thrd_join's the worker, then asserts the global ksuid_thread_exit_wipes_observed counter ticked by exactly 1 -- proving the platform-registered destructor fired during teardown.

The KSUID_TESTING flag is set per-test in tests/meson.build only when meson detected a real thread-exit hook (KSUID_HAVE_CXA_THREAD_ATEXIT_IMPL or KSUID_HAVE_FLS). On the documented-residue lane the test is compiled out at preprocess time, because there is no destructor to assert against.

The library itself sets KSUID_TESTING locally in rand_tls.c BEFORE its rand.h include, which pulls in the for_testing prototypes for the matching definitions. Without this the helpers would trigger -Wmissing-prototypes; gating the prototypes from production callers required gating the definition site too.

Verified locally on Linux glibc 2.43 / GCC 15.2.1:
- 13/13 tests pass on the default build, including the new test_thread_exit_wipes_tls_state.
- 13/13 tests pass on the KSUID_FORCE_VOLATILE_FALLBACK build.
- clang-tidy 22 reports zero findings.
- gst-indent leaves the working tree unchanged.
- objdump shows 7 surviving wipe calls (>= 5 floor) plus 1 call to __cxa_thread_atexit_impl@plt, exactly the expected registration count for a single-translation-unit consumer.
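The sentinel-then-join shape of the test can be sketched as below. To keep the sketch portable across POSIX systems it uses a `pthread_key_create` destructor as a stand-in for the platform hook, which is a swapped-in technique and not what the library registers. The `sketch_` names and the 136-byte state size are assumptions; the real test would query the size through the size_for_testing hook.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_STATE_SIZE 136   /* hypothetical size of the per-thread state */

static _Thread_local unsigned char sketch_state[SKETCH_STATE_SIZE];
static atomic_int sketch_wipes_observed;        /* analogue of the wipe counter */
static pthread_key_t sketch_key;
static pthread_once_t sketch_key_once = PTHREAD_ONCE_INIT;
static int sketch_key_rc = -1;

/* Destructor: wipes the exiting thread's state and ticks the counter. */
static void sketch_wipe(void *p)
{
    volatile unsigned char *v = (volatile unsigned char *)p;
    for (size_t i = 0; i < SKETCH_STATE_SIZE; i++)
        v[i] = 0;
    atomic_fetch_add_explicit(&sketch_wipes_observed, 1, memory_order_relaxed);
}

static void sketch_make_key(void)
{
    sketch_key_rc = pthread_key_create(&sketch_key, sketch_wipe);
}

/* Analogue of the peek hook: copies up to n bytes of the caller's state. */
static size_t sketch_peek(unsigned char *buf, size_t n)
{
    if (n > sizeof sketch_state)
        n = sizeof sketch_state;
    memcpy(buf, sketch_state, n);
    return n;
}

static void *sketch_worker(void *hits_out)
{
    unsigned char copy[SKETCH_STATE_SIZE];
    size_t hits = 0;

    pthread_once(&sketch_key_once, sketch_make_key);
    if (sketch_key_rc != 0 || pthread_setspecific(sketch_key, sketch_state) != 0)
        return NULL;                                    /* step 1: register the hook */
    memset(sketch_state, 0xa5, sizeof sketch_state);    /* step 2: plant the sentinel */
    size_t n = sketch_peek(copy, sizeof copy);          /* step 3: confirm it landed */
    assert(n == sizeof copy);
    for (size_t i = 0; i < n; i++)
        if (copy[i] == 0xa5)
            hits++;
    *(size_t *)hits_out = hits;
    return NULL;                                        /* step 4: exit, destructor fires */
}

/* Returns the number of wipes observed for one worker, or -1 on any failure. */
int sketch_test_thread_exit_wipe(void)
{
    pthread_t t;
    size_t hits = 0;
    int before = atomic_load(&sketch_wipes_observed);
    if (pthread_create(&t, NULL, sketch_worker, &hits) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0 || hits < 128)
        return -1;
    return atomic_load(&sketch_wipes_observed) - before;
}
```

Asserting on a counter delta rather than on the worker's memory avoids reading TLS that no longer exists after the join, which matches the reasoning in the commit message.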
Closes #4.
Summary
The per-thread CSPRNG state in libksuid/rand_tls.c lives in _Thread_local storage. When a thread exits, the OS reclaims the TLS block eventually, but until then the 64-byte ChaCha20 state and the 64-byte keystream window remain in process memory containing live cryptographic material.

This PR adds an automatic thread-exit wipe on platforms with a thread-exit hook (glibc 2.18+ via __cxa_thread_atexit_impl, MUSL >= 1.2.0, libc++abi on macOS, FLS on Windows) and documents the residue policy in the public header for platforms without one (Alpine MUSL < 1.2.0, bionic, uClibc).

Series: three atomic commits
- b199d47 feat: ksuid_random_thread_state_wipe internal hook + re-entry guard + public residue-policy doc + DSE gate floor 4 → 5
- dd8e24a core: (__cxa_thread_atexit_impl on POSIX, FlsAlloc on Windows) + meson cc.links() probe + summary() line
- bf191b1 test:

Pipeline that ran
Per the global GitHub-issue resolution workflow rule:
- __cxa_thread_atexit_impl + Windows FLS as the two automatic-wipe paths, documented residue elsewhere, sentinel-byte test design with KSUID_TESTING-gated hooks.
- dso_handle (R3), Windows FLS NTAPI (R4), MUSL < 1.2.0 silent residue (R5), macOS detection (R6), DllMain trap (R7), re-entrancy (R8), DSE-resistant test (R9), CI matrix gaps (R10).

What gates this PR
- gst-indent + clang-tidy 22 (lint phase)
- meson dist round-trip on Ubuntu
- wipe-fallback job (KSUID_FORCE_VOLATILE_FALLBACK=1 build, asserts no explicit_bzero@plt leak)
- ksuid_random_thread_state_wipe)

Side benefits
- test_thread_exit_wipes_tls_state exercises the production wipe call path (rand_tls.c's ksuid_random_thread_state_wipe → ksuid_explicit_bzero) when the destructor fires, which the follow-up note on issue #2 (Wipe CSPRNG state with explicit_bzero / SecureZeroMemory shim (defeat DSE)) flagged as untested.

Test plan
- __cxa_thread_atexit resolution)
- meson summary reports both wipe backend AND thread-exit wipe lines
- meson dist round-trip green
- wipe-fallback job still green (the new destructor wipe routes through ksuid_explicit_bzero like every other site, so the no-explicit_bzero@plt invariant holds)

Out of scope (follow-ups)
- __cxa_thread_atexit_impl from libc++abi without an explicit -lc++abi link line. If CI fails there, the fix would be to add cc.find_library('c++abi') in meson.build.