Skip to content

wall profiler: track live PCPs via an intrusive list#341

Open
szegedi wants to merge 2 commits into
szegedi/thread-local-profilerfrom
szegedi/no-live-context-tracking
Open

wall profiler: track live PCPs via an intrusive list#341
szegedi wants to merge 2 commits into
szegedi/thread-local-profilerfrom
szegedi/no-live-context-tracking

Conversation

@szegedi
Copy link
Copy Markdown

@szegedi szegedi commented May 27, 2026

Summary

Replaces the unordered_set<PersistentContextPtr*> that the wall profiler uses to track live PersistentContextPtr (PCP) instances with an intrusive doubly-linked list threaded through the PCPs themselves. Same shutdown semantics — ~WallProfiler still deletes any PCP that V8 hasn't reclaimed yet — but with materially less per-PCP overhead.

Stacked on top of #322 (thread-local active profiler) and depends on the thread-local already being in place.

Why

liveContextPtrs_ exists because, at process exit, V8's CPED map can still hold PCPs that V8's late-shutdown finalizers haven't reached yet. LSAN's leak check fires before those finalizers, so without an explicit cleanup pass each surviving PCP gets reported as a leak (we verified this on node:22-bookworm with --experimental-async-context-frame: a baseline build with this PR's liveContextPtrs_ removed and no replacement leaks ~288 bytes across ~8 PCP allocations per run). The set is a fine tool for the cleanup job, but it carries hash-table overhead unrelated to that job.

What changed

  • New: each PersistentContextPtr carries a WallProfiler* const profiler_, a PersistentContextPtr** pprev_, and a PersistentContextPtr* next_. The two list pointers form a standard intrusive doubly-linked list rooted at WallProfiler::liveContextPtrHead_.
  • ~PersistentContextPtr unlinks itself when pprev_ != nullptr. That predicate also serves as proof that profiler_ is still valid, so the destructor dereferences it directly to decrement the live counter — no thread-local lookup, no ID compare.
  • ~WallProfiler walks the list, nulls each PCP's pprev_ (so the soon-to-fire destructor short-circuits the unlink instead of poking at the head pointer in dying memory), then deletes each PCP. ~node::ObjectWrap clears the V8 weak callback during that delete, so the dangling internal-field pointer left behind in the wrap object stays inert even if V8 GCs the wrap later.
  • GetMetrics reads a std::atomic<size_t> liveContextPtrCount_ updated alongside the list link/unlink, replacing the old liveContextPtrs_.size(). usedAsyncContextCount / totalAsyncContextCount keep their existing semantics (both report the live count, just like before).

Memory savings

Per live PCP, the tracking cost goes from:

Before (unordered_set) After (intrusive list)
Inside the PCP object 8 B (back-pointer to set) 8 B (back-pointer) + 16 B (two list pointers) = 24 B
Outside the PCP object ~40 B (hash node + bucket slot in the set) 0 B
Total ~48 B/PCP 24 B/PCP

A net saving of ~24 bytes per live async context. The set's per-bucket fixed overhead also goes away, though that's negligible compared to the per-element savings on real workloads.

The list operations are O(1) (insert-at-head, unlink-by-pointer), matching the previous set's amortized characteristics.

Verification

Ran the project's Linux ASAN+LSAN suite (npm run test:js-asan) in a node:22-bookworm container with --node-option=experimental-async-context-frame so the CPED-mode tests actually execute (without that flag the tests gate themselves out and the code path under test never runs in CI). Compared baseline (this PR reverted) vs patched:

Tests LSAN summaries
Baseline 95 passing / 1 pending 40528 B / 75936 B leaked — all V8/Node internal
Patched 95 passing / 1 pending 40528 B / 79648 B leaked — all V8/Node internal

Zero references to PersistentContextPtr, WallProfiler, dd_pprof.node, or wall.cc in the patched LSAN output. The remaining leaks are pure V8/Node-internal cleanup-on-exit noise, present in identical shape on the baseline (small magnitude differences are noise — V8's final GC state at exit is non-deterministic).

Test plan

  • npm run build (macOS Release): clean
  • npm run test:js (macOS): 96 passing
  • npm run test:js-asan (Linux container, Node 22 + --experimental-async-context-frame): 95 passing, 1 pending, no PCP/WallProfiler/dd_pprof leak reports
  • CI passes

Jira: PROF-14791

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 27, 2026

Overall package size

Self size: 2.08 MB
Deduped: 2.44 MB
No deduping: 2.44 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@szegedi szegedi force-pushed the szegedi/no-live-context-tracking branch from 4c8ad2f to 05cfbdf Compare May 28, 2026 11:41
@datadog-prod-us1-3

This comment has been minimized.

PersistentContextPtr instances were registered in an unordered_set on
the WallProfiler so that ~WallProfiler could delete any PCPs that V8
had not GC'd yet — otherwise they show up as leaks under LSAN at
process exit, since LSAN's check runs before V8's late-shutdown
finalizers. The hash-set was heavier than that single use case
deserved: every live PCP cost a back-pointer into the set (8 bytes)
plus a separate hash node + bucket overhead (~40 bytes) external to
the object.

Replace it with an intrusive doubly-linked list threaded through the
PCPs themselves. Each PCP carries:

  WallProfiler* const profiler_      // 8 bytes
  PersistentContextPtr** pprev_      // 8 bytes
  PersistentContextPtr*  next_       // 8 bytes

~PCP unlinks itself when pprev_ != nullptr — the "still on a live
profiler's list" predicate. ~WallProfiler walks the list and nulls
each pprev_ before delete, so the destructor that fires as part of
the walk becomes a no-op rather than poking at the dying profiler's
head pointer. Outside of teardown, pprev_ != nullptr is also the
proof that the back-pointer is still valid, so ~PCP can dereference
profiler_ to decrement its counter without any thread-local or ID
lookup.

PCP-tracking memory drops from ~48 bytes/PCP (8-byte back-pointer
into set + ~40 bytes of hash node / bucket) to 24 bytes/PCP (back-
pointer + two list pointers), a saving of ~24 bytes per live async
context.

Replace GetMetrics's liveContextPtrs_.size() read with an atomic
counter incremented/decremented alongside the list link/unlink so the
metric path doesn't depend on the container. usedAsyncContextCount /
totalAsyncContextCount keep their existing semantics (both report the
live count, as before).
@szegedi szegedi force-pushed the szegedi/no-live-context-tracking branch from 05cfbdf to 140607d Compare May 28, 2026 11:55
Alpine/musl rejects initial-exec TLS relocations from dlopen'd libraries at
load time, so the previous "thread_local + initial-exec" approach made the
addon unloadable on musl. Dropping initial-exec is not an option either —
on glibc that brings back the __tls_get_addr call from the SIGPROF handler.

pthread_getspecific is explicitly on POSIX's async-signal-safe list, works
identically on glibc, musl, and macOS, and avoids the dlopen-vs-TLS-model
mess entirely. The key is allocated once via a function-local static
initializer, before any signal handler can fire. On Windows SIGPROF isn't
compiled in, so the signal-safety constraint doesn't apply there and a
plain thread_local is fine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant