wall profiler: track live PCPs via an intrusive list#341
Open
szegedi wants to merge 2 commits into
Open
Conversation
Overall package sizeSelf size: 2.08 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
4c8ad2f to
05cfbdf
Compare
This comment has been minimized.
This comment has been minimized.
PersistentContextPtr instances were registered in an unordered_set on the WallProfiler so that ~WallProfiler could delete any PCPs that V8 had not GC'd yet — otherwise they show up as leaks under LSAN at process exit, since LSAN's check runs before V8's late-shutdown finalizers. The hash-set was heavier than that single use case deserved: every live PCP cost a back-pointer into the set (8 bytes) plus a separate hash node + bucket overhead (~40 bytes) external to the object. Replace it with an intrusive doubly-linked list threaded through the PCPs themselves. Each PCP carries: WallProfiler* const profiler_ // 8 bytes PersistentContextPtr** pprev_ // 8 bytes PersistentContextPtr* next_ // 8 bytes ~PCP unlinks itself when pprev_ != nullptr — the "still on a live profiler's list" predicate. ~WallProfiler walks the list and nulls each pprev_ before delete, so the destructor that fires as part of the walk becomes a no-op rather than poking at the dying profiler's head pointer. Outside of teardown, pprev_ != nullptr is also the proof that the back-pointer is still valid, so ~PCP can dereference profiler_ to decrement its counter without any thread-local or ID lookup. PCP-tracking memory drops from ~48 bytes/PCP (8-byte back-pointer into set + ~40 bytes of hash node / bucket) to 24 bytes/PCP (back- pointer + two list pointers), a saving of ~24 bytes per live async context. Replace GetMetrics's liveContextPtrs_.size() read with an atomic counter incremented/decremented alongside the list link/unlink so the metric path doesn't depend on the container. usedAsyncContextCount / totalAsyncContextCount keep their existing semantics (both report the live count, as before).
05cfbdf to
140607d
Compare
Alpine/musl rejects initial-exec TLS relocations from dlopen'd libraries at load time, so the previous "thread_local + initial-exec" approach made the addon unloadable on musl. Dropping initial-exec is not an option either — on glibc that brings back the __tls_get_addr call from the SIGPROF handler. pthread_getspecific is explicitly on POSIX's async-signal-safe list, works identically on glibc, musl, and macOS, and avoids the dlopen-vs-TLS-model mess entirely. The key is allocated once via a function-local static initializer, before any signal handler can fire. On Windows SIGPROF isn't compiled in, so the signal-safety constraint doesn't apply there and a plain thread_local is fine.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the
unordered_set<PersistentContextPtr*>that the wall profiler uses to track livePersistentContextPtr(PCP) instances with an intrusive doubly-linked list threaded through the PCPs themselves. Same shutdown semantics —~WallProfilerstill deletes any PCP that V8 hasn't reclaimed yet — but with materially less per-PCP overhead.Stacked on top of #322 (thread-local active profiler) and depends on the thread-local already being in place.
Why
liveContextPtrs_exists because, at process exit, V8's CPED map can still hold PCPs that V8's late-shutdown finalizers haven't reached yet. LSAN's leak check fires before those finalizers, so without an explicit cleanup pass each surviving PCP gets reported as a leak (we verified this onnode:22-bookwormwith--experimental-async-context-frame: a baseline build with this PR'sliveContextPtrs_removed and no replacement leaks ~288 bytes across ~8 PCP allocations per run). The set is a fine tool for the cleanup job, but it carries hash-table overhead unrelated to that job.What changed
PersistentContextPtrcarries aWallProfiler* const profiler_, aPersistentContextPtr** pprev_, and aPersistentContextPtr* next_. The two list pointers form a standard intrusive doubly-linked list rooted atWallProfiler::liveContextPtrHead_.~PersistentContextPtrunlinks itself whenpprev_ != nullptr. That predicate also serves as proof thatprofiler_is still valid, so the destructor dereferences it directly to decrement the live counter — no thread-local lookup, no ID compare.~WallProfilerwalks the list, nulls each PCP'spprev_(so the soon-to-fire destructor short-circuits the unlink instead of poking at the head pointer in dying memory), thendeletes each PCP.~node::ObjectWrapclears the V8 weak callback during that delete, so the dangling internal-field pointer left behind in the wrap object stays inert even if V8 GCs the wrap later.GetMetricsreads astd::atomic<size_t> liveContextPtrCount_updated alongside the list link/unlink, replacing the oldliveContextPtrs_.size().usedAsyncContextCount/totalAsyncContextCountkeep their existing semantics (both report the live count, just like before).Memory savings
Per live PCP, the tracking cost goes from:
unordered_set)A net saving of ~24 bytes per live async context. The set's per-bucket fixed overhead also goes away, though that's negligible compared to the per-element savings on real workloads.
The list operations are O(1) (insert-at-head, unlink-by-pointer), matching the previous set's amortized characteristics.
Verification
Ran the project's Linux ASAN+LSAN suite (
npm run test:js-asan) in anode:22-bookwormcontainer with--node-option=experimental-async-context-frameso the CPED-mode tests actually execute (without that flag the tests gate themselves out and the code path under test never runs in CI). Compared baseline (this PR reverted) vs patched:Zero references to
PersistentContextPtr,WallProfiler,dd_pprof.node, orwall.ccin the patched LSAN output. The remaining leaks are pure V8/Node-internal cleanup-on-exit noise, present in identical shape on the baseline (small magnitude differences are noise — V8's final GC state at exit is non-deterministic).Test plan
npm run build(macOS Release): cleannpm run test:js(macOS): 96 passingnpm run test:js-asan(Linux container, Node 22 +--experimental-async-context-frame): 95 passing, 1 pending, no PCP/WallProfiler/dd_pprof leak reportsJira: PROF-14791