Backport fix/evict-dropped-cf-on-reopen to v1.4.x (for 1.4.3) #648

Closed
sleekmountaincat wants to merge 1 commit into v1.4.x from fix/evict-dropped-cf-1.4.2-backport

@sleekmountaincat

Summary

Conflict-free cherry-pick of #647 onto the v1.4.x line: free a column family name immediately on drop so same-name recreates get a fresh column family instead of the dangling dropped handle. Fixes Harper's ghost-table failures on the 5.0.x fleet (harper-pro pins ^1.4.2).

Intended to ship as 1.4.3 so existing 5.0.x deployments can pick it up via a harper-pro patch release, without waiting for 5.1 / the 2.x binding line.

Validation (this exact branch)

  • Built in-container against live harper-pro 5.0.27 (linux x64 glibc, node 24.16, RocksDB 11.1.1 prebuilt) and swapped into the running server on a 2-node Fabric repro cluster: drop + same-name recreate + insert works, rapid drop/recreate cycles work, no env poisoning, no wedged creates, clean drops stay dropped across restart.
  • Drop test suite passes locally against this branch's build.
  • See #647 ("fix: free a column family name immediately on drop (ghost tables)") for the full description, the semantics-change discussion, and the complete suite run.

Note: the release itself is tag-driven. No package.json version bump is included here; the release owner tags v1.4.3.

Generated by an AI agent (Claude).

🤖 Generated with Claude Code

Root cause of the Harper "ghost table" failure. When a column family was
dropped while other DBHandles still referenced it (Harper opens every table
from multiple worker threads), tryUnregisterColumnFamily left the entry in
DBDescriptor::columns (use_count > 2; versions before v2.0.0 had no eviction
at all). A later open-by-name in DBRegistry::OpenDB then reused the dropped
descriptor and handed back a dangling column family handle: every write
failed with "Invalid argument: Invalid column family specified in write
batch", which poisoned the database env, wedged table creation, and made
drop+recreate-same-name impossible without a process restart.
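
The failure mode described above can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: `ColumnFamily`, `Registry`, `open`, and `dropConditional` are hypothetical stand-ins, not the binding's real types, and the refcount check is a simplified version of the heuristic the commit message attributes to tryUnregisterColumnFamily.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a column family descriptor (not the real type).
struct ColumnFamily {
    std::string name;
    bool dropped = false;  // set once the underlying column family is dropped
};

// Hypothetical by-name registry, loosely modeled on the columns map.
struct Registry {
    std::map<std::string, std::shared_ptr<ColumnFamily>> columns;

    // Open-by-name: reuse an existing entry if present, otherwise create one.
    std::shared_ptr<ColumnFamily> open(const std::string& name) {
        auto it = columns.find(name);
        if (it != columns.end()) return it->second;  // may be a dropped entry
        auto cf = std::make_shared<ColumnFamily>();
        cf->name = name;
        columns[name] = cf;
        return cf;
    }

    // Simplified old behavior: evict only when nobody else holds the descriptor.
    // The map entry plus the caller's reference account for a use_count of 2.
    void dropConditional(const std::shared_ptr<ColumnFamily>& cf) {
        cf->dropped = true;
        if (cf.use_count() <= 2) columns.erase(cf->name);
    }
};

// Returns true when a reopen after the drop hands back the dropped descriptor.
bool reopenReturnsDroppedHandle() {
    Registry reg;
    auto workerA = reg.open("users");
    auto workerB = reg.open("users");  // second holder, e.g. another thread
    assert(workerA == workerB);        // both share one descriptor
    reg.dropConditional(workerA);      // use_count is 3, so the entry stays
    auto reopened = reg.open("users");
    return reopened->dropped;          // the stale entry was reused
}
```

With a single holder the entry is evicted and a reopen gets a fresh descriptor; the stale reuse only appears once a second reference exists, which matches the multi-worker pattern described above.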

Fix: unconditionally remove the by-name map entry on successful drop
(DBDescriptor::unregisterColumnFamily). Instances still holding the dropped
column family keep the descriptor alive through their shared_ptr and can
continue reading until they close; only the by-name lookup is removed, so a
reopen creates a fresh column family.
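
A minimal sketch of the new contract, assuming the same kind of hypothetical `Registry` and `ColumnFamily` stand-ins (none of these names come from the actual code): the drop always erases the by-name entry, existing holders keep their descriptor alive through `std::shared_ptr`, and a reopen creates a new descriptor.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a column family descriptor (not the real type).
struct ColumnFamily {
    std::string name;
    bool dropped = false;
};

// Hypothetical by-name registry, loosely modeled on the columns map.
struct Registry {
    std::map<std::string, std::shared_ptr<ColumnFamily>> columns;

    std::shared_ptr<ColumnFamily> open(const std::string& name) {
        auto& slot = columns[name];
        if (!slot) {
            slot = std::make_shared<ColumnFamily>();
            slot->name = name;
        }
        return slot;
    }

    // Simplified new behavior: always remove the by-name entry on drop.
    // Taking cf by value pins the descriptor for the duration of the call.
    void drop(std::shared_ptr<ColumnFamily> cf) {
        cf->dropped = true;
        columns.erase(cf->name);
    }
};

struct Outcome {
    bool freshOnReopen;     // reopen returned a new, non-dropped descriptor
    bool oldStillReadable;  // a remaining holder can still read its descriptor
};

Outcome dropThenReopen() {
    Registry reg;
    auto held = reg.open("users");
    auto other = reg.open("users");  // second holder of the same descriptor
    assert(held == other);
    reg.drop(held);
    auto reopened = reg.open("users");
    return {reopened != held && !reopened->dropped,
            other->name == "users" && other->dropped};
}
```

The sketch models only the lookup and lifetime behavior. It says nothing about what a write through the still-held dropped handle does in RocksDB.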

Because the erase now fires on every drop, the columns map gets a dedicated
columnsMutex guarding all concurrent accessors: DBRegistry::OpenDB's
copy-check-insert, the columns getter, flush() (runs on libuv worker
threads; now also pins the descriptors so handles cannot die mid-Flush),
close()'s compact loop and clear, and the erase itself. Lock order is
databasesMutex before columnsMutex; the drop path takes only columnsMutex,
avoiding any interaction with registry-lock holders that wait on in-flight
operations (e.g. PurgeAll at shutdown).
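
The locking discipline can be sketched roughly as follows. This is an illustration only: the `Descriptor` and `Registry` types, `openColumn`, and the `flushes` counter are hypothetical, and the real flush and open paths do far more work. What the sketch does show is the stated order (databasesMutex before columnsMutex), a drop path that takes only columnsMutex, and a flush that copies the `shared_ptr`s under the lock so the descriptors stay alive while it works outside the lock.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a column family descriptor.
struct ColumnFamily {
    std::string name;
    std::atomic<int> flushes{0};  // placeholder for real flush work
};

// Hypothetical per-database descriptor owning the by-name columns map.
struct Descriptor {
    std::mutex columnsMutex;
    std::map<std::string, std::shared_ptr<ColumnFamily>> columns;

    // Drop path: takes only columnsMutex, never the registry lock.
    void unregisterColumnFamily(const std::string& name) {
        std::lock_guard<std::mutex> lock(columnsMutex);
        columns.erase(name);
    }

    // Flush path: pin the descriptors under the lock, then work on the copies.
    int flush() {
        std::vector<std::shared_ptr<ColumnFamily>> pinned;
        {
            std::lock_guard<std::mutex> lock(columnsMutex);
            for (const auto& entry : columns) pinned.push_back(entry.second);
        }
        for (const auto& cf : pinned) cf->flushes.fetch_add(1);
        return static_cast<int>(pinned.size());
    }
};

// Hypothetical registry of open databases.
struct Registry {
    std::mutex databasesMutex;
    std::map<std::string, std::shared_ptr<Descriptor>> databases;

    // Open path: databasesMutex first, then columnsMutex (never the reverse).
    std::shared_ptr<ColumnFamily> openColumn(const std::string& db,
                                             const std::string& cfName) {
        std::lock_guard<std::mutex> dbLock(databasesMutex);
        auto& desc = databases[db];
        if (!desc) desc = std::make_shared<Descriptor>();
        std::lock_guard<std::mutex> colLock(desc->columnsMutex);
        auto& slot = desc->columns[cfName];
        if (!slot) {
            slot = std::make_shared<ColumnFamily>();
            slot->name = cfName;
        }
        return slot;
    }
};
```

Because the drop path never acquires databasesMutex, it cannot wait on a thread that holds the registry lock while that thread waits for in-flight operations, which is the interaction the commit message says it avoids.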

This intentionally changes drop semantics from deferred-while-referenced to
immediate: the old "drop a column family with two database instances" test
encoded the deferred contract and has been replaced with a regression test
for the new contract. The new test also documents that a write attempted
through a still-held dropped handle fails and contaminates the env's shared
write path (the poison this fix prevents callers from triggering through
reopen-by-name).

Removes DBDescriptor::tryUnregisterColumnFamily (its conditional refcount
heuristic was the bug; Drop/DropSync were its only callers).

Validated: native build clean (macOS arm64), full vitest suite 459 passed /
1 skipped / 0 failed; tsc, oxlint, and oxfmt clean (pre-commit hook bypassed
only because the local pnpm shim is broken; its checks were run manually).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sleekmountaincat requested a review from a team as a code owner on June 11, 2026 at 04:34

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@cb1kenobi

Member

We aren't planning to do another 1.4 release now that 1.5 is out.

@sleekmountaincat

Author

Makes sense, thanks @cb1kenobi. Replaced by #649 targeting v1.5.x (for a 1.5.1 release; harper-pro's ^1.4.2 range still picks it up). Closing this one.

🤖 Generated with Claude Code
