Skip to content

fix(schema): atomic table drops, interrupted-drop completion at startup, and create-lock release (ghost tables)#1246

Draft
sleekmountaincat wants to merge 5 commits into
mainfrom
fix/ghost-table-drop-atomicity
Draft

fix(schema): atomic table drops, interrupted-drop completion at startup, and create-lock release (ghost tables)#1246
sleekmountaincat wants to merge 5 commits into
mainfrom
fix/ghost-table-drop-atomicity

Conversation

@sleekmountaincat

Copy link
Copy Markdown

Summary

Defense-in-depth for the ghost-table class of failures (5 commits):

  1. Table.dropTable() awaits the column family drops and orders them before the catalog removal; a failed drop now surfaces as the operation's error instead of a swallowed rejection that reports successfully deleted while orphaning column families on disk.
  2. A dropping tombstone is persisted on the table's primary catalog entry before any destructive work. Both the boot-time schema load (completeInterruptedDrop in databases.ts) and a same-name table() create complete an interrupted drop instead of resurrecting the table (the customer-reported "deleted table came back after restart").
  3. The create path's exclusive update-attributes spin lock is released on throw. Previously a failure inside the create window leaked it, and the next create spun a worker at 100% CPU forever (confirmed live: 1% to 105% container CPU, pinned http thread, ops API unresponsive).
  4. Regression suite unitTests/resources/dropTableGhost.test.js + DESIGN.md notes.

Purpose

Root-caused and reproduced from scratch on a live 2-node Fabric cluster (harper-pro 5.0.27): drop + same-name recreate without a restart reliably poisons the database env. The primary fix is in the binding (HarperFast/rocksdb-js#647 / #648, the stale dropped-handle reuse); this PR is the core-side hardening that makes drop failures honest, completes interrupted drops instead of resurrecting them, and prevents the wedge, including for failure causes the binding fix cannot see (genuinely corrupt or half-initialized column families).

DRAFT - landing gate

The new regression tests REQUIRE a binding release containing the eviction fix: on stock rocksdb-js 1.4.2 three of the four fail with the incident error (Could not access column family N), by design. Landing order: rocksdb-js#647 merges, 1.4.3 is released from #648, then this PR gets a lockfile bump to 1.4.3 and converts from draft.

Where to focus review

  1. dropTable ordering (resources/Table.ts): tombstone (awaited on LMDB where put returns a promise) -> in-memory delete (concurrent requests get "table does not exist" rather than racing the drops) -> blob cleanup -> awaited CF drops -> catalog removal. A failure after the in-memory delete leaves the table invisible but tombstoned; the drop completes durably at next startup or same-name create. Judgment call worth scrutiny: invisible-but-tombstoned vs the old behavior (visible-in-catalog but describe lies).
  2. completeInterruptedDrop (resources/databases.ts): runs during boot schema load and from the create path under the exclusive lock. Per-store error tolerance for concurrent workers reconciling the same table. The LMDB branch drops the named sub-databases too; LMDB reuses DBIs by name, so catalog-only cleanup would let a same-name recreate silently inherit the old table's records (caught in cross-model review).
  3. Create-over-tombstone (resources/databases.ts create path): a tombstoned catalog entry is truthy in the existing getSync check; without the new handling the create recursed forever (also caught in review; regression test has a timeout).
  4. Partial-failure window that remains: if index CF drops succeed and the primary drop fails, some column families are gone while the catalog survives (tombstoned). The reconcile completes the removal at next start; there is still no cross-CF transaction.

Validation

  • unitTests/resources/dropTableGhost.test.js: 4 regression tests; all pass with the fixed binding, 3 fail on stock 1.4.2 (differential proof the binding fix + this hardening cover the incident).
  • Resources suite: 691 passing; the only failure ("Can run txn with three tables and two databases") fails identically on the base commit (pre-existing, plus a pre-existing flaky Replication full-copy OOM: per-record audit-chain walk in Table.commit scales with audit-log depth #1114 pair). dataLayer: 155 passing.
  • prettier, oxlint, tsc clean for the touched files.
  • The binding-side fix was e2e-validated against live harper-pro 5.0.27 on the repro cluster (see rocksdb-js#647 for details).

Generated by an AI agent (Claude, investigation + fix); cross-model (Antigravity) review findings are addressed in the third commit.

🤖 Generated with Claude Code

chris nelson and others added 5 commits June 5, 2026 15:57
…exclusive lock

Defense-in-depth for the "ghost table" failure. Two independent core bugs:

1. Table.dropTable() removed the catalog metadata (synchronous dbisDb.remove =
   removeSync) BEFORE primaryStore.drop()/index.drop(), and the column-family
   drops were fire-and-forget (promises unawaited, rejections swallowed). A
   corrupt/half-initialized CF made the drop fail while the catalog was already
   cleared - an orphaned ghost CF, with the op still reporting success. Now the
   CF drops run and are awaited FIRST; the catalog metadata and in-memory entry
   are only removed after they succeed. A failed drop now throws and leaves the
   table fully intact (catalog + CF) instead of half-deleted.

2. databases.ts table() acquired the 'update-attributes' exclusive spin lock
   (while (!rootStore.tryLock(...)) {}) and then ran openRocksDatabase + the
   tableId/catalog puts OUTSIDE any try/finally. A throw there (e.g. in an env
   poisoned by a prior dangling CF) leaked the lock, so every subsequent
   create_table on that database spun forever - a hard wedge pinning a worker
   at 100% CPU. The create body is now wrapped so the lock is released on throw.

NOT fully sufficient on its own: in-process same-name drop+recreate still needs
the @harperfast/rocksdb-js fix (evict the dropped CF handle so reopen-by-name
creates fresh). These changes make the failure SAFE (no orphan, no wedge, no
swallowed error) and handle the genuinely-corrupt-CF case.

NOT YET BUILT/TESTED: fresh worktree without node_modules; run build (TypeStrip
dist), format:write, lint:required, and add unit tests before PR. Boot-time
catalog-vs-column-family reconcile (to stop restart resurrection) is a proposed
follow-up, not included here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cting tables

dropTable now persists a 'dropping' tombstone on the table's primary catalog
entry before any destructive work. If the process dies or a column family
drop fails partway, the next startup sees the tombstone, drops any surviving
column families, removes the catalog rows, and skips loading the table.
Previously the surviving catalog rows were blindly re-opened with
create-if-missing, silently resurrecting dropped tables after a restart
(observed on the repro cluster, and matches a customer report of a deleted
table reappearing the next day).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cross-model review (Antigravity) found three issues in the tombstone work:

1. Same-process create over a tombstoned catalog entry recursed forever (the
   entry is truthy in the create path's getSync check). The create path now
   detects the dropping flag and completes the interrupted drop inline, under
   the exclusive lock, before creating fresh.
2. The LMDB path only removed catalog rows; LMDB reuses named sub-databases
   on open, so a same-name recreate silently inherited the old table's
   records. completeInterruptedDrop now drops the LMDB stores too.
3. Deferring the in-memory delete to the end of dropTable let concurrent
   requests race the awaited column family drops. The in-memory delete is
   back at the start (right after the tombstone write): a failed drop leaves
   the table invisible but tombstoned, and the drop completes durably on the
   next startup or same-name create.

Also awaits the tombstone write on LMDB (put returns a promise there) and
extracts completeInterruptedDrop as a shared helper with per-store error
tolerance for concurrent workers reconciling the same table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rrupted drops

Four regression tests for the ghost-table class of failures:
1. drop + same-name recreate in-process (data must not leak from the old store)
2. interrupted drop (tombstone present) completes at load instead of resurrecting
3. create over a tombstoned entry completes the drop and creates fresh
   (regression for the unbounded recursion the review caught)
4. a failed column family drop surfaces an error, leaves a tombstone, and the
   next same-name create completes the drop

NOTE: tests 1, 3, and 4 fail against stock @harperfast/rocksdb-js 1.4.2 with
'Could not access column family N' (the incident error) and require the
column-family eviction fix (rocksdb-js fix/evict-dropped-cf-on-reopen). The
dependency must be bumped to a release containing that fix before this lands
in CI. Verified locally: all 4 pass with the fixed binding; resources suite
691 passing with the only failure being the pre-existing multi-database
transaction test (fails identically on the base commit).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… lock

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant