fix(EN-1443): purge same-batch reverse-map orphans on metadata field removal#1485
Conversation
…removal handleRemovedMetadataFieldType purged the reverse-map (0x03) limb of a removed indexed metadata field via a committed-only Pebble snapshot scan. A reverse-map PUT written earlier in the same uncommitted builder batch (e.g. a SavedMetadata on the indexed field) was invisible to that snapshot, so no delete was emitted and the stale row committed at Flush() as an orphaned reverse-map row for a field that no longer exists — a read-index projection diverging from a from-scratch replay (CLAUDE.md #8). Unlike the forward/entity-exists limbs, whose range tombstone gets a higher sequence number than earlier same-batch SETs and so shadows uncommitted rows, the reverse map cannot be range-deleted and relied on the snapshot alone. purgeReverseMapForKey now also consults the batch's read-your-writes rmapOverlay — the mechanism the codebase already maintains for exactly this — deleting same-batch PUTs for the removed key. Adds WriteBatch.DeleteReverseMapKey (delete + keep the overlay honest) and RangeReverseMapOverlay. Corrects the two self-contradictory doc comments. Regression test covers committed-then-removed, same-batch (the repro), mixed, and unrelated-key-survives; the three same-batch cases fail before this change. https://formance-team.atlassian.net/browse/EN-1443
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release/v3.0 #1485 +/- ##
================================================
+ Coverage 73.22% 73.96% +0.74%
================================================
Files 388 390 +2
Lines 39765 39776 +11
================================================
+ Hits 29117 29420 +303
+ Misses 7900 7796 -104
+ Partials 2748 2560 -188
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
paul-nicolas
left a comment
There was a problem hiding this comment.
Multi-model review — PR #1485 (EN-1443 reverse-map orphan purge)
Models: Claude (this repo's perspective) + Codex (gpt-5.4, external). Gemini not used.
Verdict: the fix is correct and sound — no blocking issues. I verified independently: go build + go vet on both changed packages are clean; the new TestHandleRemovedMetadataFieldType_PurgesReverseMap passes on the PR head, and the three same-batch subtests genuinely fail on the base branch (orphan row survives, count=1) — the test reproduces the bug it claims to. The overlay stays consistent with the batch (all reverse-map writers — ReplaceMetadataIndexV, DeleteMetadataEntryWithPreviousV, and the new DeleteReverseMapKey — mirror into rmapOverlay). The Go-map iteration order in RangeReverseMapOverlay is a non-issue: the read-side index is a per-replica projection off the FSM hot path, and the purge is an order-independent delete-set, so the committed result is identical regardless of iteration order.
Severity counts: 0 Critical, 0 Major, 2 Medium (posted inline), 4 Minor/Nit (listed below, not posted inline).
Conflicts reconciled between the two reviewers:
- Checker-coverage gap: Codex rated Major, Claude rated Medium → resolved to Medium. It is real per invariant #8 but pre-existing, explicitly deferred in this PR's own description, and documented as a separately-tracked effort at
internal/application/check/checker.go:388-397. This PR neither introduces nor worsens it. - O(M·N) overlay scan: Codex rated Medium, Claude rated Minor → resolved to Minor. M (fields removed per batch) is typically 1, the committed-scan half was already O(M·rows) pre-PR, and the overlay half is bounded by batch write count.
- Transaction-namespace test gap: Codex rated Minor, Claude rated Medium → resolved to Medium (both flagged it; distinct key-layout parsing path, trivially addable).
Minor / Nit (not posted inline)
internal/application/indexbuilder/process_metadata_field_removal.go:182— [Minor] Per-call full-overlay scan is O(M·N) when many fields are removed in one batch; acceptable as-is (matches the documented tradeoff ingc.go), revisit only if batched multi-field removals become common.internal/application/indexbuilder/process_metadata_field_removal.go:141— [Nit] Theseendedup set is effectively dead code: the committed pass'sDeleteReverseMapKeysetsrmapOverlay[k]=nilbefore the overlay range runs, so any shared key hits thevalue==nilguard and never re-reachesdeleteMatch. Harmless, but the comment ("deleted only once") overstates its necessity — either drop it or reword as purely defensive.internal/storage/readstore/write_batch.go:19— [Minor] The struct comment's claim that an un-mirrored reverse-map write is "unrepresentable" is overstated:internal/application/indexbuilder/gc.go:84deletes reverse-map rows via rawbatch.DeleteKey, bypassing the overlay (safe only because it runs boot-only). Scope the guarantee to the live write helpers.internal/application/indexbuilder/process_metadata_field_removal_test.go:125— [Minor] Two overlay branches are unexercised: thevalue==nilpath (a same-batch write-then-delete-then-remove) and a true committed+overlay dedup collision (the "mixed" subtest uses distinct entities acct-1/acct-2, so no key is ever in both committed state and a non-nil overlay entry).
…verse-map purge Add subtests exercising the transaction-namespace reverse-map purge path (distinct 8-byte txID + version key layout) and the same-batch overlay branches (value==nil already-deleted skip, committed+in-flight rewrite of the same key). The transaction same-batch/mixed cases fail before the read-your-writes overlay consult, matching the account cases. Addresses review feedback on PR #1485.
|
Added transaction-namespace, |
…verse-map purge Add subtests exercising the transaction-namespace reverse-map purge path (distinct 8-byte txID + version key layout) and the same-batch overlay branches (value==nil already-deleted skip, committed+in-flight rewrite of the same key). The transaction same-batch/mixed cases fail before the read-your-writes overlay consult, matching the account cases. Addresses review feedback on PR #1485.
77a818f to
eb188b1
Compare
✅ Approve — automated reviewThe PR correctly extends reverse-map orphan purging to cover same-batch overlay writes during metadata field removal, addressing EN-1443. Both inline threads raised in prior discussion (checker/invariant-#8 gap and missing transaction-namespace test coverage) have been resolved and closed by the author in subsequent commits (77a818f). The single independent review found no introduced bugs or regressions. Patch coverage is slightly below threshold for some defensive error-path branches, which the author has acknowledged as effectively unreachable guard code. There are no remaining blocker or major issues. No findings. |
Problem
When an indexed metadata field is removed,
handleRemovedMetadataFieldTypepurges its read-store projection. The reverse-map (0x03) limb (purgeReverseMapForKey) scanned a committed-only Pebble snapshot. A reverse-map PUT written earlier in the same uncommitted builder batch (e.g. aSavedMetadataon the indexed field) was invisible to that snapshot, so no delete was emitted and the stale row committed atFlush()— an orphaned reverse-map row for a field that no longer exists.This is a read-index projection that diverges from a from-scratch replay, violating the "every persisted projection must be verifiable against the audit" invariant (CLAUDE.md #8).
The forward-index and entity-exists limbs are unaffected: their range tombstone gets a higher Pebble sequence number than earlier same-batch SETs, so it already shadows uncommitted rows. Only the reverse map — which cannot be range-deleted — relied on the snapshot alone.
Fix
purgeReverseMapForKeynow purges from both sources:rmapOverlay— the mechanism the codebase already maintains for exactly this ("callers resolve the current value viaReverseMapOverlaybefore falling back to committed state").dal.WriteSessiondeliberately exposes no iterator, so the overlay is the sanctioned way to read in-flight reverse-map writes.This keeps the live path's commit/progress boundaries intact (no mid-batch flush).
New
WriteBatchhelpers:DeleteReverseMapKey— deletes and clears the overlay entry so it never drifts from the batch.RangeReverseMapOverlay— read-only iteration over buffered reverse-map mutations.The two self-contradictory doc comments are corrected.
Testing
New
process_metadata_field_removal_test.go(table-driven):committed then removed,same batch repro,mixed committed and same batch,unrelated key survives. Verified the three same-batch cases fail before this change (orphan row survives) and all pass after.just pre-commit(0 lint issues) and the fulljust testunit suite are green.Out of scope
appliedProposalSync.err().compareIndexesverifies the registry, not reverse-map rows) — noted as a follow-up.Jira: https://formance-team.atlassian.net/browse/EN-1443