Skip to content

fix(EN-1443): purge same-batch reverse-map orphans on metadata field removal#1485

Merged
paul-nicolas merged 2 commits into
release/v3.0from
EN-1443-reverse-map-orphan-purge
Jul 3, 2026
Merged

fix(EN-1443): purge same-batch reverse-map orphans on metadata field removal#1485
paul-nicolas merged 2 commits into
release/v3.0from
EN-1443-reverse-map-orphan-purge

Conversation

@paul-nicolas

Copy link
Copy Markdown
Contributor

Problem

When an indexed metadata field is removed, handleRemovedMetadataFieldType purges its read-store projection. The reverse-map (0x03) limb (purgeReverseMapForKey) scanned a committed-only Pebble snapshot. A reverse-map PUT written earlier in the same uncommitted builder batch (e.g. a SavedMetadata on the indexed field) was invisible to that snapshot, so no delete was emitted and the stale row committed at Flush() — an orphaned reverse-map row for a field that no longer exists.

This is a read-index projection that diverges from a from-scratch replay, violating the "every persisted projection must be verifiable against the audit" invariant (CLAUDE.md #8).

The forward-index and entity-exists limbs are unaffected: their range tombstone gets a higher Pebble sequence number than earlier same-batch SETs, so it already shadows uncommitted rows. Only the reverse map — which cannot be range-deleted — relied on the snapshot alone.

Fix

purgeReverseMapForKey now purges from both sources:

  • committed rows via the snapshot scan (unchanged), and
  • same-batch uncommitted PUTs via the batch's read-your-writes rmapOverlay — the mechanism the codebase already maintains for exactly this ("callers resolve the current value via ReverseMapOverlay before falling back to committed state"). dal.WriteSession deliberately exposes no iterator, so the overlay is the sanctioned way to read in-flight reverse-map writes.

This keeps the live path's commit/progress boundaries intact (no mid-batch flush).

New WriteBatch helpers:

  • DeleteReverseMapKey — deletes and clears the overlay entry so it never drifts from the batch.
  • RangeReverseMapOverlay — read-only iteration over buffered reverse-map mutations.

The two self-contradictory doc comments are corrected.

Testing

New process_metadata_field_removal_test.go (table-driven): committed then removed, same batch repro, mixed committed and same batch, unrelated key survives. Verified the three same-batch cases fail before this change (orphan row survives) and all pass after. just pre-commit (0 lint issues) and the full just test unit suite are green.

Out of scope

  • The "empty transient-set persisted" limb — already cancelled on every commit path via appliedProposalSync.err().
  • A checker-side pass detecting reverse-map orphans (compareIndexes verifies the registry, not reverse-map rows) — noted as a follow-up.

Jira: https://formance-team.atlassian.net/browse/EN-1443

…removal

handleRemovedMetadataFieldType purged the reverse-map (0x03) limb of a
removed indexed metadata field via a committed-only Pebble snapshot scan.
A reverse-map PUT written earlier in the same uncommitted builder batch
(e.g. a SavedMetadata on the indexed field) was invisible to that
snapshot, so no delete was emitted and the stale row committed at Flush()
as an orphaned reverse-map row for a field that no longer exists — a
read-index projection diverging from a from-scratch replay (CLAUDE.md #8).

Unlike the forward/entity-exists limbs, whose range tombstone gets a
higher sequence number than earlier same-batch SETs and so shadows
uncommitted rows, the reverse map cannot be range-deleted and relied on
the snapshot alone.

purgeReverseMapForKey now also consults the batch's read-your-writes
rmapOverlay — the mechanism the codebase already maintains for exactly
this — deleting same-batch PUTs for the removed key. Adds
WriteBatch.DeleteReverseMapKey (delete + keep the overlay honest) and
RangeReverseMapOverlay. Corrects the two self-contradictory doc comments.

Regression test covers committed-then-removed, same-batch (the repro),
mixed, and unrelated-key-survives; the three same-batch cases fail before
this change.

https://formance-team.atlassian.net/browse/EN-1443
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • main

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aa602173-13d3-4add-9337-45e26e47c631

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch EN-1443-reverse-map-orphan-purge

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 67.85714% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.96%. Comparing base (5ddbe29) to head (eb188b1).
⚠️ Report is 14 commits behind head on release/v3.0.

Files with missing lines Patch % Lines
...ion/indexbuilder/process_metadata_field_removal.go 65.00% 3 Missing and 4 partials ⚠️
internal/storage/readstore/write_batch.go 75.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@               Coverage Diff                @@
##           release/v3.0    #1485      +/-   ##
================================================
+ Coverage         73.22%   73.96%   +0.74%     
================================================
  Files               388      390       +2     
  Lines             39765    39776      +11     
================================================
+ Hits              29117    29420     +303     
+ Misses             7900     7796     -104     
+ Partials           2748     2560     -188     
Flag Coverage Δ
e2e 73.96% <67.85%> (+0.74%) ⬆️
scenario 73.96% <67.85%> (+0.74%) ⬆️
unit 73.96% <67.85%> (+0.74%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@paul-nicolas paul-nicolas left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-model review — PR #1485 (EN-1443 reverse-map orphan purge)

Models: Claude (this repo's perspective) + Codex (gpt-5.4, external). Gemini not used.

Verdict: the fix is correct and sound — no blocking issues. I verified independently: go build + go vet on both changed packages are clean; the new TestHandleRemovedMetadataFieldType_PurgesReverseMap passes on the PR head, and the three same-batch subtests genuinely fail on the base branch (orphan row survives, count=1) — the test reproduces the bug it claims to. The overlay stays consistent with the batch (all reverse-map writers — ReplaceMetadataIndexV, DeleteMetadataEntryWithPreviousV, and the new DeleteReverseMapKey — mirror into rmapOverlay). The Go-map iteration order in RangeReverseMapOverlay is a non-issue: the read-side index is a per-replica projection off the FSM hot path, and the purge is an order-independent delete-set, so the committed result is identical regardless of iteration order.

Severity counts: 0 Critical, 0 Major, 2 Medium (posted inline), 4 Minor/Nit (listed below, not posted inline).

Conflicts reconciled between the two reviewers:

  • Checker-coverage gap: Codex rated Major, Claude rated Medium → resolved to Medium. It is real per invariant #8 but pre-existing, explicitly deferred in this PR's own description, and documented as a separately-tracked effort at internal/application/check/checker.go:388-397. This PR neither introduces nor worsens it.
  • O(M·N) overlay scan: Codex rated Medium, Claude rated Minor → resolved to Minor. M (fields removed per batch) is typically 1, the committed-scan half was already O(M·rows) pre-PR, and the overlay half is bounded by batch write count.
  • Transaction-namespace test gap: Codex rated Minor, Claude rated Medium → resolved to Medium (both flagged it; distinct key-layout parsing path, trivially addable).

Minor / Nit (not posted inline)

  • internal/application/indexbuilder/process_metadata_field_removal.go:182 — [Minor] Per-call full-overlay scan is O(M·N) when many fields are removed in one batch; acceptable as-is (matches the documented tradeoff in gc.go), revisit only if batched multi-field removals become common.
  • internal/application/indexbuilder/process_metadata_field_removal.go:141 — [Nit] The seen dedup set is effectively dead code: the committed pass's DeleteReverseMapKey sets rmapOverlay[k]=nil before the overlay range runs, so any shared key hits the value==nil guard and never re-reaches deleteMatch. Harmless, but the comment ("deleted only once") overstates its necessity — either drop it or reword as purely defensive.
  • internal/storage/readstore/write_batch.go:19 — [Minor] The struct comment's claim that an un-mirrored reverse-map write is "unrepresentable" is overstated: internal/application/indexbuilder/gc.go:84 deletes reverse-map rows via raw batch.DeleteKey, bypassing the overlay (safe only because it runs boot-only). Scope the guarantee to the live write helpers.
  • internal/application/indexbuilder/process_metadata_field_removal_test.go:125 — [Minor] Two overlay branches are unexercised: the value==nil path (a same-batch write-then-delete-then-remove) and a true committed+overlay dedup collision (the "mixed" subtest uses distinct entities acct-1/acct-2, so no key is ever in both committed state and a non-nil overlay entry).

Comment thread internal/application/indexbuilder/process_metadata_field_removal.go
paul-nicolas added a commit that referenced this pull request Jul 3, 2026
…verse-map purge

Add subtests exercising the transaction-namespace reverse-map purge path
(distinct 8-byte txID + version key layout) and the same-batch overlay
branches (value==nil already-deleted skip, committed+in-flight rewrite of
the same key). The transaction same-batch/mixed cases fail before the
read-your-writes overlay consult, matching the account cases.

Addresses review feedback on PR #1485.
@paul-nicolas

Copy link
Copy Markdown
Contributor Author

Added transaction-namespace, value==nil overlay-skip, and committed+in-flight rewrite subtests in 77a818f, which cover the new overlay branches flagged here. The residual uncovered lines are defensive error/guard paths (iterator/DeleteKey error returns, cross-namespace HasPrefix skip, and the effectively-unreachable seen dedup skip); overall project coverage nets +0.89%.

…verse-map purge

Add subtests exercising the transaction-namespace reverse-map purge path
(distinct 8-byte txID + version key layout) and the same-batch overlay
branches (value==nil already-deleted skip, committed+in-flight rewrite of
the same key). The transaction same-batch/mixed cases fail before the
read-your-writes overlay consult, matching the account cases.

Addresses review feedback on PR #1485.
@paul-nicolas paul-nicolas force-pushed the EN-1443-reverse-map-orphan-purge branch from 77a818f to eb188b1 Compare July 3, 2026 08:24
@paul-nicolas paul-nicolas marked this pull request as ready for review July 3, 2026 11:44
@NumaryBot

Copy link
Copy Markdown
Contributor

✅ Approve — automated review

The PR correctly extends reverse-map orphan purging to cover same-batch overlay writes during metadata field removal, addressing EN-1443. Both inline threads raised in prior discussion (checker/invariant-#8 gap and missing transaction-namespace test coverage) have been resolved and closed by the author in subsequent commits (77a818f). The single independent review found no introduced bugs or regressions. Patch coverage is slightly below threshold for some defensive error-path branches, which the author has acknowledged as effectively unreachable guard code. There are no remaining blocker or major issues.

No findings.

@gfyrag gfyrag left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — le fix réutilise l'overlay rmap déjà maintenu, ce qui reste cohérent avec l'invariant #3 (WriteSession write-only) et évite l'overhead d'un IndexedBatch. Tests bien ciblés.

@paul-nicolas paul-nicolas merged commit 2f53d97 into release/v3.0 Jul 3, 2026
13 of 14 checks passed
@paul-nicolas paul-nicolas deleted the EN-1443-reverse-map-orphan-purge branch July 3, 2026 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants