fix: per-node origin name translation in table diff and merkle tree#117
fix: per-node origin name translation in table diff and merkle tree#117mason-sharp merged 1 commit intomainfrom
Conversation
NodeOriginNames was loaded from one random pool and used to translate roidents from all nodes. In native PG, roidents are node-local, so the same roident maps to different subscription names on different nodes. This caused intermittent preserve-origin repair failures when the diff happened to load origin names from the wrong node. Change NodeOriginNames to map[string]map[string]string (nodeName → roident → name). Load from every pool. Pass the originating node name into withSpockMetadata so each row's roident is resolved correctly.
📝 WalkthroughWalkthroughThe changes refactor origin-name mappings in table and merkle tree diff tasks from a single global map to per-node nested maps, enabling node-specific origin lookups and improving failure tolerance when loading node-specific data. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 7 |
| Duplication | -2 |
TIP This summary will be updated as you push new changes. Give us feedback
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/consistency/diff/table_diff.go (1)
225-274:⚠️ Potential issue | 🟠 MajorAvoid resolving
--against-originthrough a flattened node-local map.Line 244 collapses per-node origin IDs back into one ID, then Line 297 applies that single ID to every node. In native PostgreSQL, the same
roidentcan mean different origins per node, so an origin name can resolve from one node and then incorrectly include/exclude rows on another node where that ID means something else.Keep the resolved origin scoped per node and build node-specific filters for hashing, fetching, row-count estimation, and range generation; or fail fast when the requested origin cannot be resolved unambiguously across selected nodes. Also tighten
TestFlatNodeOriginNames_MergesAllNodesso conflicting flattened mappings are not accepted as valid behavior.Based on learnings,
GetNodeOriginNamesswitches between Spock node IDs and nativepg_replication_originIDs depending on Spock extension presence.Also applies to: 292-297
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consistency/diff/table_diff.go` around lines 225 - 274, The current approach in flatNodeOriginNames() and resolveAgainstOrigin() incorrectly flattens NodeOriginNames into a single map and then sets a single resolvedAgainstOrigin ID for all nodes; instead preserve per-node resolution: change resolveAgainstOrigin() to iterate NodeOriginNames and produce a per-node map (e.g. resolvedAgainstOriginByNode) or return an error if the requested AgainstOrigin does not unambiguously resolve for every selected node, use those per-node IDs when building filters/hashes/fetches/range generation, and do not overwrite a single t.resolvedAgainstOrigin; also update TestFlatNodeOriginNames_MergesAllNodes to assert conflicts across nodes are rejected (or that per-node mappings are preserved) and review GetNodeOriginNames usage to ensure it supplies node-scoped IDs (Spock vs native pg_replication_origin) rather than relying on a flattened global mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/consistency/diff/table_diff.go`:
- Around line 225-274: The current approach in flatNodeOriginNames() and
resolveAgainstOrigin() incorrectly flattens NodeOriginNames into a single map
and then sets a single resolvedAgainstOrigin ID for all nodes; instead preserve
per-node resolution: change resolveAgainstOrigin() to iterate NodeOriginNames
and produce a per-node map (e.g. resolvedAgainstOriginByNode) or return an error
if the requested AgainstOrigin does not unambiguously resolve for every selected
node, use those per-node IDs when building filters/hashes/fetches/range
generation, and do not overwrite a single t.resolvedAgainstOrigin; also update
TestFlatNodeOriginNames_MergesAllNodes to assert conflicts across nodes are
rejected (or that per-node mappings are preserved) and review GetNodeOriginNames
usage to ensure it supplies node-scoped IDs (Spock vs native
pg_replication_origin) rather than relying on a flattened global mapping.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 59f2b269-28b7-405c-9c26-58187ba79e71
📒 Files selected for processing (4)
internal/consistency/diff/table_diff.gointernal/consistency/diff/table_diff_origin_test.gointernal/consistency/diff/table_rerun.gointernal/consistency/mtree/merkle.go
NodeOriginNames was loaded from one random pool and used to translate roidents from all nodes. In native PG, roidents are node-local, so the same roident maps to different subscription names on different nodes. This caused intermittent preserve-origin repair failures when the diff happened to load origin names from the wrong node.
Change NodeOriginNames to map[string]map[string]string (nodeName → roident → name). Load from every pool. Pass the originating node name into withSpockMetadata so each row's roident is resolved correctly.