
Enforce unique indexes across all shards of a TRUNCATE time partition #6001

Open: dorinhogea wants to merge 1 commit into bloomberg:main from dorinhogea:partuniqindex



@dorinhogea


Time partitions spread data across multiple shard tables to implement data retention. Until now, unique indexes were only enforced within the shard being written, making it possible to insert the same key into different shards without conflict.

A new 'partition_unique' tunable enables cross-shard unique enforcement for TRUNCATE-rollout partitions. When on, any write that would violate a unique constraint in any sibling shard is rejected with the same error that a within-shard violation produces. The enforcement covers both inserts and updates that change a key column.
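
As a rough illustration of the behaviour described above, the following Python sketch models each shard as the key set of a single unique index. It is a toy model, not the comdb2 implementation: `ShardedTable`, `DuplicateKeyError`, `update_key` and `rollout` are hypothetical names, and the rollout is simplified to emptying the next shard in a ring.

```python
class DuplicateKeyError(Exception):
    """Raised for any unique violation, within-shard or cross-shard."""


class ShardedTable:
    """Toy model: each shard holds the key set of one unique index."""

    def __init__(self, nshards, partition_unique=False):
        self.shards = [set() for _ in range(nshards)]
        self.partition_unique = partition_unique  # models the tunable
        self.current = 0  # index of the shard receiving new inserts

    def _probe(self, key, home):
        # Without the tunable only the shard being written is probed;
        # with it, every sibling shard is probed as well.
        if self.partition_unique:
            targets = range(len(self.shards))
        else:
            targets = [home]
        for i in targets:
            if key in self.shards[i]:
                # Same error type regardless of which shard holds the key.
                raise DuplicateKeyError(f"duplicate key {key!r} (shard {i})")

    def insert(self, key):
        self._probe(key, self.current)
        self.shards[self.current].add(key)

    def update_key(self, shard, old_key, new_key):
        # Only updates that change the key need a uniqueness probe.
        if new_key != old_key:
            self._probe(new_key, shard)
            self.shards[shard].remove(old_key)
            self.shards[shard].add(new_key)

    def rollout(self):
        # Simplified rollout (assumption): empty the next shard, make it current.
        self.current = (self.current + 1) % len(self.shards)
        self.shards[self.current].clear()
```

With `partition_unique=False`, inserting the same key before and after a `rollout()` succeeds, which is the gap the PR closes. With `partition_unique=True`, the second insert raises `DuplicateKeyError`, as does an update that changes a key to one held by a sibling shard.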

The feature carries a measurable write cost proportional to the number of sibling shards and unique indexes, driven by the extra index lookups per write and the larger BDB lock footprint per transaction. A performance test is included that quantifies both impacts separately.
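
To make the proportionality concrete, here is a back-of-envelope sketch. It assumes one extra lookup per unique index per sibling shard for every write, which is a simplification not stated in the PR; `extra_index_probes` is a hypothetical helper and says nothing about lock footprint or actual latency.

```python
def extra_index_probes(nshards, nunique_indexes, nwrites=1):
    """Estimate the additional unique-index lookups caused by cross-shard checks.

    Assumption: each write probes every unique index once in every
    sibling shard (all shards except the one being written).
    """
    if nshards < 1 or nunique_indexes < 0 or nwrites < 0:
        raise ValueError("nshards must be >= 1; other counts must be >= 0")
    siblings = nshards - 1
    return siblings * nunique_indexes * nwrites
```

Under that assumption, a partition with 4 shards and 2 unique indexes would add 6 lookups per write, and a single-shard partition would add none.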

Known limitations:

  • Enabling the tunable does not validate pre-existing data; cross-shard duplicates already present are the user's responsibility to resolve.
  • UPSERT is not supported on time partitions.

@roborivers roborivers left a comment


Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_truncate_multiddl_generated [db unavailable at finish] **quarantined**
sc_truncate [db unavailable at finish]
consumer_non_atomic_default_consumer_generated **quarantined**
remotecreate_twopc_generated
remotecreate
tunables
reco-ddlk-sql [timeout] **quarantined**

@roborivers roborivers left a comment


Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_timepart **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
tunables
reco-ddlk-sql [timeout] **quarantined**

@roborivers roborivers left a comment


Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_resume_logicalsc_generated **quarantined**
sp_snapshot_generated
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**
partition_unique_perf [timeout]

@dorinhogea dorinhogea force-pushed the partuniqindex branch 2 times, most recently from 578ec22 to 4464b12 (June 8, 2026 19:01)

@roborivers roborivers left a comment


Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_resume_logicalsc_generated **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**

Time partitions spread data across multiple shard tables to implement
data retention. Until now, unique indexes were only enforced within the
shard being written, making it possible to insert the same key into
different shards without conflict.

A new 'partition_unique' tunable enables cross-shard unique enforcement
for TRUNCATE-rollout partitions. When on, any write that would violate a
unique constraint in any sibling shard is rejected with the same error
that a within-shard violation produces. The enforcement covers inserts
and updates that change a key column.

The feature carries a measurable write cost proportional to the number
of sibling shards and unique indexes, driven by the extra index lookups
per write and the larger BDB lock footprint per transaction.

A new 'partition_unique_debug' tunable traces the full lifecycle:
OSQL_PARTITION_SHARDS send on the replicant, receive/store and free on
the master, and each cross-shard index probe.

Known limitations:
- Enabling the tunable does not validate pre-existing data.
- UPSERT is not supported on time partitions when enabled (enforced at
  write time with a clear error).
- ON UPDATE CASCADE is handled master-side in constraints.c without
  going through the OSQL stream, so cross-shard enforcement does not
  apply to cascaded key changes until that path is updated.

Bug fixes applied during review:
- Endian-safe nshards in OSQL_PARTITION_SHARDS wire format (htonl/ntohl)
- Bounds-check nshards and validate shard name NUL terminators in receiver
- Fail writes on malloc failure in timepart_get_shard_names via errstat
- Remove redundant forward declaration of check_cross_shard_unique
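
The first two fixes above concern the receiver's handling of the shard list. The sketch below illustrates the same ideas in Python under an assumed layout: a 4-byte count in network byte order followed by NUL-terminated shard names. The real OSQL_PARTITION_SHARDS layout is not shown in this PR text, and `MAX_SHARDS`, `pack_shards` and `unpack_shards` are hypothetical.

```python
import struct

MAX_SHARDS = 1024  # hypothetical upper bound, not taken from comdb2


def pack_shards(names):
    """Encode a shard list: big-endian count, then NUL-terminated names."""
    out = struct.pack("!I", len(names))  # "!I" = network byte order, like htonl
    for name in names:
        raw = name.encode("ascii")
        if b"\0" in raw:
            raise ValueError("shard name contains an embedded NUL")
        out += raw + b"\0"
    return out


def unpack_shards(buf):
    """Decode and validate a shard list produced by pack_shards."""
    if len(buf) < 4:
        raise ValueError("truncated header")
    (nshards,) = struct.unpack_from("!I", buf, 0)  # like ntohl on the receiver
    if nshards == 0 or nshards > MAX_SHARDS:
        raise ValueError(f"nshards out of range: {nshards}")
    names, off = [], 4
    for _ in range(nshards):
        end = buf.find(b"\0", off)
        if end < 0:
            raise ValueError("shard name is not NUL-terminated")
        names.append(buf[off:end].decode("ascii"))
        off = end + 1
    return names
```

Packing the count with an explicit byte order keeps the message readable across hosts of different endianness, and the range and terminator checks keep a malformed message from driving the receiver past the end of the buffer.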

Refactoring:
- Consolidate 7 copy-pasted reqerrstr blocks into reqerrstr_dup_key /
  reqerrstr_uncommittable_dup helpers in indices.c

Tests: add partition_unique_check correctness test and
partition_unique_perf performance test.

Signed-off-by: Dorin Hogea <dhogea@bloomberg.net>

@roborivers roborivers left a comment


Cbuild submission: Error ⚠.
Regression testing: 0/0 tests failed ⚠.
