Fix #1711 - DDLWorker: don't recreate replica dirs on every main-loop tick #1712

Open
CarlosFelipeOR wants to merge 2 commits into antalya-25.8 from try-fix/antalya-25.8/1711-ddlworker-race

Conversation

@CarlosFelipeOR
Collaborator

Fix #1711

Test PR. The fix was authored with assistance from AI model Claude Opus 4.7.

After upstream PR ClickHouse#92339, DDLWorker::markReplicasActive() unconditionally re-creates /clickhouse/task_queue/replicas/<host_id> on every iteration of runMainThread. This races with external recursive cleanup of /clickhouse and deterministically breaks test_replication_without_zookeeper::test_startup_without_zookeeper on antalya-25.8 since 25.8.21 (PR #1600).

This change adds a should_create_dirs parameter to markReplicasActive and calls createReplicaDirs only on real events (initialization or reconnect, or when a new host ID has just been observed via host_ids_updated), not on every tick. This restores the pre-ClickHouse#92339 invariant ("create host dirs once at startup/reconnect") while preserving the interserver-IO fix from ClickHouse#92339.
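
The gating described above can be illustrated with a small standalone model. This is only a sketch, not the actual patch: FakeKeeper, ToyDDLWorker, tick, and the initialized_or_reconnected flag are hypothetical names invented for the example. Only markReplicasActive, createReplicaDirs, should_create_dirs, host_ids_updated, and the replicas path are taken from the description. The real DDLWorker talks to ZooKeeper and does much more per iteration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for ZooKeeper: a flat set of existing paths.
struct FakeKeeper
{
    std::set<std::string> paths;
    int create_calls = 0;

    void createIfNotExists(const std::string & path)
    {
        ++create_calls;
        paths.insert(path);
    }

    // Models an external recursive cleanup of everything under `prefix`.
    void removeRecursive(const std::string & prefix)
    {
        for (auto it = paths.begin(); it != paths.end();)
        {
            if (it->compare(0, prefix.size(), prefix) == 0)
                it = paths.erase(it);
            else
                ++it;
        }
    }
};

// Hypothetical, heavily simplified model of the worker (not the real class).
struct ToyDDLWorker
{
    FakeKeeper & keeper;
    std::string host_id;
    std::string replicas_dir = "/clickhouse/task_queue/replicas";

    ToyDDLWorker(FakeKeeper & keeper_, std::string host_id_)
        : keeper(keeper_), host_id(std::move(host_id_))
    {
    }

    void createReplicaDirs()
    {
        assert(!host_id.empty());
        keeper.createIfNotExists(replicas_dir + "/" + host_id);
    }

    // Dirs are (re)created only when the caller asks for it.
    void markReplicasActive(bool should_create_dirs)
    {
        if (should_create_dirs)
            createReplicaDirs();
        // Marking the replica as active is omitted in this sketch.
    }

    // One main-loop iteration: create dirs only on a real event.
    void tick(bool initialized_or_reconnected, bool host_ids_updated)
    {
        markReplicasActive(initialized_or_reconnected || host_ids_updated);
    }
};
```

In this model, a first tick with initialized_or_reconnected set creates the host directory once. If an external cleanup then removes /clickhouse recursively, subsequent idle ticks (both flags false) leave the path deleted instead of re-creating it, which is the behavior the test relies on. A later tick with host_ids_updated set creates the directory again.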

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix race in DDLWorker that broke recursive deletion of the DDL queue path in ZooKeeper.

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Made with Cursor

After upstream PR ClickHouse#92339,
DDLWorker::markReplicasActive() unconditionally re-creates
/clickhouse/task_queue/replicas/<host_id> on every iteration of
runMainThread, racing with external recursive cleanup of /clickhouse and
deterministically breaking
test_replication_without_zookeeper::test_startup_without_zookeeper on
antalya-25.8 since 25.8.21 (PR #1600).

Add a `should_create_dirs` parameter to markReplicasActive and only call
createReplicaDirs from real events (initialization/reconnect, or when a
new host ID was just observed via host_ids_updated), not on every tick.
This restores the pre-ClickHouse#92339 invariant ("create host dirs once at
startup/reconnect") while preserving the interserver-IO fix.

Generated with assistance from AI model Claude Opus 4.7.

See #1711.

Made-with: Cursor
@github-actions

github-actions Bot commented Apr 30, 2026

Workflow [PR], commit [98ddd66]

@svb-alt requested a review from DimensionWieldr April 30, 2026 15:20
@CarlosFelipeOR changed the title from "Don't Merge - Try to fix #1711 - DDLWorker: don't recreate replica dirs on every main-loop tick" to "Fix #1711 - DDLWorker: don't recreate replica dirs on every main-loop tick" Apr 30, 2026