fix(db): avoid long flush stall on restart #211
EddieHouston wants to merge 2 commits into Blockstream:new-index
Conversation
EddieHouston force-pushed from d102699 to bef02e3
enable_auto_compaction() was lowering level0_stop_writes_trigger from the bulk-load value (512) to the RocksDB default (36) on every update() call. At DB open the bulk-load triggers are applied by DB::open, so on any restart L0 can legitimately hold more than 36 files. When the first post-restart update() called enable_auto_compaction(), the trigger tightening instantly put the DB into pre-flush stall territory, and the end-of-batch db.flush() that follows parked inside WaitUntilFlushWouldNotStallWrites waiting for background compaction to bring L0 below 36.

On production testnet this reliably cost 77 minutes of indexer freeze per restart (verified by 'Manual flush start' → 'Manual flush finished' in the RocksDB LOG). The actual memtable flush took 62 ms once unblocked; the rest was wait.

Split enable_auto_compaction() into the minimal flag-flip and a new apply_steady_state_triggers() that holds the L0 trigger / pending-bytes-limit reset. Invoke the latter exactly once per DB lifetime, inside the F-sentinel gate in start_auto_compactions(), immediately after full_compaction() has drained L0. On DBs where F is already set (steady-state restart), triggers stay at bulk-load values; the comment in DB::open already argues that configuration is fine for steady-state reads given the prefix bloom filters.
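A minimal sketch of that split, assuming rust-rocksdb's dynamic `DB::set_options`; the values are the RocksDB defaults cited above, and everything besides the two method names taken from this description is illustrative:

```rust
use rocksdb::DB;

/// Minimal flag flip: re-enable auto-compaction, never touch the triggers.
fn enable_auto_compaction(db: &DB) {
    db.set_options(&[("disable_auto_compactions", "false")])
        .expect("enable_auto_compaction failed");
}

/// One-time reset of the L0 triggers and pending-bytes limits to RocksDB
/// defaults. Meant to run once per DB lifetime, inside the F-sentinel gate,
/// right after full_compaction() has drained L0.
fn apply_steady_state_triggers(db: &DB) {
    db.set_options(&[
        ("level0_file_num_compaction_trigger", "4"),
        ("level0_slowdown_writes_trigger", "20"),
        ("level0_stop_writes_trigger", "36"),
        ("soft_pending_compaction_bytes_limit", "68719476736"),  // 64 GiB RocksDB default
        ("hard_pending_compaction_bytes_limit", "274877906944"), // 256 GiB RocksDB default
    ])
    .expect("apply_steady_state_triggers failed");
}
```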
Open the DB with RocksDB-default L0 triggers (4/20/36) and only widen to bulk-load values (64/256/512) when the full-compaction sentinel 'F' is absent. Previously every restart used bulk-load triggers permanently, leaving read amplification unnecessarily high after initial sync. Add info-level logging to DB::open and start_auto_compactions so operators can see which trigger profile is active and whether full compaction runs or is skipped.
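A hedged sketch of that open-time selection; electrs's real `DB::open` handles config and column families, so the structure and any names outside this description are invented for illustration:

```rust
use log::info;
use rocksdb::{Options, DB};

fn open(path: &str) -> DB {
    // Open with RocksDB-default L0 triggers (4/20/36) for low read amplification.
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, path).expect("failed to open DB");

    // Widen to bulk-load triggers only while initial sync is incomplete.
    if db.get(b"F").expect("sentinel lookup failed").is_none() {
        info!("sentinel 'F' absent: applying bulk-load triggers (64/256/512)");
        db.set_options(&[
            ("level0_file_num_compaction_trigger", "64"),
            ("level0_slowdown_writes_trigger", "256"),
            ("level0_stop_writes_trigger", "512"),
        ])
        .expect("failed to apply bulk-load triggers");
    } else {
        info!("sentinel 'F' present: keeping steady-state (default) triggers");
    }
    db
}
```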
EddieHouston force-pushed from 1ebdfd6 to 5e80660
```rust
fn start_auto_compactions(&self, db: &DB) {
    let key = b"F".to_vec();
    if db.get(&key).is_none() {
        info!("full-compaction sentinel 'F' not found — running one-time full compaction and tightening triggers");
```
split this log message across the two following statements, as they are separate... full compaction happens, then the triggers are tightened. there may be a significant time between them.
```rust
        db.apply_steady_state_triggers();
        db.put_sync(&key, b"");
        assert!(db.get(&key).is_some());
        info!("full-compaction sentinel 'F' set — future restarts will skip full compaction");
```
the log message should just say 'full compaction sentinel F set'. whether full compaction happens or not is outside the scope of this block
```rust
fn apply_bulk_load_triggers(&self) {
    let opts = [
        ("level0_file_num_compaction_trigger", "64"),
```
we've replaced the L0_BULK_TRIGGER with magic numbers, losing some context
the description is misleading because update() is called once on initial startup, not in the main loop. The steady-state compaction parameters are applied once regardless of the number of L0 files that happen to be on disk.

The PR code no longer applies the steady-state options that we had previously, which means we end up with defaults after a restart, not the steady state. They are written to an options file on disk, but it needs to be explicitly loaded and reapplied: https://github.com/facebook/rocksdb/wiki/RocksDB-Options-File

Are you able to repro L0 accumulation locally? I attempted by repeatedly killing electrs after startup, but L0 did not increase (it was loading a full bitcoin mainnet).
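One way to watch L0 while attempting that repro (a sketch assuming rust-rocksdb's `DB::property_int_value` and the standard RocksDB property name):

```rust
use rocksdb::DB;

/// Current number of L0 SST files, as reported by RocksDB itself.
fn l0_file_count(db: &DB) -> u64 {
    db.property_int_value("rocksdb.num-files-at-level0")
        .expect("property query failed")
        .unwrap_or(0)
}
```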
Summary
Fix freeze that can occur on restart of a mature electrs DB built from 91b883a or later, and default to steady-state compaction triggers so restarts after initial sync get low read amplification immediately.
`enable_auto_compaction()` was tightening `level0_stop_writes_trigger` from the bulk-load value (512) to the RocksDB default (36) on every `update()` call, not just once. `DB::open` applied bulk-load triggers at startup, so after any restart L0 can legitimately hold more than 36 files. The reset instantly put the DB into pre-flush stall territory, and the end-of-batch `db.flush()` parked inside `WaitUntilFlushWouldNotStallWrites` until background compaction brought L0 below 36.

In testnet this caused over 1 hour of indexer freeze on restart.
Fix
- `DB::open()` now defaults to steady-state triggers (RocksDB defaults: 4/20/36). After opening, it checks the `F` sentinel key and calls `apply_bulk_load_triggers()` only when `F` is absent (initial sync not yet complete). On normal restarts (`F` present), the DB stays on tight triggers from the start.
- `apply_bulk_load_triggers()`: new method that widens L0 triggers (64/256/512) and disables pending-compaction-bytes stalls for initial sync.
- `apply_steady_state_triggers()`: restores RocksDB defaults after `full_compaction()` drains L0. Called once per DB lifetime inside the `F` sentinel gate.
- `enable_auto_compaction()`: minimal flag set (`disable_auto_compactions=false`), safe to call on every `update()`.
- Operator logging: `DB::open` and `start_auto_compactions` now log at info level which trigger profile is active and whether full compaction runs or is skipped.
Test plan
- `cargo check` clean
- `cargo test --lib`: 8/8 pass, including `new_index::db::tests::*`
- `cargo test --test electrum`: 4/4 pass
- `cargo test --test rest`: 22/22 pass
- `Manual flush start` → `Manual flush finished` in RocksDB LOG are within ms