fix(db): avoid long flush stall on restart #211
EddieHouston wants to merge 2 commits into Blockstream:new-index
Conversation
EddieHouston force-pushed from d102699 to bef02e3
enable_auto_compaction() was lowering level0_stop_writes_trigger from the bulk-load value (512) to the RocksDB default (36) on every update() call. At DB open the bulk-load triggers are applied by DB::open, so on any restart L0 can legitimately hold more than 36 files. When the first post-restart update() called enable_auto_compaction(), the trigger tightening instantly put the DB into pre-flush stall territory, and the end-of-batch db.flush() that follows parked inside WaitUntilFlushWouldNotStallWrites waiting for background compaction to bring L0 below 36.

On production testnet this reliably cost 77 minutes of indexer freeze per restart (verified by 'Manual flush start' → 'Manual flush finished' in the RocksDB LOG). The actual memtable flush took 62 ms once unblocked; the rest was wait.

Split enable_auto_compaction() into the minimal flag-flip and a new apply_steady_state_triggers() that holds the L0 trigger / pending-bytes-limit reset. Invoke the latter exactly once per DB lifetime, inside the F-sentinel gate in start_auto_compactions(), immediately after full_compaction() has drained L0. On DBs where F is already set (steady-state restart), triggers stay at bulk-load values; the comment in DB::open already argues that configuration is fine for steady-state reads given the prefix bloom filters.
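A minimal sketch of that split, assuming rust-rocksdb's dynamic `DB::set_options`; the values are the RocksDB defaults cited above, and everything besides the two method names taken from this description is illustrative:

```rust
use rocksdb::DB;

/// Minimal flag flip: re-enable auto-compaction, never touch the triggers.
fn enable_auto_compaction(db: &DB) {
    db.set_options(&[("disable_auto_compactions", "false")])
        .expect("enable_auto_compaction failed");
}

/// One-time reset of the L0 triggers and pending-bytes limits to RocksDB
/// defaults. Meant to run once per DB lifetime, inside the F-sentinel gate,
/// right after full_compaction() has drained L0.
fn apply_steady_state_triggers(db: &DB) {
    db.set_options(&[
        ("level0_file_num_compaction_trigger", "4"),
        ("level0_slowdown_writes_trigger", "20"),
        ("level0_stop_writes_trigger", "36"),
        ("soft_pending_compaction_bytes_limit", "68719476736"),  // 64 GiB RocksDB default
        ("hard_pending_compaction_bytes_limit", "274877906944"), // 256 GiB RocksDB default
    ])
    .expect("apply_steady_state_triggers failed");
}
```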
Open the DB with RocksDB-default L0 triggers (4/20/36) and only widen to bulk-load values (64/256/512) when the full-compaction sentinel 'F' is absent. Previously every restart used bulk-load triggers permanently, leaving read amplification unnecessarily high after initial sync. Add info-level logging to DB::open and start_auto_compactions so operators can see which trigger profile is active and whether full compaction runs or is skipped.
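A hedged sketch of that open-time selection; electrs's real `DB::open` handles config and column families, so the structure and any names outside this description are invented for illustration:

```rust
use log::info;
use rocksdb::{Options, DB};

fn open(path: &str) -> DB {
    // Open with RocksDB-default L0 triggers (4/20/36) for low read amplification.
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, path).expect("failed to open DB");

    // Widen to bulk-load triggers only while initial sync is incomplete.
    if db.get(b"F").expect("sentinel lookup failed").is_none() {
        info!("sentinel 'F' absent: applying bulk-load triggers (64/256/512)");
        db.set_options(&[
            ("level0_file_num_compaction_trigger", "64"),
            ("level0_slowdown_writes_trigger", "256"),
            ("level0_stop_writes_trigger", "512"),
        ])
        .expect("failed to apply bulk-load triggers");
    } else {
        info!("sentinel 'F' present: keeping steady-state (default) triggers");
    }
    db
}
```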
EddieHouston force-pushed from 1ebdfd6 to 5e80660
```rust
fn start_auto_compactions(&self, db: &DB) {
    let key = b"F".to_vec();
    if db.get(&key).is_none() {
        info!("full-compaction sentinel 'F' not found — running one-time full compaction and tightening triggers");
```
split this log message across the two following statements, as they are separate... full compaction happens, then the triggers are tightened. there may be a significant time between them.
```rust
        db.apply_steady_state_triggers();
        db.put_sync(&key, b"");
        assert!(db.get(&key).is_some());
        info!("full-compaction sentinel 'F' set — future restarts will skip full compaction");
```
the log message should just say 'full compaction sentinel F set'. whether full compaction happens or not is outside the scope of this block
```rust
fn apply_bulk_load_triggers(&self) {
    let opts = [
        ("level0_file_num_compaction_trigger", "64"),
```
we've replaced the L0_BULK_TRIGGER with magic numbers, losing some context
the description is misleading because update() is called once on initial startup, not in the main loop. The steady-state compaction parameters are applied once regardless of the number of L0 files that happen to be on disk.

The PR code no longer applies the steady-state options that we had previously, which means we end up with defaults after a restart, not the steady state. They are written to an options file on disk, but it needs to be explicitly loaded and reapplied: https://github.com/facebook/rocksdb/wiki/RocksDB-Options-File

Are you able to repro L0 accumulation locally? I attempted by repeatedly killing electrs after startup, but L0 did not increase (it was loading a full bitcoin mainnet).
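One way to watch L0 while attempting that repro (a sketch assuming rust-rocksdb's `DB::property_int_value` and the standard RocksDB property name):

```rust
use rocksdb::DB;

/// Current number of L0 SST files, as reported by RocksDB itself.
fn l0_file_count(db: &DB) -> u64 {
    db.property_int_value("rocksdb.num-files-at-level0")
        .expect("property query failed")
        .unwrap_or(0)
}
```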
Summary
Fix freeze that can occur on restart of a mature electrs DB built from 91b883a or later, and default to steady-state compaction triggers so restarts after initial sync get low read amplification immediately.
`enable_auto_compaction()` was tightening `level0_stop_writes_trigger` from the bulk-load value (512) to the RocksDB default (36) on every `update()` call, not just once. `DB::open` applied bulk-load triggers at startup, so after any restart L0 can legitimately hold more than 36 files. The reset instantly put the DB into pre-flush stall territory, and the end-of-batch `db.flush()` parked inside `WaitUntilFlushWouldNotStallWrites` until background compaction brought L0 below 36.

In testnet this caused over 1 hour of indexer freeze on restart.
Fix
- `DB::open()` now defaults to steady-state triggers (RocksDB defaults: 4/20/36). After opening, it checks the `F` sentinel key and calls `apply_bulk_load_triggers()` only when `F` is absent (initial sync not yet complete). On normal restarts (`F` present), the DB stays on tight triggers from the start.
- `apply_bulk_load_triggers()`: new method that widens L0 triggers (64/256/512) and disables pending-compaction-bytes stalls for initial sync.
- `apply_steady_state_triggers()`: restores RocksDB defaults after `full_compaction()` drains L0. Called once per DB lifetime inside the `F` sentinel gate.
- `enable_auto_compaction()`: minimal flag set (`disable_auto_compactions=false`), safe to call on every `update()`.
- Operator logging: `DB::open` and `start_auto_compactions` now log at info level which trigger profile is active and whether full compaction runs or is skipped.
Test plan
- `cargo check` clean
- `cargo test --lib`: 8/8 pass, including `new_index::db::tests::*`
- `cargo test --test electrum`: 4/4 pass
- `cargo test --test rest`: 22/22 pass
- `Manual flush start` → `Manual flush finished` in RocksDB LOG are within ms