Draft
92 commits
6bb5ab4
feat: add mix tasks for block processing benchmarks
MegaRedHand Mar 6, 2026
5b0f1a7
docs: add benchmarking doc
MegaRedHand Mar 6, 2026
6990ac3
perf: incremental merkleization with per-field caching
MegaRedHand Mar 10, 2026
2af74ea
perf: defer SSZ encoding to async DB persistence
MegaRedHand Mar 10, 2026
f83bdda
perf: skip redundant BLS verification for block attestations
MegaRedHand Mar 10, 2026
12d1834
perf: eliminate filter step in justification balance computation
MegaRedHand Mar 10, 2026
d755296
fix: skip validator hash cache when block modifies validators
MegaRedHand Mar 10, 2026
36502e1
perf: expand merkle field caching for non-epoch blocks
MegaRedHand Mar 10, 2026
e318f4d
fix: resolve credo strict violations introduced by perf commits
MegaRedHand Mar 10, 2026
696ced0
fix: update function specs and delete unused function
MegaRedHand Mar 10, 2026
653f250
perf: cache sync committee indices to avoid 2.2M validator scan per b…
MegaRedHand Mar 10, 2026
f9c5e48
perf: single-scan pubkey resolution for pending deposits
MegaRedHand Mar 10, 2026
5ea7fc3
fix: resolve credo strict violations introduced by perf commits
MegaRedHand Mar 10, 2026
1ad3216
fix: handle missing start key in fold_keys to fix pruning failures
MegaRedHand Mar 10, 2026
25a2e28
fix: treat data-not-available as transient error instead of permanent…
MegaRedHand Mar 10, 2026
2073aee
fix: recover invalid blocks on startup and treat data-not-available a…
MegaRedHand Mar 10, 2026
94a0f22
fix: schedule column retry after recovering invalid blocks on startup
MegaRedHand Mar 10, 2026
1c30955
fix: move blocks with all columns present to pending during retry
MegaRedHand Mar 10, 2026
46a2d21
fix: batch retry_download_columns to prevent OOM during catch-up sync
MegaRedHand Mar 10, 2026
c4e97f2
perf: pass in-memory store to status request handlers
MegaRedHand Mar 10, 2026
0936581
fix: handle missing finalized root in update_tree gracefully
MegaRedHand Mar 10, 2026
80841e2
fix: handle missing unrealized_justifications in head selection after…
MegaRedHand Mar 11, 2026
886d428
fix: purge corrupted data columns instead of cascade-invalidating blocks
MegaRedHand Mar 13, 2026
cee6844
fix: return accumulator in StateDb pruning error branch
MegaRedHand Mar 13, 2026
8e7d839
fix: repair tree cache when parent chain is missing after finalization
MegaRedHand Mar 13, 2026
d670687
fix: reduce BlockStates LRU cache from 128 to 16 entries to prevent OOM
MegaRedHand Mar 13, 2026
a4f9ae8
fix: handle pruned blocks in get_ancestor to prevent GenServer crash
MegaRedHand Mar 13, 2026
b2c0cbc
fix: treat "block is from the future" as transient error
MegaRedHand Mar 13, 2026
629b9f4
fix: add load shedding to Libp2pPort to prevent message queue OOM
MegaRedHand Mar 13, 2026
77c809f
fix: exempt new_peer messages from load shedding for PeerDAS routing
MegaRedHand Mar 13, 2026
47d23b5
fix: batch process_blocks to prevent message queue buildup during cat…
MegaRedHand Mar 13, 2026
4323ca3
perf: eliminate synchronous LevelDB stalls blocking Libp2pPort
MegaRedHand Mar 13, 2026
b691482
perf: skip LMD-GHOST during catch-up and use non-blocking ETS cache i…
MegaRedHand Mar 13, 2026
b490c69
fix: auto-resync when still behind after sync batch completes
MegaRedHand Mar 13, 2026
abd653b
fix: skip processing blocks behind head to prevent 12-minute stalls
MegaRedHand Mar 13, 2026
f45385c
perf: skip prefetch_states and attestations during catch-up sync
MegaRedHand Mar 13, 2026
9aafe37
perf: tighten catch-up threshold from 1 epoch to 4 slots
MegaRedHand Mar 13, 2026
7bdd37d
fix: propagate store updates from data column/blob response handlers
MegaRedHand Mar 13, 2026
a1c1fca
fix: repair broken process_registry_updates and remove duplicate helpers
MegaRedHand Mar 13, 2026
707a2ca
refactor: fix credo errors
MegaRedHand Mar 13, 2026
66b2c5f
perf: optimize epoch rewards/penalties with single-pass index sets
MegaRedHand Mar 15, 2026
0c1dd37
perf: prefetch beacon committees before block operations
MegaRedHand Mar 15, 2026
83fdc35
perf: direct indexed withdrawal sweep replacing Stream.cycle/drop/take
MegaRedHand Mar 15, 2026
cfbf1a1
perf: inline participation check in inactivity score updates
MegaRedHand Mar 15, 2026
4663760
perf: fuse rewards/penalties into 2-pass computation (was ~9 passes)
MegaRedHand Mar 15, 2026
45ebae5
perf: use :atomics for O(1) shuffle swaps instead of Aja.Vector O(log N)
MegaRedHand Mar 15, 2026
e991a0e
perf: short-circuit process_slashings + add timing instrumentation
MegaRedHand Mar 15, 2026
7b98350
perf: skip compute_pulled_up_tip during catch-up sync
MegaRedHand Mar 15, 2026
7542a9b
perf: skip ETS state insert and LevelDB write during catch-up sync
MegaRedHand Mar 15, 2026
f36dffd
perf: incremental merkle cache for balances, participation, and randao
MegaRedHand Mar 15, 2026
b36f209
perf: list-based withdrawal sweep replacing per-index Aja.Vector.at!
MegaRedHand Mar 15, 2026
2b265e6
perf: pre-warm committee cache in benchmark for steady-state simulation
MegaRedHand Mar 15, 2026
7c3f081
perf: move committee shuffle to Rust NIF
MegaRedHand Mar 15, 2026
2c73288
perf: deduplicate get_attesting_indices in attestation processing
MegaRedHand Mar 15, 2026
a1f3a06
perf: move proposer index computation to Rust NIF
MegaRedHand Mar 15, 2026
34a8d91
perf: force GC before epoch processing to eliminate GC variance
MegaRedHand Mar 15, 2026
5407b4b
fix: add fallback for incremental cache mismatches in state root veri…
MegaRedHand Mar 16, 2026
2c98e10
fix: treat parent-state-not-found as transient error to prevent casca…
MegaRedHand Mar 20, 2026
8c9f7f2
fix: add retry limit for parent-state-not-found to prevent spin loop
MegaRedHand Mar 20, 2026
0d6163a
fix: always persist states to ETS during catch-up to prevent state loss
MegaRedHand Mar 20, 2026
8709a75
fix: prevent parent state eviction and cross-check proposer NIF
MegaRedHand Mar 20, 2026
0fa1b33
fix: skip incremental balance cache for withdrawal/consolidation requ…
MegaRedHand Mar 20, 2026
a1aeaa7
fix: use next_epoch (current+1) in pending deposit withdrawn check
MegaRedHand Mar 20, 2026
2d44fe2
feat: add utility functions to track memory usage
MegaRedHand Mar 22, 2026
1deb236
fix: always write states to LevelDB to survive ETS cache eviction
MegaRedHand Mar 26, 2026
901434c
fix: correct ETS cache cleanup match spec and add eviction to Cache.set
MegaRedHand Mar 26, 2026
5d11893
fix: make StoreDb.persist_store async and skip during catch-up sync
MegaRedHand Mar 30, 2026
66866c1
fix: make StoreDb.persist_store async without deep-copying Store struct
MegaRedHand Mar 30, 2026
6a7744b
fix: improve PeerDAS column download to prevent stalling
MegaRedHand Mar 30, 2026
0dedd97
fix: add hard peer cap and aggressive pruning to prevent Libp2pPort o…
MegaRedHand Mar 30, 2026
0eefa0a
feat: expose peerbook peer count via metric
MegaRedHand Mar 30, 2026
c8e3fc6
fix: add Go libp2p ConnectionManager to bound peer connections
MegaRedHand Apr 15, 2026
5da2181
fix: reduce state cache max_entries from 16 to 10 for mainnet
MegaRedHand Apr 15, 2026
1792317
fix: prevent prefetch_states from blocking ForkChoice with LevelDB reads
MegaRedHand Apr 15, 2026
402208a
fix: always touch parent state in ETS to prevent LevelDB fallback stalls
MegaRedHand Apr 15, 2026
d2a842c
fix: reduce LevelDB write pressure by persisting every 4th block state
MegaRedHand Apr 15, 2026
eb822ca
fix: minimize LevelDB state persistence to epoch boundaries only
MegaRedHand Apr 15, 2026
4446558
fix: use cache-only parent state lookup in on_block to prevent LevelD…
MegaRedHand Apr 15, 2026
e1a59ab
fix: use cache-only state lookups in head computation to prevent Leve…
MegaRedHand Apr 15, 2026
73e3701
fix: use cache-only state lookups in on_attestation and on_attester_s…
MegaRedHand Apr 15, 2026
60aa569
fix: cache-only block lookup in attestation validation
MegaRedHand Apr 15, 2026
13c380d
fix: systematic cache-only block lookups across fork choice hot path
MegaRedHand Apr 15, 2026
f7d9ded
fix: complete cache-only conversion for all remaining LevelDB hot paths
MegaRedHand Apr 15, 2026
ad37cf0
fix: offload BlocksByRange LevelDB reads and cache-only BlocksByRoot
MegaRedHand Apr 15, 2026
8404b0d
fix: cache-only block lookups in PendingBlocks and IncomingRequestsHa…
MegaRedHand Apr 15, 2026
6577384
fix: treat node as catching_up when store.head_slot is far behind
MegaRedHand Apr 20, 2026
3abb428
fix: drop new_peer events under load to prevent Libp2pPort stalls
MegaRedHand Apr 20, 2026
ba6f1f4
fix: send :ignore validation for dropped gossip to prevent goroutine …
MegaRedHand Apr 20, 2026
e71cfca
fix(metrics): raise prometheus scrape_interval to 15s on mainnet
MegaRedHand Apr 20, 2026
e44d40a
chore: fmt
MegaRedHand Apr 20, 2026
9f942f0
fix: stop Libp2pPort crash-loop on empty Peerbook
MegaRedHand Apr 21, 2026
5113f9a
revert: remove Libp2pPort load-shedding mechanism
MegaRedHand Apr 21, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -66,3 +66,6 @@ callgrind.out.*
# beacon node oapi json file
beacon-node-oapi.json
flamegraphs/

# benchmark data
/bench/data/
12 changes: 12 additions & 0 deletions .iex.exs
@@ -19,3 +19,15 @@ block_info = fn "0x"<>root -> root |> Base.decode16(case: :lower) |> elem(1) |>

blocks_by_status = fn status -> Blocks.get_blocks_with_status(status) |> elem(1) end
blocks_by_status_count = fn status -> blocks_by_status.(status) |> Enum.count() end

# Memory introspection (see lib/utils/mem.ex)
alias LambdaEthereumConsensus.Mem
# Quick access:
# Mem.report() — full memory report
# Mem.ets_tables() — all ETS tables ranked by memory
# Mem.top_processes(10) — top 10 processes by heap
# Mem.state_cache_detail() — per-entry BlockStates breakdown
# Mem.checkpoint_detail() — checkpoint states table
# Mem.binary_stats() — binary/refc binary pressure
# Mem.cache_tables() — StateTransition cache sizes
# snap = Mem.snapshot(); ...; Mem.diff_snapshot(snap) — delta tracking
127 changes: 127 additions & 0 deletions docs/perf/benchmarking.md
@@ -0,0 +1,127 @@
# Block Processing Benchmarks

Reproducible benchmarks for measuring block processing performance on real mainnet/testnet data. The workflow has two steps: download data from a Beacon API node, then replay blocks through the `ForkChoice.process_block` pipeline offline.

## Quick Start

```bash
# 1. Download 2 epochs (64 slots) from a Fulu-compatible node
mix bench.download \
--url http://localhost:5052 \
--start-slot 9649056 \
--count 64

# 2. Run the benchmark
mix bench.blocks --data-dir bench/data/slot_9649056_64
```

## Step 1: Download Data

`mix bench.download` fetches state, blocks, and blob sidecars from a Beacon API, converts blobs to data columns (via KZG cell computation), and saves everything to disk.

### Options

| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `--url` | yes | | Beacon API base URL (e.g. `http://localhost:5052`) |
| `--start-slot` | yes | | Slot to anchor from (should be an epoch boundary) |
| `--count` | yes | | Number of slots after start to fetch |
| `--data-dir` | no | `bench/data` | Base directory for output |
| `--network` | no | `mainnet` | Network config (mainnet, sepolia, holesky, etc.) |

### Choosing a Start Slot

Pick a slot that is an **epoch boundary** (divisible by 32). This ensures the anchor state is at the start of an epoch, which is the natural checkpoint alignment for the forkchoice store. The task warns if the slot is not aligned.

To find a recent finalized epoch boundary:

```bash
# Query finalized slot from your beacon node
curl -s http://localhost:5052/eth/v1/beacon/headers/finalized | jq '.data.header.message.slot'
# Round down to epoch boundary: slot - (slot % 32)
```
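
The rounding described in the last comment can be wrapped in a small shell helper. This is a minimal sketch: `epoch_floor` is a hypothetical name, not something the repo provides, and it assumes 32 slots per epoch as stated above.

```bash
# Hypothetical helper (not part of the repo): round a slot down to the
# start of its epoch, assuming 32 slots per epoch.
epoch_floor() {
  local slot="$1"
  echo $(( slot - slot % 32 ))
}

epoch_floor 9649070   # prints 9649056
```

To feed it the finalized slot directly, pass `-r` to `jq` so the slot comes back without surrounding quotes (the Beacon API encodes slots as JSON strings), for example `epoch_floor "$(curl -s http://localhost:5052/eth/v1/beacon/headers/finalized | jq -r '.data.header.message.slot')"`.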

### Output Structure

```
bench/data/slot_<start>_<count>/
metadata.json # Download parameters + timestamp + network
state.ssz_snappy # Anchor state (BeaconState) at start-slot
block_<slot>.ssz_snappy # Anchor block + all non-empty blocks in range
columns_<slot>/ # Data columns per block (Fulu, only if block has blobs)
column_<index>.ssz_snappy
```

Missing block files mean the slot was empty (no block proposed). This is normal; mainnet typically has ~1-3% empty slots.
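
To see which slots in a dataset were empty, one option is to look for the missing block files. The sketch below is illustrative only: `missing_slots` is a hypothetical helper, and it assumes the `block_<slot>.ssz_snappy` naming shown above and that the range covers the `<count>` slots after `<start>`.

```bash
# Hypothetical helper (not part of the repo): print every slot in the
# range (start, start + count] that has no block file in the dataset.
# Assumes files are named block_<slot>.ssz_snappy directly under <dir>.
missing_slots() {
  local dir="$1" start="$2" count="$3"
  local slot=$(( start + 1 ))
  local last=$(( start + count ))
  while [ "$slot" -le "$last" ]; do
    [ -e "$dir/block_${slot}.ssz_snappy" ] || echo "$slot"
    slot=$(( slot + 1 ))
  done
}
```

For the Quick Start dataset this would be called as `missing_slots bench/data/slot_9649056_64 9649056 64`; piping the result through `wc -l` gives the number of empty slots.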

### Requirements

The Beacon API node must:
- Serve the `/eth/v2/debug/beacon/states/{slot}` endpoint (SSZ)
- Serve the `/eth/v2/beacon/blocks/{slot}` endpoint (SSZ)
- Serve the `/eth/v1/beacon/blob_sidecars/{slot}` endpoint (JSON)
- Have state and blocks available for the requested slot range (not pruned)
- Be on the same fork as the compiled `.fork_version` (currently Fulu)
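
Before starting a long download, it can help to probe the three endpoints listed above. This is a rough sketch rather than part of the tooling: `bench_endpoints` and `check_endpoints` are hypothetical names, and the probe only reports HTTP status codes, so it does not verify fork compatibility.

```bash
# Hypothetical preflight helpers (not part of the repo).

# Print the three endpoint URLs listed above for a base URL and slot.
bench_endpoints() {
  local base="${1%/}" slot="$2"   # drop one trailing slash from the base URL
  echo "$base/eth/v2/debug/beacon/states/$slot"
  echo "$base/eth/v2/beacon/blocks/$slot"
  echo "$base/eth/v1/beacon/blob_sidecars/$slot"
}

# Request each endpoint and print its HTTP status code next to the URL.
# Response bodies are discarded but still downloaded, so the state
# request can take a while.
check_endpoints() {
  bench_endpoints "$1" "$2" | while read -r url; do
    printf '%s %s\n' "$(curl -s -o /dev/null -w '%{http_code}' "$url")" "$url"
  done
}
```

Running `check_endpoints http://localhost:5052 9649056` should print one line per endpoint; anything other than `200` suggests the node has pruned that slot or does not expose the endpoint.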

## Step 2: Process Blocks

`mix bench.blocks` loads cached data from disk, boots the necessary infrastructure (LevelDB, ETS caches, mocked execution engine), and replays blocks through the full `ForkChoice.process_block` pipeline.

### Options

| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `--data-dir` | yes | | Path to a downloaded dataset directory |
| `--log-level` | no | `info` | Logger level (`debug`, `info`, `warning`, `error`) |

### What Gets Booted

The task starts a minimal subset of the supervision tree, matching the `db` operation mode:

- LevelDB (temporary directory, discarded after run)
- ETS caches (Blocks, BlockStates, CheckpointStates)
- StateTransition cache
- Task supervisors (for async state storage and pruning)
- Mocked Engine API (always returns `VALID` for execution payloads)

No networking, no Beacon API, no validator logic.

### Example Output

```
=== Block Processing Benchmark ===
Slots: 9649056 -> 9649120
Blocks: 61 / 64 (3 empty slots)
Epochs: 2 boundaries crossed

Total time: 18.7s
Avg per block: 306ms
Epoch blocks: [slot 9649088: 8.2s]
Non-epoch avg: 14ms
```

At `info` log level, each block also emits per-step timings from the state transition:

```
[on_block] slot=9649088 root=A1B2C3D4 epoch=true epoch.justification_and_finalization=1200ms epoch.rewards_and_penalties=3400ms ...
```

Use `--log-level warning` to suppress per-block logs and see only the summary.
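
The per-step timings are easier to compare when listed one per line and sorted. The following is a minimal sketch, assuming the space-separated `key=<n>ms` format of the sample `[on_block]` line above; `epoch_steps` is a hypothetical helper, not a feature of the task.

```bash
# Hypothetical helper (not part of the repo): read log lines on stdin,
# keep the [on_block] ones, and list their epoch.* timings, slowest first.
epoch_steps() {
  grep -F '[on_block]' | tr ' ' '\n' | grep '^epoch\.' | sort -t= -k2 -nr
}

echo '[on_block] slot=9649088 root=A1B2C3D4 epoch=true epoch.justification_and_finalization=1200ms epoch.rewards_and_penalties=3400ms' | epoch_steps
```

The `epoch=true` field is skipped because the filter only matches keys starting with `epoch.`, and `sort -n` reads the leading digits of each value, so the `ms` suffix does not need to be stripped.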

## Typical Ranges for Benchmarking

| Goal | Suggested `--count` | Notes |
|------|-------------------|-------|
| Quick sanity check | 32 (1 epoch) | Fast, but only 1 epoch boundary |
| Standard benchmark | 64-128 (2-4 epochs) | Good balance of data and runtime |
| Full performance profile | 200+ (6+ epochs) | Multiple epoch boundaries, better averages |
| Epoch-only analysis | 32 | Start one slot before an epoch boundary (slot N-1) to isolate epoch cost |

## Reusing Downloaded Data

Downloaded datasets are self-contained (state + blocks + columns + metadata) and can be:
- Shared between team members (copy the directory)
- Rerun after code changes to compare before/after
- Stored long-term as regression baselines

The `bench/data/` directory is gitignored.
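
For a before/after comparison, one approach is to save the summary of each run to a file and compare the averages. The sketch below assumes the summary format shown under "Example Output" and that the summary is written to standard output; `avg_ms` and `compare_runs` are hypothetical helpers.

```bash
# Hypothetical helpers (not part of the repo).

# Extract the "Avg per block" value in milliseconds from a saved summary.
avg_ms() {
  sed -n 's/^ *Avg per block: *\([0-9][0-9]*\)ms.*$/\1/p' "$1"
}

# Print the change in average block time between two saved summaries.
compare_runs() {
  local before after
  before=$(avg_ms "$1")
  after=$(avg_ms "$2")
  echo "avg per block: ${before}ms -> ${after}ms ($(( after - before ))ms)"
}
```

A typical flow would be to run `mix bench.blocks --data-dir bench/data/slot_9649056_64 --log-level warning > before.txt`, apply the code change, repeat the run into `after.txt`, and then call `compare_runs before.txt after.txt`.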