feat(solana-indexer): PR 4 — Component structs (skeleton declarations)#4514
feat(solana-indexer): PR 4 — Component structs (skeleton declarations)#4514tilacog wants to merge 8 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the skeleton structure for the Solana settlement indexer components, including the Ingester, Decoder, PartialEventWatchdog, and FinalizationWorker, along with necessary dependency updates and shared type definitions. Feedback on the changes suggests replacing the global static LATEST_CHAIN_SLOT with a shared Arc passed via constructors to avoid process-wide shared mutable state and potential race conditions during parallel testing.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| /// The sole writer is the ingester, on every slot-filter message. Anchors the | ||
| /// partial-event watchdog and the finalization worker. Cold start is zero; the | ||
| /// watchdog skips its comparison on the first tick. | ||
| pub static LATEST_CHAIN_SLOT: AtomicU64 = AtomicU64::new(0); |
There was a problem hiding this comment.
Using a global static AtomicU64 for LATEST_CHAIN_SLOT introduces shared mutable state across the entire process. This causes race conditions and flakiness when running unit/integration tests in parallel (Cargo's default behavior). It also prevents running multiple indexer instances in the same process.
Actionable Suggestion:
Remove the global static and instead pass an Arc<AtomicU64> (or a shared state struct) to the constructors of Ingester, Decoder, PartialEventWatchdog, and FinalizationWorker.
References
- Focus exclusively on identifying missing edge cases, potential race conditions, or logic that deviates from the PR's stated goals. (link)
|
This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed. |
| /// Typical number of slots for a transaction to finalize (~12.8 s). The | ||
| /// promotion pass skips rows fresher than this. |
There was a problem hiding this comment.
What does typical mean here? In which circumstances are 32 slots NOT correct?
There was a problem hiding this comment.
Tried to clarify the doc.
|
|
||
| /// Typical number of slots for a transaction to finalize (~12.8 s). The | ||
| /// promotion pass skips rows fresher than this. | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
Instead of using #[allow(lint)] please use #[expect(lint)]. The reason is that clippy will automatically generate a warning when the lint is no longer violated which forces you to clean up stale lint exceptions.
|
|
||
| /// Transaction finalization worker. See the module docs for the two flows it | ||
| /// runs. | ||
| pub struct FinalizationWorker<St: Store, R: SolanaClient> { |
There was a problem hiding this comment.
IMO going down the route of generics instead of Arc<dyn Trait> is a slippery slope. Type parameters need to be propagated throughout the code base and the tiny overhead of chasing a pointer when using dyn trait hardly matters if the component issues network requests.
There was a problem hiding this comment.
Done. Switched Store and SolanaClient to #[async_trait] (in #4508), and all the components now hold Arc<dyn Store> / Arc<dyn SolanaClient> instead of generic params. Also dropped the #![allow(async_fn_in_trait)] workaround. Lines up with the rest of the repo.
| /// The sole writer is the ingester, on every slot-filter message. Anchors the | ||
| /// partial-event watchdog and the finalization worker. Cold start is zero; the | ||
| /// watchdog skips its comparison on the first tick. | ||
| pub static LATEST_CHAIN_SLOT: AtomicU64 = AtomicU64::new(0); |
There was a problem hiding this comment.
What's the reason that this value does not live inside a specific component that gets accessed via a getter?
Also the API is currently fragile. The type is pub and allows anyone to access it. A safer API would be a new type wrapping the counter with an update function that is only accessible in this module for example.
There was a problem hiding this comment.
Half of this lands in #4549, where the global static becomes an Arc<AtomicU64> owned by the Ingester, so it's no longer a free-floating global. The newtype-with-restricted-write part isn't done yet though, it's still a raw Arc<AtomicU64> any holder can write to. I'll wrap it in a small ChainTip type in #4549 then.
| /// Capacity of the channel from the ingester to the decoder. | ||
| pub const INGEST_TO_DECODER_CAPACITY: usize = 1024; |
There was a problem hiding this comment.
This and a few other consts seems like it should be configurable by a config file.
| /// Capacity of the channel from the ingester to the decoder. | ||
| pub const INGEST_TO_DECODER_CAPACITY: usize = 1024; |
There was a problem hiding this comment.
Can you explain why the ingester is necessary? Why is it better to have it forward the events to a different channel? If the other channel it pushed into does not get cleared fast enough the backpressure will end up here anyway.
There was a problem hiding this comment.
UUIC, the ingester exists to keep reading from the stream without pausing since yellowstone disconnects slow readers, and the slowest part of our work is writing to the DB. If one loop reads a message and then writes it to the DB, it stops reading while it writes, messages pile up, and the server drops us. A bigger buffer in the client shouldn't help, because a buffer still needs something to keep emptying it, and that same loop is busy writing.
You're right that a long overload still backs up to the ingester and drops us anyway. The buffer only smooths over short slow spikes, like a burst in one slot or one slow write. So the split isn't about speed or handling more load. It only keeps us from getting disconnected during short slow moments.
@tilacog please correct me if I am wrong.
| /// Sends `StreamUpdate` to the decoder. Should be bounded to | ||
| /// `RECONNECT_BACKOFF_CAP` entries. | ||
| pub tx: Sender<StreamUpdate>, |
There was a problem hiding this comment.
Comment says the size of the channel should be bounded to a unit of time which does not seem to make sense.
There was a problem hiding this comment.
Fixed, the doc now references INGEST_TO_DECODER_CAPACITY (the channel capacity), not the backoff duration.
| /// Shared in-memory map of partial events keyed by `PartialEventKey`. | ||
| /// | ||
| /// The decoder holds a clone of this `Arc` and both inserts and removes | ||
| /// halves as it processes them. | ||
| pub partials: Arc<DashMap<PartialEventKey, PartialEvent>>, |
There was a problem hiding this comment.
What was the reasoning for using channels for some of the communication but not for this partials map?
There was a problem hiding this comment.
Both the decoder and the watchdog need to look entries up and remove them by key, which a stream can't do. The decoder removes the matching half by (slot, signature) once the second one arrives, and the watchdog scans for the ones that waited too long and drops those. So I guess this map is reasonable here.
cc @tilacog
| /// The sole writer is the ingester, on every slot-filter message. Anchors the | ||
| /// partial-event watchdog and the finalization worker. Cold start is zero; the | ||
| /// watchdog skips its comparison on the first tick. | ||
| pub static LATEST_CHAIN_SLOT: AtomicU64 = AtomicU64::new(0); |
There was a problem hiding this comment.
It should be documented that this slot number is completely unrelated to the indexing progress of the system. Obviously depends on the rest of the indexer logic but it sounds like a counter that tracks fully processed slots might be more useful.
There was a problem hiding this comment.
Documented it. The doc now says LATEST_CHAIN_SLOT is the chain tip, not indexing progress. The processed-slots counter you're describing already exists separately, it's the watermark in solana.indexer_state written by the decoder.
…tstrap Resolve conflicts from the updated PR3 base: - keep dashmap (PR4) plus bytes/derive_more (PR3), drop observe/prometheus (metrics.rs gone) - shared.rs: keep PR4's PartialEventKey/PartialEvent rework, adopt PR3's Slot and pub(crate) - types/mod.rs: keep both the shared (PR4) and slot (PR3) modules - lower the four indexer structs to pub(crate) to match PR3's pub(crate) types, and add expect(dead_code)/expect(unused_imports) for the not-yet-wired scaffolding
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
|
Warning Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
|
- PartialEventKey uses Slot instead of u64, matching StreamUpdate - finalization: clarify FINALIZATION_WINDOW_SLOTS doc, drop the per-const #[allow(dead_code)] in favor of the module-level #![expect(dead_code)]
…to solana-indexer/PR4-bootstrap
Replace the generic type params on the four components with Arc<dyn Store> / Arc<dyn SolanaClient> fields, per review feedback. Avoids propagating type params through the codebase, and dyn dispatch is negligible for these I/O-bound components. Ingester keeps its GrpcConnector generic for stream mocking.
- note LATEST_CHAIN_SLOT is the chain tip, not indexing progress (the watermark in solana.indexer_state is the separate processed-slot counter) - fix the decoder channel doc to reference INGEST_TO_DECODER_CAPACITY, not the unrelated RECONNECT_BACKOFF_CAP duration
Description
Adds the
indexer/module with skeleton declarations for the four components that will do the actual work of the indexer. Each struct declares its fields, a constructor, and doc comments describing what it will do; therunbodies areunimplemented!and the behavior lands in later PRs.The four components and their roles:
confirmedcommitment level. This worker re-checks them against the chain and promotes them tofinalized, or marks them rolled back if the transaction disappeared. It uses a cheap batched RPC call for recent rows and falls back to one-call-per-row lookups for rows old enough that the batched method no longer reports them.The field declarations also pin down how the components talk to each other: the ingester feeds the decoder over a bounded channel, while the decoder and the watchdog share a concurrent map with no message passing between them. That second decision reshaped the partial-event types, so
types/channel.rsbecametypes/shared.rs.Changes
indexer/ingester.rs: theIngesterstruct. Generic over the connection so unit tests can drive it with a mock (the trait-bound approach chosen in PR 3 instead of a third in-crate trait). Also declares the shared latest-chain-slot counter and the constants for reconnect backoff and channel capacity.indexer/decoder.rs: theDecoderstruct, holding the receiving end of the channel, the shared partial-event map, and the two program ids it filters for.indexer/watchdog.rs: thePartialEventWatchdogstruct, holding the store and its view of the shared partial-event map.indexer/finalization.rs: theFinalizationWorkerstruct, with module docs explaining the two promotion flows and the constants that bound them (finalization window, batch size, retention horizon).types/channel.rstotypes/shared.rsand reworked the partial-event types: instead of payloads sent over a channel, a partial event is now a key (slot + signature) and a value (whichever half arrived first) in the map shared by the decoder and the watchdog.yellowstone-grpc-clientto the workspace (source of the connection trait the ingester is generic over), plusdashmapand a few tokio features to the crate.How to test
cargo check -p solana-indexercargo clippy --locked -p solana-indexer --all-features --all-targets -- -D warningsNo unit tests are included: the structs declare shape only, and every
runbody isunimplemented!until the behavior PRs.This is a follow-up PR to #4508