Skip to content

fix: sqlite-es snapshot sequence#18

Closed
0xgleb wants to merge 1 commit into
masterfrom
fix/sqlite-es-snapshot-sequence
Closed

fix: sqlite-es snapshot sequence#18
0xgleb wants to merge 1 commit into
masterfrom
fix/sqlite-es-snapshot-sequence

Conversation

@0xgleb

@0xgleb 0xgleb commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Motivation

sqlite-es's snapshot persistence conflates two different counters: persist
writes the third element of cqrs-es's snapshot tuple into the last_sequence
column, but that element is the snapshot version counter (next_snapshot
from PersistedEventStore::commit), not an event sequence. Reads compound the
bug: row_to_serialized_snapshot hardcodes current_snapshot: 0, so the
version counter also never advances past 1. Any consumer using
PersistedEventStore::new_snapshot_store with sqlite-es reloads a snapshot
whose state is at sequence N but whose current_sequence claims 1, then
re-applies events 2..N that are already folded into the snapshot — silent
double-application that permanently fails Lifecycle-wrapped aggregates on
the first post-snapshot load. The crate's own snapshot tests passed because
they wrote and read through the same misinterpretation.

Solution

  • persist derives last_sequence from events.last().sequence (mirroring
    event-sorcery's internal repository) and treats the tuple's third element as
    the snapshot version; an empty-events snapshot update is rejected with the
    new EmptySnapshotUpdate error.
  • update_snapshot/select_snapshot now write and read the
    snapshot_version column (already in the canonical migration — its
    DEFAULT 0 is what masked the omission), and
    row_to_serialized_snapshot populates SerializedSnapshot::current_snapshot
    from it.
  • Adds snapshot_round_trip_applies_events_exactly_once: a full
    CqrsFramework::execute round trip through a snapshot-backed store that
    rehydrates from a fresh store and asserts each event applied exactly once
    (it reproduced [a, b, b, c, b, c] before the fix).
  • Stacked on chore: bump up cqrs-es (0.5) #10. A sibling fix for the analogous prefix-snapshot hazard in
    event-sorcery's internal repository follows separately.

Closes RAI-921.

Summary by CodeRabbit

  • Bug Fixes

    • Snapshot updates now require accompanying events; updates without events are rejected.
    • Snapshot operations now use atomic transactions for consistency.
    • Snapshot versioning is now properly persisted and retrieved from the database.
    • Added protection against stale snapshot overwrites during concurrent writes.
  • Tests

    • Extended snapshot test coverage to verify event application accuracy and conflict handling.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

The PR enhances snapshot handling in the SQLite event sourcing layer by introducing transactional atomicity and monotonic sequence guards. Snapshot updates now use ON CONFLICT ... DO UPDATE with a guard to prevent overwriting newer snapshots, while the persist operation combines event insertion and snapshot updates within a single transaction, rejecting invalid snapshot updates without events.

Changes

Snapshot atomicity and sequence consistency

Layer / File(s) Summary
Snapshot query contracts with sequence guards
crates/sqlite-es/src/sql_query.rs
select_snapshot now includes snapshot_version column. update_snapshot changes from INSERT OR REPLACE to ON CONFLICT ... DO UPDATE SET with WHERE excluded.last_sequence > last_sequence guard to prevent overwriting newer snapshots. Tests assert the new query structure.
EmptySnapshotUpdate error and snapshot version deserialization
crates/sqlite-es/src/event_repository.rs
Introduces SqliteAggregateError::EmptySnapshotUpdate mapped to PersistenceError::UnknownError. row_to_serialized_snapshot reads snapshot_version from the database row and uses it for current_snapshot instead of hardcoding to 0. Test module imports updated.
Transactional persist with atomic event and snapshot updates
crates/sqlite-es/src/event_repository.rs
PersistedEventRepository::persist is rewritten to insert events and optionally update snapshots within a single SQLite transaction. Derives last_sequence from committed events and fails with EmptySnapshotUpdate when snapshot update is requested without events.
Transaction-aware test helpers for events and snapshots
crates/sqlite-es/src/event_repository.rs
insert_events and insert_events_within helpers provide transaction-aware event insertion. update_snapshot and update_snapshot_within refactor snapshot updates with monotonic last_sequence guards and proper version binding.
Snapshot round-trip and consistency test coverage
crates/sqlite-es/src/event_repository.rs
New AppendingAggregate test fixture validates exact-once event application across snapshot rebuilds. snapshot_round_trip_applies_events_exactly_once test validates rehydration. Additional tests cover optimistic-lock failures, corruption recovery, atomic rollback, and rejection of snapshot updates without events.

Suggested reviewers

  • JuaniRios
  • findolor
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: sqlite-es snapshot sequence' directly addresses the core issue: fixing incorrect snapshot sequence handling in sqlite-es, matching the PR's primary objective of correcting last_sequence derivation and snapshot version semantics.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sqlite-es-snapshot-sequence

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

0xgleb commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label add-to-gt-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@0xgleb 0xgleb self-assigned this Jun 9, 2026
@0xgleb 0xgleb added the bug Something isn't working label Jun 9, 2026 — with Graphite App
@0xgleb 0xgleb marked this pull request as ready for review June 9, 2026 23:25
@0xgleb 0xgleb requested review from JuaniRios and findolor June 9, 2026 23:25
@0xgleb 0xgleb force-pushed the fix/sqlite-es-snapshot-sequence branch from 42b178c to 5bc8232 Compare June 9, 2026 23:55
@0xgleb 0xgleb force-pushed the fix/sqlite-es-snapshot-sequence branch from 5bc8232 to 87d8fcd Compare June 10, 2026 02:50
@linear-code

linear-code Bot commented Jun 10, 2026

Copy link
Copy Markdown

RAI-921

@findolor findolor left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

@graphite-app graphite-app Bot changed the base branch from chore/bump-up-cqrs-es to graphite-base/18 June 10, 2026 17:30
@0xgleb 0xgleb changed the base branch from graphite-base/18 to master June 10, 2026 17:30
@0xgleb 0xgleb force-pushed the fix/sqlite-es-snapshot-sequence branch from 87d8fcd to 00d5969 Compare June 10, 2026 17:31
@graphite-app

graphite-app Bot commented Jun 10, 2026

Copy link
Copy Markdown

Merge activity

  • Jun 10, 5:31 PM UTC: Graphite rebased this pull request, because this pull request is set to merge when ready.
  • Jun 10, 7:11 PM UTC: 0xgleb added this pull request to the Graphite merge queue.
  • Jun 10, 7:11 PM UTC: CI is running for this pull request on a draft pull request (#22) due to your merge queue CI optimization settings.
  • Jun 10, 7:15 PM UTC: Merged by the Graphite merge queue via draft PR: #22.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/sqlite-es/src/event_repository.rs (1)

407-420: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Handle pre-fix snapshot rows during upgrade.

This fixes the read/write contract for new snapshots, but rows already written by the old code path still have last_sequence = next_snapshot and snapshot_version = 0 per the PR summary. Those rows will still deserialize with the wrong current_sequence after upgrade, so affected aggregates keep double-applying events until some later write happens to replace the snapshot. Please add an upgrade path here (for example, ignore or clear legacy snapshot_version = 0 rows so they rebuild from events, or ship a migration/backfill), and add a regression test that seeds one of those legacy rows rather than only round-tripping snapshots written by the fixed implementation.

Also applies to: 539-577


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6a93bc05-fe77-4734-94e2-f0de7bc9e99a

📥 Commits

Reviewing files that changed from the base of the PR and between a7e755d and 00d5969.

📒 Files selected for processing (2)
  • crates/sqlite-es/src/event_repository.rs
  • crates/sqlite-es/src/sql_query.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: hooks
  • GitHub Check: examples
  • GitHub Check: clippy
  • GitHub Check: fmt
  • GitHub Check: test
  • GitHub Check: clippy
  • GitHub Check: check
  • GitHub Check: hooks
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: examples
  • GitHub Check: fmt
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

When handling clippy errors about function lengths or cognitive complexity, don't split up functions more than necessary. Instead ask the user if we can add a clippy allow.

Files:

  • crates/sqlite-es/src/sql_query.rs
  • crates/sqlite-es/src/event_repository.rs
crates/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

crates/**/*.rs: Package by Feature, Not by Layer. Organize by business domain, not technical layers. FORBIDDEN catch-all modules: types.rs, error.rs, models.rs, utils.rs, helpers.rs, http.rs, dto.rs, entities.rs, services.rs, domain.rs. CORRECT: lifecycle.rs, projection.rs, reactor.rs, view_backend.rs. Each feature module contains ALL related code.
CRITICAL: NEVER write directly to the events table — no direct INSERTs, no manual sequence numbers, no bypassing CqrsFramework. Always use CqrsFramework::execute() or execute_with_metadata() to emit events through aggregate commands.
CRITICAL: Single CQRS Framework Instance Per Aggregate. In any consuming application, each aggregate must have exactly ONE SqliteCqrs, constructed once at startup. Never call sqlite_cqrs() or CqrsFramework::new() per request. Direct construction is fine in test/CLI/migration code.
Use cqrs-es Services for side-effects in handle() to ensure atomicity with events. Naming: {Action}er trait -> {Domain}Service implements -> {Domain}Manager orchestrates.
Log in command handlers, not callers. All logging for command execution belongs in the aggregate's handle() method. The handler has full aggregate state making log messages rich without the caller needing extra context.
Make invalid states unrepresentable through the type system. Use ADTs and enums to encode business rules and state transitions directly in types rather than runtime validation.
Accept domain newtypes and convert to SDK primitives inside the callee. Exception: cross-crate boundaries where the callee can't depend on caller's domain types — destructure at the call site.
Favor FP/ADT patterns over OOP. Use pattern matching, combinators, type-driven design. Prefer iterators over imperative loops unless it increases complexity. Use itertools for richer iterator chains.
Follow comprehensive commenting guidelines. Comment complex business logic, algorithm rationale, external system behavior, non-obvious constraints, test data context, ...

Files:

  • crates/sqlite-es/src/sql_query.rs
  • crates/sqlite-es/src/event_repository.rs

@graphite-app graphite-app Bot closed this Jun 10, 2026
@graphite-app graphite-app Bot deleted the fix/sqlite-es-snapshot-sequence branch June 10, 2026 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants