
refactor: reuse sql DB on GTFS reload #875

Open
fletcherw wants to merge 1 commit into OneBusAway:main from fletcherw:gtfs_merge

Conversation

@fletcherw
Collaborator

Restructure GTFS static data loading/reloading into one shared function. Now, when periodically reloading GTFS static data, we update the existing database in place rather than opening a new database.

This has the additional benefit that we no longer need a mutex on GtfsDB, since it is not modified after the Manager is constructed. Because of that, this change also deletes all of the external usages of Manager.staticMutex, and converts it to an implementation detail of Manager (still used to protect a few other fields).

Member

@Ahmedhossamdev Ahmedhossamdev left a comment


Terrific work @fletcherw
LGTM!

I have only one question:
SQLite will not block any reads when we update in place, right?

Member

@aaronbrethorst aaronbrethorst left a comment


Hi Fletcher,

Thanks for this one — it's exactly the kind of cleanup this codebase has been begging for. Collapsing the hot-swap dance (temp DB, file rename, recovery paths) into a single in-place import, and turning staticMutex from a handler-visible contract into a private implementation detail, is a big step forward in readability. Net −517 lines with broader handler simplification is the cherry on top.

I do have a couple of concerns I'd like resolved before this lands, plus some smaller fit-and-finish notes.

Critical Issues (0 found)

None.

Important Issues (2 found)

1. Post-commit work isn't retried, leaving the DB permanently missing min_arrival_time / max_departure_time

In importStaticIntoDB (internal/gtfs/static.go:97-128), the flow is:

  1. ImportFromFile / DownloadAndStore → processAndStoreGTFSDataWithSource, which commits a transaction that writes all rows and upserts import metadata (gtfsdb/helpers.go:470).
  2. After the commit succeeds, BulkUpdateTripTimeBounds runs as a separate write.
  3. Then PrecomputeAllDirections runs as another separate write.

Failure scenario: the main tx commits, then BulkUpdateTripTimeBounds errors (context cancel on shutdown, transient DB error, etc.). importStaticIntoDB returns (true, err); ReloadStatic propagates the error. On the next reload:

  • processAndStoreGTFSDataWithSource checks GetImportMetadata and finds the committed hash matches the source bytes → returns (false, nil).
  • importStaticIntoDB short-circuits on !changed and skips BulkUpdateTripTimeBounds entirely.

The DB is now stuck in a state where trips exist but min_arrival_time / max_departure_time are NULL forever, silently breaking queries like GetTripsInBlockWithTimeBounds and GetActiveTripsWithNullBlockForRoute (both filter on min_arrival_time <= / max_departure_time >=, which evaluates to false for NULL).

Fix: The underlying query BulkUpdateTripTimeBounds is already guarded by WHERE min_arrival_time IS NULL OR max_departure_time IS NULL, so it's cheap and idempotent. The simplest fix is to run it unconditionally in importStaticIntoDB, not only when changed == true. (Alternatively, move it inside the main import transaction — see issue #2.)
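A minimal sketch of that fix, with the surrounding types reduced to hypothetical stand-ins (only the two function names BulkUpdateTripTimeBounds and the importStatic* flow come from the PR; everything else here is illustrative):

```go
package main

import "fmt"

// db is a hypothetical stand-in for the real gtfsdb client.
type db struct{ boundsFilled bool }

func (d *db) BulkUpdateTripTimeBounds() error {
	// The real query is guarded by
	// WHERE min_arrival_time IS NULL OR max_departure_time IS NULL,
	// so re-running it is cheap and idempotent.
	d.boundsFilled = true
	return nil
}

// importStatic sketches the fixed flow in importStaticIntoDB.
func importStatic(d *db, changed bool) error {
	// Run unconditionally, not only when changed == true, so NULL bounds
	// left behind by a failed post-commit step get repaired on the next reload.
	if err := d.BulkUpdateTripTimeBounds(); err != nil {
		return fmt.Errorf("updating trip time bounds: %w", err)
	}
	if !changed {
		return nil // hash matched; skip the rest of the post-import work
	}
	// ... PrecomputeAllDirections and friends would follow here ...
	return nil
}

func main() {
	d := &db{}
	err := importStatic(d, false) // simulate a reload where the hash matched
	fmt.Println(d.boundsFilled, err) // prints "true <nil>"
}
```

The key property: even on the `!changed` short-circuit path, the bounds update has already run.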

2. Reload is no longer atomic from a reader's perspective

The old ForceUpdate built a temp DB, precomputed everything, and then did an os.Rename to swap the file atomically. Readers saw either fully-old or fully-new state.

The new path commits the main import tx first, then runs BulkUpdateTripTimeBounds and PrecomputeAllDirections as separate writes. During the window between tx-commit and those post-commit steps, readers observe:

  • New trips/routes/stops are already visible (committed).
  • But trips.min_arrival_time / max_departure_time are still NULL until the bulk update runs.
  • stops.direction is NULL until PrecomputeAllDirections finishes — and that one can take multiple seconds on larger agencies.

Queries that filter on these columns (e.g. block/null-block active-trip selection) will return empty or incorrect results during this window. It's not a total outage, but it's a regression in consistency that the old hot-swap gave us for free.

Fix: Move BulkUpdateTripTimeBounds inside the main import transaction in processAndStoreGTFSDataWithSource — it's a single UPDATE and will add negligible time to a transaction that already writes every row. PrecomputeAllDirections is harder (it's multi-step and uses its own transactions), but the on-demand fallback in AdvancedDirectionCalculator at least covers that gap for the direction API. Worth a line in the PR description acknowledging the trade-off either way.
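To make the ordering concrete, here is a toy sketch (a hypothetical fakeTx stand-in, not the PR's actual transaction code) of the proposed sequencing, where the time-bounds UPDATE is the last write inside the same transaction as the row inserts:

```go
package main

import "fmt"

// fakeTx is a hypothetical stand-in for the import transaction;
// it records steps so the ordering is visible.
type fakeTx struct{ steps []string }

func (t *fakeTx) exec(s string) { t.steps = append(t.steps, s) }

// importAtomically sketches the proposed ordering: because the UPDATE
// happens before COMMIT, readers see either fully-old or fully-new
// state, never committed trips with NULL time bounds.
func importAtomically() []string {
	t := &fakeTx{}
	t.exec("INSERT trips/routes/stops")
	t.exec("UPSERT import metadata")
	// A single guarded UPDATE; negligible next to the row inserts.
	t.exec("UPDATE trips SET min_arrival_time=..., max_departure_time=... WHERE min_arrival_time IS NULL OR max_departure_time IS NULL")
	t.exec("COMMIT") // everything above becomes visible at once
	return t.steps
}

func main() {
	for _, s := range importAtomically() {
		fmt.Println(s)
	}
}
```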

Suggestions (3 found)

3. context.TODO() in the filter functions (internal/gtfs/realtime.go:516, 535, 553)

These three filter functions previously used context.Background(); they now use context.TODO(). Functionally identical, but TODO is conventionally an "I haven't decided yet" marker, and some linters flag it. Either revert to Background or — ideally — plumb a real context down from the callers (updateFeedRealtime has one available).

4. Redundant fetch-and-parse on every reload

ReloadStatic calls loadGTFSData (which fetches + parses), then importStaticIntoDB → ImportFromFile / DownloadAndStore, which re-reads the same file or re-downloads the same URL and parses again when the hash doesn't match. This double-fetch existed in the old code too, so not a regression, but since you're in here simplifying this code path, it's a reasonable time to thread the already-fetched bytes through instead of refetching. Not a blocker — note it or file a follow-up issue if you'd rather keep this PR focused.

5. IsHealthy → IsReady is a visible contract change

currentTimeHandler used to 503 when isHealthy was false, which happened specifically on a failed DB-swap recovery. The new code uses IsReady, which is just "startup completed" and never flips back. If any monitoring/alerting keyed off the old health semantic, they'll lose that signal. In practice the new behavior is probably fine (the DB stays valid because tx rollback preserves state), but worth a heads-up in the PR body.

Strengths

  • Huge simplification. The temp-DB + rename + recovery dance was a nest of edge cases. Replacing it with a single in-place import is a clear win.
  • Mutex surface shrunk correctly. Getting rid of public RLock / RUnlock on Manager, the queriesMu on AdvancedDirectionCalculator, and all 40+ handler-side lock acquisitions is exactly the right direction.
  • Graceful startup fallback — if reload fails but metadata indicates prior data was imported, the manager continues serving the pre-existing DB. Nice touch.
  • Tests updated thoroughly and they pass (make test runs clean); lint errors are all pre-existing in gtfsdb/stops_rtree.go, unrelated to this PR.

Recommended Action

  1. Fix issue #1 by running BulkUpdateTripTimeBounds unconditionally (or moving it into the import tx).
  2. Decide on issue #2 — ideally move the time-bounds update into the tx; if you'd rather accept the reader inconsistency window, document it in the PR description.
  3. Address suggestions #3–#5 as you see fit.
  4. Ping me when ready and I'll re-review.
