feat: add hermetic backend e2e harness by Git-on-my-level · Pull Request #8004 · BasedHardware/omi

Git-on-my-level · 2026-06-18T00:40:15Z

Summary

Adds a hermetic backend e2e harness under backend/testing/e2e/ and wires it into GitHub Actions as a required/blocking backend PR check.

The harness imports the real FastAPI backend and exercises selected route/seam coverage against fake/disabled external-service boundaries. It is intended to catch backend boot/route regressions and accidental real-network/service leakage without requiring Firestore, Redis, GCS, Pinecone, Typesense, LLM/STT provider keys, or Google ADC.

What this covers

Real FastAPI app import and TestClient execution
Firestore replacement using fake-firestore helpers
Redis replacement using fakeredis[lua]
Filesystem-backed fake GCS storage client
Deterministic fixture seeding for conversations, memories, and action items
CRUD/data-shape coverage for action items, memories, and seeded conversations
Legacy shape compatibility coverage for representative conversation/memory shapes
Conversation-processing seam coverage with deterministic provider-heavy function replacement
Listen/custom-STT route seam coverage
Webhook/task-integration failure and disable-path coverage
Retrieval/search route seam coverage with deterministic in-memory vector/search fake
Mobile-facing lifecycle/client response-shape compatibility coverage
Local-only network guard during pytest execution:
- blocks non-local DNS lookup
- blocks non-local socket connect/connect_ex/create_connection/sendto
External credential hardening:
- overwrites API-key env vars with fake values
- clears SERVICE_ACCOUNT_JSON, GOOGLE_APPLICATION_CREDENTIALS, and PINECONE_API_KEY
Process-level timeout via E2E_PYTEST_TIMEOUT so websocket/provider-seam regressions fail instead of hanging.
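The local-only network guard can be approximated by patching `socket.getaddrinfo` and `socket.socket.connect`. The sketch below is not the harness's `conftest.py`. The allowlist and function names are assumptions, and only DNS and `connect` are covered (`connect_ex` and `sendto` would follow the same pattern):

```python
import socket

# Hypothetical allowlist of loopback hosts; the harness's own list may differ.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}

_real_getaddrinfo = socket.getaddrinfo
_real_connect = socket.socket.connect


def _guarded_getaddrinfo(host, port, *args, **kwargs):
    # host=None means a wildcard/local lookup, so let it through.
    if host is not None and host not in ALLOWED_HOSTS:
        raise AssertionError(f"Hermetic e2e blocked DNS lookup for {host!r}")
    return _real_getaddrinfo(host, port, *args, **kwargs)


def _guarded_connect(self, address):
    # AF_INET/AF_INET6 addresses are tuples starting with the host;
    # AF_UNIX paths are plain strings and are left alone.
    if isinstance(address, tuple) and address[0] not in ALLOWED_HOSTS:
        raise AssertionError(f"Hermetic e2e blocked connect to {address!r}")
    return _real_connect(self, address)


def install_network_guard():
    socket.getaddrinfo = _guarded_getaddrinfo
    socket.socket.connect = _guarded_connect


def uninstall_network_guard():
    socket.getaddrinfo = _real_getaddrinfo
    socket.socket.connect = _real_connect
```

The patch has to be installed before the backend is imported. A module that already ran `from socket import getaddrinfo` keeps a reference to the unpatched function.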

CI

New workflow:

Backend Hermetic E2E

It runs on PRs that change backend files and is blocking: failures and timeouts fail the PR status check.

Verification

Latest local full harness:

bash backend/testing/e2e/run.sh -q --tb=short

Result:

72 passed, 6 skipped, 43 warnings
✅ All e2e tests passed!

Focused changed-area run:

35 passed, 3 skipped, 40 deselected, 25 warnings

Earlier local flake loop:

20 consecutive full harness runs
20 passed / 0 failed
observed failure rate: 0.0%

Explicit v1 limitations

The harness intentionally keeps incomplete provider flows as explicit skips/deferred v2 work rather than silent passes:

Full Deepgram-compatible streaming WebSocket fake for /v4/listen / pusher remains v2.
Full deterministic LLM endpoint wiring into every OpenAI/Anthropic/OpenRouter client remains v2.
Real Pinecone/Typesense service compatibility is out of scope; retrieval/search uses deterministic in-memory seams.
Real Redis-unavailable fail-open tests remain v2.
Marketplace app webhook retry/circuit-breaker coverage remains v2.

Framing

This PR adds a strong Layer 1 hermetic route/seam e2e harness. It is not full production-provider integration coverage.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 713a9e1764

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-18T00:45:28Z

+}
+
+echo "Checking/installing fake dependencies..."
+install_if_missing fake_firestore fake-firestore


Track e2e dependencies instead of pip-installing them

In a clean checkout or any CI/offline developer environment where PyPI is blocked, bash backend/testing/e2e/run.sh exits before pytest because these required packages are not declared in backend/requirements.txt or another tracked lockfile and the script tries to fetch them dynamically. I checked repo-wide references with rg; the new harness is the only place these packages are named, so the advertised one-command harness is not reproducible without network access.
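One offline-friendly option is for the runner to fail fast with a clear message when a dependency is missing, instead of pip-installing it. The sketch below takes the module and pip names from the `install_if_missing` call above. The `fakeredis[lua]` pin and the requirements path come from later in this thread. `missing_modules` and `preflight` are hypothetical helpers:

```python
import importlib.util
import sys

# Module name -> pip requirement; names taken from this PR's run.sh and requirements.
REQUIRED = {
    "fake_firestore": "fake-firestore",
    "fakeredis": "fakeredis[lua]",
}


def missing_modules(required):
    """Return pip requirement strings for modules that cannot be found (no network needed)."""
    return [pip for mod, pip in required.items() if importlib.util.find_spec(mod) is None]


def preflight(required=REQUIRED):
    """Exit with a clear prerequisite error instead of reaching out to PyPI."""
    missing = missing_modules(required)
    if missing:
        sys.exit(
            "Missing e2e dependencies: " + ", ".join(missing)
            + "\nInstall them with: pip install -r backend/testing/e2e/requirements.txt"
        )
```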

chatgpt-codex-connector · 2026-06-18T00:45:28Z

+        original_init(self, *args, **kwargs)
+        fake = get_fake_redis()
+        # Delegate all key methods to the fake
+        for attr in (


Delegate Redis scripts to fakeredis

Rate-limited endpoints in these e2e tests do not actually exercise fakeredis because database/redis_db.py creates _RATE_LIMIT_LUA = r.register_script(...) at import time, but this delegation list never forwards register_script/evalsha/script_load to the fake client. In contexts like /v3/memories, the script remains bound to the real redis.Redis client and failures are swallowed by the fail-open path, so the harness can pass while Redis behavior is not being tested.

greptile-apps · 2026-06-18T00:45:57Z

Greptile Summary

Adds a hermetic backend e2e test harness under backend/testing/e2e/ that imports the real FastAPI app, replaces Firestore/Redis with in-memory fakes, overwrites all external API keys, and installs a socket guard to block non-local network calls — enabling CRUD, failure-mode, and migration-safety coverage without any live services.

conftest.py wires the entire fake layer: Google auth patch → env hardening → socket guard → MockFirestore/FakeRedis patches → TestClient factory. The sendto guard only handles the 2-arg call form, leaving the 3-arg (data, flags, address) variant mismatched. The session-scoped Firestore store with hardcoded UID "123" accumulates state across all tests.
fakes/firestore.py delegates collection, batch, and get_all to the mock but never patches document(), leaving root-level document access on the uninstrumented client.
fakes/redis.py bridges only scalar operations; hash/list/sorted-set families are not delegated to FakeRedis.

Confidence Score: 4/5

Safe to merge as a manually-run local harness; the defects are in the test infrastructure itself and do not touch production code.

The network guard's sendto implementation has a wrong signature for the 3-arg call form — instead of raising a clear blocked-network-call assertion it would raise a TypeError, crashing the harness unexpectedly. The remaining issues (missing document() delegation, incomplete Redis method bridging, shared Firestore state) are all contained within the test layer and are non-blocking for the stated v1 scope, but they could cause confusing failures as more test cases are added.

conftest.py (socket guard sendto signature), fakes/firestore.py (missing document() delegation), and fakes/redis.py (incomplete method bridging) are the three files most likely to cause unexpected failures as the harness grows.

Important Files Changed

Filename	Overview
backend/testing/e2e/conftest.py	Core harness setup: env hardening, Google auth patch, socket guard, TestClient factory. The `sendto` guard has a wrong signature for the 3-arg form, and session-scoped Firestore with hardcoded UID "123" causes test state leakage.
backend/testing/e2e/fakes/firestore.py	MockFirestore setup and seed/read helpers. `patch_google_firestore` delegates `collection`, `batch`, and `get_all` but never patches `self.document`, leaving root-level document access unguarded.
backend/testing/e2e/fakes/redis.py	FakeRedis setup and Redis.Redis monkeypatching. Bridges only scalar operations; hash, list, and sorted-set families are not delegated, so those backend calls fall through to the real unconnected client.
backend/testing/e2e/fakes/llm.py	Deterministic LLM response builders and pytest-httpserver configurators. `make_openai_router_response` is defined after its caller `configure_llm_fakes`; works at runtime but is confusing.
backend/testing/e2e/fakes/embeddings.py	Fake embedding/diarization/VAD HTTP handlers using pytest-httpserver. Clean scaffold; unused until LLM processing tests are wired.
backend/testing/e2e/fakes/stt.py	STT fake scaffold. Deepgram WS fake is explicitly deferred (TODO). HTTP pre-recorded endpoint helper present; `configure_stt_timeout` is intentionally a no-op.
backend/testing/e2e/fakes/storage.py	Temp-dir GCS replacement with upload/download/delete/list helpers. Clean and self-contained.
backend/testing/e2e/run.sh	One-command entry point: venv detection, fake-dep bootstrapping, pytest invocation, exit-code propagation. Straightforward and correct.
backend/testing/e2e/test_crud.py	CRUD golden-path tests for conversations, action items, and memories. Solid route coverage; `test_list_action_items` swallows create-call failures silently.
backend/testing/e2e/test_conversation_processing.py	Conversation processing and state-transition tests. LLM-backed paths are appropriately skipped; seed-and-read and discarded-filter tests are active.
backend/testing/e2e/test_failure_modes.py	Failure-mode and edge-case tests. Auth/404/unicode/long-desc edge cases are active; LLM 500 and STT timeout tests are explicitly skipped with clear TODOs.
backend/testing/e2e/test_migration_safety.py	Legacy-format reading, idempotency, field-shape evolution, and category-enum migration tests. Well-structured with clear intent for each scenario.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Shell as run.sh
    participant Pytest as pytest
    participant Conf as conftest.py
    participant Fakes as fakes/*
    participant App as FastAPI app (main.py)
    participant DB as MockFirestore / FakeRedis

    Shell->>Pytest: python -m pytest testing/e2e/
    Pytest->>Conf: import conftest.py
    Conf->>Conf: patch google.auth.default
    Conf->>Conf: _set_e2e_env() — overwrite credentials
    Conf->>Conf: install socket guard (connect/getaddrinfo)
    Conf->>Fakes: setup_fake_firestore() → MockFirestore
    Conf->>Fakes: setup_fake_redis() → FakeRedis
    Conf->>Fakes: patch_google_firestore() — delegate collection/batch/get_all
    Conf->>Fakes: patch_redis_client() — delegate scalar ops
    Conf->>App: import main → FastAPI app constructed
    App-->>Conf: app cached in _app_cache

    loop each test
        Pytest->>Conf: client fixture → TestClient(app)
        Pytest->>Fakes: seed_conversation / seed_memory / seed_action_item
        Fakes->>DB: MockFirestore.collection().document().set()
        Pytest->>App: TestClient.get / post / patch / delete
        App->>DB: Firestore query / Redis op (via patched clients)
        DB-->>App: in-memory result
        App-->>Pytest: HTTP response
        Pytest->>Pytest: assert status / body
    end

Comments Outside Diff (3)

backend/testing/e2e/fakes/firestore.py, lines 663-681

document() method not delegated to mock

patch_google_firestore delegates collection, batch, and get_all to the mock instance and stores self._mock = mock, but never patches self.document. Any backend code that calls db.document("path") to obtain a root-level document reference will hit the real firestore.Client.document() — whose internal state (_database, etc.) was never properly initialized — and raise an AttributeError. The stored self._mock reference has no effect since self.document still points to the uninstrumented class method.
backend/testing/e2e/fakes/redis.py, lines 953-974

Hash, sorted-set, and list operations not delegated

patch_redis_client bridges a fixed allowlist of scalar operations but omits the hash (hget, hset, hmset, hgetall, hdel), list (lpush, lrange, rpop), and sorted-set (zadd, zrange, zrangebyscore) families. Any backend code that uses these operations will call them on the real, unconnected redis.Redis instance rather than the FakeRedis instance. The backend may already handle these failures gracefully (fail-open), but the gap means that data written via the Redis module is invisible to the fake store and tests that rely on Redis-backed state for these operations will silently receive wrong results rather than surfacing an error.
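Both delegation gaps have the same cause: bridging a fixed allowlist of methods. That covers Firestore `document()`, the Redis hash/list/sorted-set families, and the `register_script`/`evalsha` gap from the Codex comment. The sketch below shows the broader alternative, which forwards every public callable the fake exposes. `delegate_public_methods` is a hypothetical helper, and it assumes the real client instance accepts attribute assignment:

```python
def delegate_public_methods(real_client, fake_client):
    """Bind every public callable of fake_client onto real_client.

    After this, real_client.hset(...) or real_client.document(...) call the
    in-memory fake instead of the unconnected real client.
    """
    delegated = []
    for name in dir(fake_client):
        if name.startswith("_"):
            continue  # leave dunder/private internals of the real client alone
        attr = getattr(fake_client, name)
        if callable(attr):
            setattr(real_client, name, attr)  # bound method of the fake
            delegated.append(name)
    return delegated
```

Delegation only helps code that looks methods up on the instance after the patch. A `Script` object created by `register_script` at import time stays bound to whichever client existed then, so the patch must be in place before backend modules are imported.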
backend/testing/e2e/fakes/llm.py, lines 857-862

make_openai_router_response called before its definition

configure_llm_fakes calls make_openai_router_response() before that helper is defined in the file. Python resolves names at call time so this works at runtime, but the out-of-order definition is surprising when reading top-to-bottom. The helper should be defined above the function that uses it, or the call can be replaced with the make_openai_chat_response it delegates to.

Reviews (1): Last reviewed commit: "feat: add hermetic backend e2e harness"

greptile-apps · 2026-06-18T00:46:01Z

+    """Return the fixed dev-test UID."""
+    return DEV_UID
+


sendto guard only handles the 2-arg form

socket.socket.sendto has two call forms: sendto(data, address) and sendto(data, flags, address). The patched signature _guarded_socket_sendto(self, data, address) only covers the first. If any library in the test process calls the 3-arg form (e.g. sock.sendto(b"payload", 0, ("host", 80))), Python invokes _guarded_socket_sendto(sock, b"payload", 0, ("host", 80)) — four positional arguments against a three-parameter function — raising an unhandled TypeError instead of the expected AssertionError. The guard would crash the test with an opaque error rather than surfacing a blocked network call.
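A guard that accepts both forms can treat the address as the last positional argument. The sketch below is a reworked version of the guard, and the host allowlist is illustrative:

```python
import socket

# Hypothetical loopback allowlist, mirroring the idea of the harness guard.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}

_real_sendto = socket.socket.sendto


def _guarded_socket_sendto(self, data, *args):
    # Accepts sendto(data, address) and sendto(data, flags, address);
    # in both forms the address is the last positional argument.
    if len(args) not in (1, 2):
        raise TypeError(f"sendto() takes 2 or 3 arguments ({len(args) + 1} given)")
    address = args[-1]
    if isinstance(address, tuple) and address[0] not in ALLOWED_HOSTS:
        raise AssertionError(f"Hermetic e2e blocked sendto {address!r}")
    return _real_sendto(self, data, *args)


def install_sendto_guard():
    socket.socket.sendto = _guarded_socket_sendto


def uninstall_sendto_guard():
    socket.socket.sendto = _real_sendto
```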

greptile-apps · 2026-06-18T00:46:04Z

+
+
+# ─── Utility fixtures ──────────────────────────────────────────────────
+
+
+@pytest.fixture()
+def fresh_uid():
+    """Generate a unique UID per test for isolation."""
+    import uuid
+
+    return str(uuid.uuid4())
+
+
+def _now_iso() -> str:
+    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+
+
+@pytest.fixture()


Session-scoped Firestore store + hardcoded UID "123" accumulates state across tests

fake_firestore is session-scoped, so the same in-memory store is shared by every test in the session. Multiple test classes (TestConversationCRUD, TestConversationProcessing, TestFailureModes, TestMigrationSafety) each seed_conversation("123", ...) with fixed document IDs such as "fail-persist-001", "discard-test-active", "migration-idempotent-001", etc. Tests that assert on list responses (e.g., test_discarded_conversation_filtered, test_mixed_format_coexistence) see all documents seeded by previously-run tests, making the assertions dependent on execution order. A test that passes in isolation can fail when run after another test that left conflicting data in the shared store.
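A common fix is to snapshot the fake's backing state before each test and restore it afterwards. The sketch below assumes the fake keeps its documents in a plain nested dict that the harness can reach. How fake-firestore actually stores state is not shown in this PR, so treat the `store` argument as hypothetical:

```python
import copy
from contextlib import contextmanager


@contextmanager
def isolated_store(store: dict):
    """Snapshot a fake's backing dict and restore it, in place, after the block."""
    snapshot = copy.deepcopy(store)
    try:
        yield store
    finally:
        # Mutate rather than rebind, so a fake client holding this dict sees the reset.
        store.clear()
        store.update(snapshot)
```

In pytest this would sit in an `autouse=True` fixture that yields inside the `with` block. Restoring in place keeps the same dict object, so any fake client already holding a reference to it sees the reset state.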

Git-on-my-level · 2026-06-18T01:01:31Z

Follow-up pushed in 97186a655 after reviewing Codex/Greptile feedback and a 3-agent harness committee pass.

Changes made:

Removed dynamic pip install from run.sh; added pinned backend/testing/e2e/requirements.txt.
Disabled dotenv reloading and cleared proxy env vars so local .env / proxy config cannot rehydrate real service credentials.
Added harness guard tests for dotenv disabling and non-local DNS/TCP/UDP blocking, including both sendto(data, address) and sendto(data, flags, address) forms.
Added per-test Firestore/Redis fake-state isolation around the fixed local-dev UID.
Broadened Firestore delegation to the fake client surface, including document() / collection_group() from MockFirestore.
Broadened Redis delegation to fakeredis, including constructor/from_url paths and Redis script/hash/list/zset families exposed by fakeredis.
Tightened previously permissive assertions in CRUD/edge tests to assert create statuses, captured IDs, durable postconditions, and exact current behavior.
Reworded docs/tests from “migration safety” to “legacy shape compatibility” where no production migration script is executed.
Kept LLM/STT/failure-provider gaps explicit as skipped/deferred v2 work rather than counting them as green coverage.

Dogfood after the follow-up:

bash backend/testing/e2e/run.sh -q --tb=short
42 passed, 6 skipped, 35 warnings
✅ All e2e tests passed!

GitHub Lint & Format Check passed on the new commit.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97186a6553

chatgpt-codex-connector · 2026-06-18T01:09:59Z

+
+def _guarded_getaddrinfo(host, port, *args, **kwargs):
+    if host is not None and host not in _ALLOWED_NETWORK_HOSTS:
+        raise AssertionError(f"Hermetic e2e blocked DNS lookup for {host!r}")


Make network guard failures uncatchable

When a real outbound call happens inside backend code that intentionally catches Exception (for example Stripe startup validation or vector/persona fallback paths), this AssertionError (and the connect guard above) can be swallowed; with the client fixture also disabling logging, the harness still passes even though a real service call was attempted. To preserve the hermetic guarantee in those contexts, raise a sentinel that broad except Exception handlers do not catch or record the attempt and fail during teardown.
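The sketch below combines both suggestions. A sentinel derived from `BaseException` is not caught by `except Exception`, and a record of each attempt lets teardown assert on it. `HermeticNetworkViolation`, `record_and_block`, and `assert_no_violations` are hypothetical names:

```python
class HermeticNetworkViolation(BaseException):
    """Blocked outbound call. Subclasses BaseException so `except Exception` cannot swallow it."""


_violations = []  # descriptions of every blocked call, kept for teardown


def record_and_block(description):
    # Record first: even a bare `except:` that swallows the sentinel
    # cannot hide the attempt from assert_no_violations().
    _violations.append(description)
    raise HermeticNetworkViolation(description)


def assert_no_violations():
    """Fail (and reset) if any blocked call was attempted since the last check."""
    if _violations:
        found = list(_violations)
        _violations.clear()
        raise AssertionError("Hermetic e2e blocked network calls: " + "; ".join(found))
```

In this design the guarded `getaddrinfo`/`connect` would call `record_and_block(...)` instead of raising `AssertionError`. A session-level teardown then calls `assert_no_violations()`, which still fails the run if some code caught the sentinel with a bare `except:`.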

Git-on-my-level · 2026-06-18T01:31:00Z

Expanded the hermetic harness across the five requested areas in fabdc5a0f.

What changed:

Deterministic conversation processing seam
- Added a reprocess-route test that fakes only the provider-heavy processing function while still exercising real auth, route validation, model serialization, Firestore update, persisted readback, and action-item queryability.
Listen/STT route seam
- Added /v4/web/listen websocket coverage for first-message auth, query parsing, and custom-STT dispatch using a fake stream handler. This is intentionally route/seam coverage, not a full Deepgram protocol fake.
Storage/GCS fake
- Replaced helper-only storage fake with a filesystem-backed fake google.cloud.storage.Client wired before backend import.
- Added speech-profile route tests for profile existence, signed URL generation, additional sample listing, and sample deletion.
- Added a harness guard asserting the backend imported utils.other.storage.storage_client is the fake client.
Webhooks and task integrations
- Added developer webhook config/status roundtrip and realtime webhook payload capture with httpx.MockTransport.
- Added task-integration save/list/default coverage and Todoist task-creation outbound payload capture.
- Kept the fake-firestore nested task-integration single-doc/delete limitation documented instead of overclaiming delete coverage.
User/auth/profile/account
- Added route coverage for auth guard, profile 410/read, onboarding, language/transcription prefs, people CRUD, notification settings, assistant settings merge, AI profile merge, and BYOK activation/deactivation.

Other harness improvements:

Broadened per-test isolation to user doc/subcollections and fake GCS state.
Fixed run.sh argument forwarding so -k/specific selections work correctly.
Normalized seeded action-item timestamps to match backend sorting expectations.
README now reflects partial vs full coverage honestly.

Dogfood:

bash backend/testing/e2e/run.sh -q --tb=short
54 passed, 6 skipped, 35 warnings
✅ All e2e tests passed!

Also verified filtered runner behavior:

bash backend/testing/e2e/run.sh -q --tb=short -k listen_stt
1 passed, 58 deselected

Independent review of the staged diff passed with no blockers.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fabdc5a0f0

chatgpt-codex-connector · 2026-06-18T01:38:46Z

+    tc = TestClient(app)
+    yield tc


Run the FastAPI lifespan in the e2e client

In this fixture the TestClient is yielded without entering it as a context manager. FastAPI/Starlette only drive lifespan handlers when the client is used as a context manager, so the backend startup/shutdown handlers in backend/main.py (startup_event/shutdown_event) are skipped for every e2e request. That means regressions in startup tasks or shutdown cleanup can pass this harness; wrap the yield in with TestClient(app) as tc: so these tests exercise the real app lifecycle.

chatgpt-codex-connector · 2026-06-18T01:38:46Z

+    def download_as_bytes(self, *args, **kwargs) -> bytes:
+        if not self.exists():
+            raise FileNotFoundError(self.name)


Raise GCS NotFound from missing fake blobs

When a backend path hits a missing GCS object through this fake, it sees FileNotFoundError instead of google.cloud.exceptions.NotFound/BlobNotFound. The real storage helpers catch the Google exception in paths like download_syncing_temporal_file, download_playback_artifact, and delete_blob, so missing-object e2e scenarios either fail with a 500 under the fake or have to avoid those branches even though production handles them. Raise the same NotFound class here.
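The sketch below shows a fake blob that raises the Google exception for missing objects. It assumes a filesystem layout of `<root>/<bucket>/<name>`, which is hypothetical; the harness's layout may differ. The local `NotFound` class is only a fallback so the sketch runs without google-cloud installed:

```python
import os

try:
    from google.cloud.exceptions import NotFound
except ImportError:  # fallback only so this sketch runs without google-cloud-core
    class NotFound(Exception):
        pass


class FakeBlob:
    """Hypothetical filesystem-backed blob; only the missing-object paths matter here."""

    def __init__(self, root: str, bucket: str, name: str):
        self.name = name
        self._path = os.path.join(root, bucket, name)

    def exists(self, *args, **kwargs) -> bool:
        return os.path.isfile(self._path)

    def upload_from_string(self, data, *args, **kwargs) -> None:
        os.makedirs(os.path.dirname(self._path), exist_ok=True)
        if isinstance(data, str):
            data = data.encode()
        with open(self._path, "wb") as fh:
            fh.write(data)

    def download_as_bytes(self, *args, **kwargs) -> bytes:
        if not self.exists():
            # Same exception type the production storage helpers catch.
            raise NotFound(f"No such object: {self.name}")
        with open(self._path, "rb") as fh:
            return fh.read()

    def delete(self, *args, **kwargs) -> None:
        try:
            os.remove(self._path)
        except FileNotFoundError:
            raise NotFound(f"No such object: {self.name}") from None
```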

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2181d5aad6

chatgpt-codex-connector · 2026-06-18T03:30:48Z

@@ -0,0 +1,171 @@
+# Hermetic Backend E2E Harness
+
+A manually runnable integration test suite that imports the **real omi FastAPI backend** and exercises selected routes against **faked or disabled external dependencies**. It is intended as a local dogfood harness first; there is no CI wiring yet.


Wire the new e2e suite into CI

This adds a large set of backend test files while explicitly leaving them with “no CI wiring yet”, but backend/AGENTS.md requires new test files to be added to backend/test.sh; I also checked backend/test.sh and workflow YAML with rg and found no testing/e2e invocation. As a result these 50+ hermetic scenarios can silently rot and won’t catch regressions unless a developer remembers to run the separate script manually.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d726e5bbd1

chatgpt-codex-connector · 2026-06-18T04:44:12Z

+
+def _receive_until(websocket, predicate, *, limit=20):
+    for _ in range(limit):
+        message = websocket.receive()


Bound websocket receives with a timeout

If the listen websocket regresses by keeping the connection open but not emitting the expected service_status or segment payload, this blocking receive() never reaches the limit check, so bash backend/testing/e2e/run.sh -k listen_stt can hang instead of failing. Add a per-receive timeout (or run the receive under a bounded wait) so missing websocket events produce a deterministic test failure.

chatgpt-codex-connector · 2026-06-18T04:44:12Z

+        self.metadata = None
+        self.cache_control = None
+        self.content_type = None


Persist fake blob metadata across blob instances

Because metadata, cache_control, and content_type are stored only on each transient FakeBlob, production paths that upload with metadata and later call bucket.blob(...).reload().metadata on a fresh object lose that state under the fake. For example utils.other.storage.get_or_create_merged_audio() relies on cached blob metadata to detect valid cached audio, so e2e coverage for that path would rebuild or skip expiry behavior that production exercises; keep per-bucket/name blob state or a sidecar metadata file with the bytes.
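The sketch below implements the sidecar-file option under the same hypothetical `<root>/<bucket>/<name>` layout. A fake `reload()` would populate a fresh blob's fields from `read_props`. `SidecarBlobStore` is an illustrative name:

```python
import json
import os


class SidecarBlobStore:
    """Hypothetical fake-GCS backing store keyed by (bucket, name).

    Bytes live at <root>/<bucket>/<name>; metadata, cache_control and content_type
    live in a JSON sidecar next to them, so a fresh blob object sees earlier writes.
    """

    FIELDS = ("metadata", "cache_control", "content_type")

    def __init__(self, root: str):
        self.root = root

    def _paths(self, bucket: str, name: str):
        data_path = os.path.join(self.root, bucket, name)
        return data_path, data_path + ".meta.json"

    def write(self, bucket: str, name: str, data: bytes, **props) -> None:
        data_path, meta_path = self._paths(bucket, name)
        os.makedirs(os.path.dirname(data_path), exist_ok=True)
        with open(data_path, "wb") as fh:
            fh.write(data)
        with open(meta_path, "w") as fh:
            json.dump({k: props.get(k) for k in self.FIELDS}, fh)

    def read_props(self, bucket: str, name: str) -> dict:
        """What reload() on a fresh blob would populate; all None if never written."""
        _, meta_path = self._paths(bucket, name)
        try:
            with open(meta_path) as fh:
                return json.load(fh)
        except FileNotFoundError:
            return {k: None for k in self.FIELDS}
```

If the fake also implements `list_blobs`, it has to skip the `.meta.json` sidecars.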

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee71643b4a

chatgpt-codex-connector · 2026-06-18T05:24:12Z

@@ -0,0 +1,4 @@
+fake-firestore==0.13.1
+fakeredis==2.36.2


Install fakeredis with Lua scripting

In a clean e2e install that follows this requirements file, fakeredis is installed without its lua extra, so the EVAL/EVALSHA support needed by the backend's registered Lua scripts is absent. Even after the earlier Redis delegation fix, this new tracked requirements file still installs only core fakeredis. Routes that hit the database/redis_db.py rate-limit scripts or the webhook health scripts can then fail open or fall back instead of exercising the fake Redis script behavior, letting the harness report green while that Redis logic remains untested. Use fakeredis[lua] here or add the equivalent lupa dependency.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b64da7fcf

chatgpt-codex-connector · 2026-06-18T06:55:26Z

+            count = 1
+        r.hset(key, 'failure_count', str(count))
+
+    disabled = count >= _DEV_FAILURE_THRESHOLD


Return true only on the disable transition

When this non-Lua fallback is used in a Redis-compatible store without scripting, concurrent webhook failures can both pass the earlier disabled check, receive HINCRBY counts at or above the threshold, and return True; each caller then runs the auto-disable path and sends the developer notification. The Lua path serializes the disabled check/set, so this fallback is not equivalent under multi-worker traffic right as a webhook crosses the threshold.
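One way to make the fallback return `True` only on the transition is to let Redis pick the winner with `HSETNX`, which sets a hash field only if it is absent. This is a sketch, not the PR's code. It replaces the read-then-`hset` count with `HINCRBY`, and `record_failure` and the `disabled` field name are assumptions:

```python
def record_failure(r, key: str, threshold: int) -> bool:
    """Non-Lua fallback: count a failure; return True only for the caller that disables.

    HINCRBY is atomic, and HSETNX succeeds for exactly one caller, so only one
    concurrent worker runs the auto-disable path and sends the notification.
    """
    count = r.hincrby(key, "failure_count", 1)
    if count < threshold:
        return False
    # Exactly one caller flips the flag; everyone else (and any later caller) gets 0.
    return bool(r.hsetnx(key, "disabled", "1"))
```

An already-disabled webhook also returns `False`, because `HSETNX` finds the field already set.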

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5be32d8d26

chatgpt-codex-connector · 2026-06-18T07:54:00Z

+        echo "ERROR: e2e pytest exceeded timeout ${PYTEST_TIMEOUT}"
+    fi
+else
+    python -m pytest testing/e2e/ "$@"


Preserve the e2e timeout when timeout is unavailable

In environments where command -v timeout is false, this fallback runs pytest without any bound, so the advertised E2E_PYTEST_TIMEOUT protection is silently disabled and websocket/provider regressions can hang the one-command harness indefinitely. Since the script already documents that those regressions should fail instead of hanging, make the fallback enforce the same timeout or fail with a clear prerequisite error.
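The sketch below is a Python fallback that keeps the bound when `timeout(1)` is missing, using `subprocess.run(timeout=...)`. `run_with_timeout` is hypothetical. Returning 124 mirrors GNU `timeout` without `--preserve-status`:

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s: float) -> int:
    """Run cmd and kill it after timeout_s seconds; return 124 on timeout."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        print(f"ERROR: e2e pytest exceeded timeout {timeout_s}s", file=sys.stderr)
        return 124
```

run.sh could invoke this helper with `[sys.executable, "-m", "pytest", "testing/e2e/", ...]`, using `E2E_PYTEST_TIMEOUT` as the bound. On timeout, `subprocess.run` kills only the direct child, not any processes it spawned.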

kodjima33

Backend e2e test harness — useful infra; approve only (feature/infra, Nik owns backend test-suite direction).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7974ff5f5f

chatgpt-codex-connector · 2026-06-18T16:54:40Z

+  hermetic-e2e:
+    name: Hermetic Backend E2E
+    runs-on: ubuntu-latest
+    continue-on-error: true


Remove the e2e job's continue-on-error

For backend PRs where bash backend/testing/e2e/run.sh fails or times out, this job is still treated as non-blocking by GitHub Actions, so the workflow can remain green and required checks won't prevent the regression from merging. That defeats the new CI coverage for the hermetic route scenarios; remove continue-on-error or restrict it to an explicitly experimental/manual path.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9666d2185

chatgpt-codex-connector · 2026-06-18T21:58:02Z

+
+set +e
+if command -v timeout >/dev/null 2>&1; then
+    timeout --preserve-status --kill-after=5s "$PYTEST_TIMEOUT" python -m pytest testing/e2e/ "$@"


Prewarm tiktoken in the local runner

In a clean developer/CI environment where the tiktoken cache is empty, this direct pytest launch installs the socket guard before importing the backend, and utils/llm/clients.py calls tiktoken.encoding_for_model('gpt-4') at module import. The workflow has a prewarm step, but bash backend/testing/e2e/run.sh does not, so the advertised local one-command harness can fail on a blocked DNS lookup to the tokenizer blob host before any tests run. Prewarm the tokenizer cache before starting pytest or stub that import for the e2e process.
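The failure mode follows from how a local-only guard usually works: it wraps socket.getaddrinfo, so any import-time fetch from a remote host fails once the guard is up. A minimal sketch of such a DNS guard, for illustration only (block_non_local_dns, NetworkBlocked and LOCAL_HOSTS are hypothetical names; per the PR summary the harness's real guard also covers connect, connect_ex, create_connection and sendto, which this sketch omits):

```python
import contextlib
import socket

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


class NetworkBlocked(RuntimeError):
    """Raised when hermetic code tries to resolve a non-local host."""


@contextlib.contextmanager
def block_non_local_dns():
    original = socket.getaddrinfo

    def guarded(host, *args, **kwargs):
        # host may be str, bytes, or None (None means a local/passive lookup).
        name = host.decode() if isinstance(host, bytes) else host
        if name is not None and name not in LOCAL_HOSTS:
            raise NetworkBlocked(f"blocked DNS lookup for {name!r}")
        return original(host, *args, **kwargs)

    socket.getaddrinfo = guarded
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original
```

Anything that resolves a remote host inside the with block, such as a cold-cache tokenizer download triggered by importing the backend, raises NetworkBlocked. That is why the tokenizer must be cached or stubbed before the guard is installed.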

chatgpt-codex-connector · 2026-06-18T21:58:03Z

+        if getattr(module, "db", None) is old_db:
+            setattr(module, "db", fake_firestore_instance)
+        if getattr(module, "r", None) is old_r:
+            setattr(module, "r", fake_redis_instance)


Relink aliased fake clients too

This relink only replaces module globals named exactly db or r. I checked repo-wide aliases: routers/integration.py, routers/notifications.py, and utils/fair_use.py bind r as redis_client, while utils/retrieval/tools/screen_activity_tools.py binds db as firestore_db. Once the real app imports these modules, those aliases keep the pre-patch clients, so routes that use them can hit real localhost/ADC clients or fail open instead of exercising the hermetic fakes. Iterate module variables by identity rather than checking only the two literal names.
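A minimal sketch of the identity-based relink suggested here, assuming old_db and old_r are the real client objects captured before patching. relink_by_identity is a hypothetical helper, not the harness's code.

```python
import sys
import types
from typing import Any, Iterable, Optional


def relink_by_identity(
    pairs: Iterable[tuple[Any, Any]],
    modules: Optional[Iterable[types.ModuleType]] = None,
) -> list[str]:
    """Rebind every module global that *is* an old client to its fake.

    Matching is by object identity, so aliases such as redis_client or
    firestore_db are relinked regardless of the variable name.
    Returns the "module.attr" names that were rebound, for logging.
    """
    # Skip None: identity-matching None would rewrite every unset global.
    pairs = [(old, fake) for old, fake in pairs if old is not None]
    if modules is None:
        modules = list(sys.modules.values())  # snapshot; imports may mutate it
    relinked = []
    for module in modules:
        if not isinstance(module, types.ModuleType):
            continue  # sys.modules can contain None placeholders
        # Copy the items first because setattr mutates the namespace.
        for name, value in list(vars(module).items()):
            for old, fake in pairs:
                if value is old:
                    setattr(module, name, fake)
                    relinked.append(f"{module.__name__}.{name}")
                    break
    return relinked
```

In conftest.py this would be called as relink_by_identity([(old_db, fake_firestore_instance), (old_r, fake_redis_instance)]). References held elsewhere (closures, default arguments, class attributes, or objects that stored the client in __init__) are not reached by a module-globals scan.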

cubic-dev-ai

5 issues found across 31 files

cubic-dev-ai

1 issue found across 8 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/testing/e2e/run.sh">

<violation number="1" location="backend/testing/e2e/run.sh:80">
P2: The runner unconditionally performs a potentially networked tokenizer download, creating a non-hermetic external dependency in the harness entrypoint.</violation>
</file>

cubic-dev-ai · 2026-06-18T22:22:20Z

+python - <<'PY'
+import tiktoken
+
+tiktoken.encoding_for_model('gpt-4')


P2: The runner unconditionally performs a potentially networked tokenizer download, creating a non-hermetic external dependency in the harness entrypoint.
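Codex's other option, stubbing the import for the e2e process, would also address this objection because nothing is downloaded at all. A minimal sketch, assuming the backend only calls encoding_for_model(...) or get_encoding(...) and then .encode(...) on the result (this thread does not confirm which tiktoken APIs the backend uses). install_fake_tiktoken and _WhitespaceEncoding are hypothetical names.

```python
import sys
import types


class _WhitespaceEncoding:
    """Crude stand-in for a tiktoken Encoding: one token per whitespace-separated word."""

    name = "e2e-whitespace-stub"

    def encode(self, text: str, **_kwargs) -> list[int]:
        # Token ids are just positions; only the count is meaningful.
        return list(range(len(text.split())))


def install_fake_tiktoken() -> types.ModuleType:
    """Register a fake tiktoken so later imports never touch the network.

    Must run before the backend (e.g. utils/llm/clients.py) is imported.
    """
    fake = types.ModuleType("tiktoken")
    encoding = _WhitespaceEncoding()
    fake.encoding_for_model = lambda model_name: encoding
    fake.get_encoding = lambda encoding_name: encoding
    sys.modules["tiktoken"] = fake
    return fake
```

The trade-off is that token counts become word counts, so any test whose outcome depends on real token budgets would still need the real tokenizer served from a pre-populated cache.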

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At backend/testing/e2e/run.sh, line 80: <comment>The runner unconditionally performs a potentially networked tokenizer download, creating a non-hermetic external dependency in the harness entrypoint.</comment> <file context> @@ -70,6 +70,16 @@ fi +python - <<'PY' +import tiktoken + +tiktoken.encoding_for_model('gpt-4') +PY + </file context>

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 9d810bb317

chatgpt-codex-connector · 2026-06-18T22:28:51Z

+    """The non-Lua fallback should match the Lua script's one-shot disable semantics."""
+
+    monkeypatch.setattr(webhook_health, "_DEV_FAILURE_THRESHOLD", 2)
+    monkeypatch.setattr(webhook_health, "r", fake_redis)


Force the non-Lua path in the fallback test

With testing/e2e/requirements.txt now installing fakeredis[lua], assigning webhook_health.r to fake_redis still gives record_dev_webhook_failure() a working register_script, so these assertions exercise the Lua script path rather than _record_dev_webhook_failure_fallback(). In Redis-compatible environments without scripting, the fallback can regress while this test remains green; make this test force _get_dev_failure_script()/script execution to raise or use a fake that lacks scripting support.
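A minimal sketch of a test double that forces the non-Lua path, assuming the production fallback is triggered by any exception from script registration or execution (if it catches only specific redis exceptions, raise that type instead, e.g. redis.exceptions.ResponseError). NoScriptingRedis, DictRedis and record_failure are hypothetical; record_failure only imitates the one-shot semantics described in the test docstring and is not webhook_health's code.

```python
class NoScriptingRedis:
    """Wrap a Redis-like client but refuse Lua scripting."""

    def __init__(self, inner):
        self._inner = inner

    def register_script(self, script):
        raise RuntimeError("Lua scripting disabled by test double")

    def eval(self, *args, **kwargs):
        raise RuntimeError("Lua scripting disabled by test double")

    def __getattr__(self, name):
        # Only called for attributes not defined above: incr, get, set, ...
        return getattr(self._inner, name)


class DictRedis:
    """Tiny in-memory client with just what this example needs."""

    def __init__(self):
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def register_script(self, script):
        def run(keys, args):
            return self.incr(keys[0]) == int(args[0])
        return run


def record_failure(r, key, threshold):
    """Hypothetical stand-in: Lua path first, plain-command fallback second.

    Returns (path, disabled_now); disabling is one-shot, so True only on the
    call that reaches the threshold.
    """
    try:
        script = r.register_script("-- Lua: INCR, disable once at threshold")
        return "lua", bool(script(keys=[key], args=[threshold]))
    except Exception:
        count = r.incr(key)
        return "fallback", count == threshold
```

In the real test this would become monkeypatch.setattr(webhook_health, "r", NoScriptingRedis(fake_redis)); alternatively, as suggested above, monkeypatch webhook_health._get_dev_failure_script to raise.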

feat: add hermetic backend e2e harness

713a9e1

chatgpt-codex-connector Bot reviewed Jun 18, 2026

greptile-apps Bot reviewed Jun 18, 2026

test: harden hermetic e2e harness

97186a6

chatgpt-codex-connector Bot reviewed Jun 18, 2026

test: expand hermetic backend e2e coverage

fabdc5a

chatgpt-codex-connector Bot reviewed Jun 18, 2026

test: assert deterministic processing memory readback

2181d5a

chatgpt-codex-connector Bot reviewed Jun 18, 2026

david-zhang added 2 commits June 18, 2026 03:52

test: cover custom STT listen transcript persistence

d726e5b

test: cover webhook and task integration failures

a638021

chatgpt-codex-connector Bot reviewed Jun 18, 2026

david-zhang added 2 commits June 18, 2026 04:44

test: cover retrieval search critical path

ee71643

test: cover mobile lifecycle compatibility

adc33b4

chatgpt-codex-connector Bot reviewed Jun 18, 2026

test: harden e2e webhook fallback and timeout

7b64da7

chatgpt-codex-connector Bot reviewed Jun 18, 2026

test: harden hermetic e2e CI readiness

5be32d8

chatgpt-codex-connector Bot reviewed Jun 18, 2026

ci: add non-blocking hermetic backend e2e check

71067a6

kodjima33 approved these changes Jun 18, 2026

david-zhang added 2 commits June 18, 2026 16:46

ci: include pytest in e2e dependencies

b145c22

ci: prewarm tokenizer cache for hermetic e2e

7974ff5

chatgpt-codex-connector Bot reviewed Jun 18, 2026

docs: align e2e harness README with CI state

f9666d2

chatgpt-codex-connector Bot reviewed Jun 18, 2026

cubic-dev-ai Bot reviewed Jun 18, 2026

Comment thread .github/workflows/backend-hermetic-e2e.yml Outdated

Comment thread backend/testing/e2e/fakes/firestore.py

Comment thread backend/database/webhook_health.py Outdated

Comment thread backend/testing/e2e/run.sh

Comment thread backend/testing/e2e/conftest.py Outdated

fix: address e2e harness review feedback

9d810bb

cubic-dev-ai Bot reviewed Jun 18, 2026

chatgpt-codex-connector Bot reviewed Jun 18, 2026

Git-on-my-level merged commit 9838ea2 into main Jun 19, 2026
4 checks passed

Git-on-my-level deleted the feat/e2e-hermetic-harness branch June 19, 2026 00:00

		@@ -0,0 +1,171 @@
		# Hermetic Backend E2E Harness

		A manually runnable integration test suite that imports the real omi FastAPI backend and exercises selected routes against faked or disabled external dependencies. It is intended as a local dogfood harness first; there is no CI wiring yet.

		@@ -0,0 +1,4 @@
		fake-firestore==0.13.1
		fakeredis==2.36.2

Conversation

Git-on-my-level commented Jun 18, 2026 • edited

Summary

What this covers

CI

Verification

Explicit v1 limitations

Framing

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot Jun 18, 2026

chatgpt-codex-connector Bot Jun 18, 2026

greptile-apps Bot commented Jun 18, 2026 • edited

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (3)

greptile-apps Bot Jun 18, 2026

greptile-apps Bot Jun 18, 2026

Git-on-my-level commented Jun 18, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot Jun 18, 2026

Git-on-my-level commented Jun 18, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot Jun 18, 2026

chatgpt-codex-connector Bot Jun 18, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot Jun 18, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot Jun 18, 2026

chatgpt-codex-connector Bot Jun 18, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot Jun 18, 2026

chatgpt-codex-connector Bot left a comment

Git-on-my-level commented Jun 18, 2026 • edited

greptile-apps Bot commented Jun 18, 2026 • edited

cubic-dev-ai Bot left a comment • edited

cubic-dev-ai Bot Jun 18, 2026 • edited