fix(api): get_entity returns linked observations instead of empty list by cdbartholomew · Pull Request #2154 · vectorize-io/hindsight

cdbartholomew · 2026-06-11T20:08:43Z

Summary

MemoryEngine.get_entity() returns "observations": [] hardcoded, so GET /v1/default/banks/{bank_id}/entities/{entity_id} always reports zero observations regardless of how many actually reference the entity. The response schema (EntityDetailResponse.observations: list[EntityObservationResponse]) and the example in EntityObservation's model config both advertise the field. The deprecated regenerate_entity_observations endpoint at the same path reinforces that this field was once user-visible. The engine method simply never followed the traversal needed to populate it.

The traversal already exists in the reverse direction. _entity_rows_for_units_sql (memory_engine.py) walks observation → source_memory_ids → unit_entities of the source memories to inherit entities on recall results. The _create_memory_links placeholder in consolidation/consolidator.py documents this transitive design explicitly:

Observations do NOT get any memory_links copied from their source facts. Instead, retrieval uses source_memory_ids to traverse: observation → source_memory_ids → unit_entities

The entity-detail path simply never asked the inverse question — "given an entity, which observations mention it?" That's what this PR adds.

What changes

Replaces the stub list in get_entity() with a real query, dispatched through a new _observations_for_entity_sql() helper that mirrors the dialect split of _entity_rows_for_units_sql.
PG path: GIN-indexed array overlap on memory_units.source_memory_ids. The supporting index already exists (a2b3c4d5e6f8_add_gin_index_source_memory_ids).
Oracle path: joins through the observation_sources junction table, uses FETCH FIRST ... ROWS ONLY per the existing Oracle convention in ops_oracle.py.
Both branches UNION a direct unit_entities path so an observation that happens to carry its own entity row (rare, e.g. test fixtures, manual inserts) isn't dropped.
New max_observations: int = 50 parameter, hard-capped at 200. Hot entities (thousands of mentions) can't make a single panel render pull unbounded rows.
Ordered by mentioned_at DESC NULLS LAST so the freshest material surfaces first.
HTTP wrapper at api/http.py already iterates entity["observations"] and maps to EntityObservationResponse(text=obs.text, mentioned_at=obs.mentioned_at) — no API contract change, no OpenAPI regen needed.

Cost model

Same query shape as the existing recall-side traversal, traversed in the opposite direction:

Typical entity (≤50 mentions): ~5 ms warm cache, cheaper than a recall-side inheritance lookup.
Hot entity (~2K+ mentions, deep observation history): ~20–80 ms warm with LIMIT 50, driven by top-N sort over the candidate observation set. Well within budget for a click-driven endpoint.
No new index needed; the GIN on source_memory_ids and the partial idx_memory_units_observation_date carry the work.

Test plan

New tests/test_get_entity_observations.py:

Observation linked via source_memory_ids inheritance surfaces under the entity.
Observation with a direct unit_entities row surfaces too (UNION branch).
Observation unrelated to the entity does NOT surface.
Results ordered by mentioned_at DESC.
max_observations cap is honored end-to-end.
Missing entity_id still returns None.

Local verification:

All 4 new tests pass against PG.
ruff check clean.
Adjacent observation/entity test files run without regressions (34 passed; 2 pre-existing env-related errors confirmed identical on plain main).
CI green.

Notes

The _create_memory_links placeholder remains intentionally a no-op — the design decision to avoid duplicating entity rows for observations is preserved. This PR only restores read-time access to the data that's already present in the graph.
Could be paired in a follow-up with a small EntityDetailPanel enhancement in hindsight-control-plane/src/components/entities-view.tsx to render the now-populated observations field; left out of this PR to keep scope tight.

`MemoryEngine.get_entity()` returned `"observations": []` hardcoded, so the entity-detail endpoint (`GET /v1/default/banks/{bank_id}/entities/ {entity_id}`) always reported zero observations regardless of how many actually referenced the entity. The traversal needed already existed in the reverse direction. On a recall result, `_entity_rows_for_units_sql` walks observation → `source_memory_ids` → `unit_entities` of the source memories to surface inherited entities. The `_create_memory_links` placeholder in `consolidation/consolidator.py` documents this transitive design and is explicit that observations don't carry direct `unit_entities` rows. The entity-detail path simply never asked the inverse question — "given an entity, which observations mention it?" This patch: - Replaces the stub list with a real query in `get_entity()`, dispatched through a new `_observations_for_entity_sql()` helper. - PG branch uses GIN-indexed array overlap on `memory_units.source_memory_ids` (`a2b3c4d5e6f8_add_gin_index_source_memory_ids` already provides the supporting index). - Oracle branch joins through the `observation_sources` junction table and uses `FETCH FIRST ... ROWS ONLY` per dialect convention. - Both branches UNION a direct `unit_entities` path so the rare observation that does carry its own entity row is not dropped. - Adds `max_observations: int = 50` with a hard cap of 200 so a panel render can't accidentally pull thousands for a hot entity. - Ordered by `mentioned_at DESC NULLS LAST` so the freshest observations surface first. The HTTP layer at `api/http.py` already iterates `entity["observations"]` and maps to `EntityObservationResponse(text, mentioned_at)` — no API contract change. Test coverage in `test_get_entity_observations.py`: - Inherited observation (linked via `source_memory_ids`) surfaces. - Direct observation (rare `unit_entities` row on the observation itself) also surfaces. - Observations unrelated to the entity do not surface. - Ordering: most recent first. - `max_observations` cap is honored end-to-end. - Missing entity_id still returns None. Closes the gap that turned the entity-detail endpoint into a no-op for any consumer that listed observations under an entity.

cdbartholomew requested a review from nicoloboschi June 11, 2026 20:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api): get_entity returns linked observations instead of empty list#2154

fix(api): get_entity returns linked observations instead of empty list#2154
cdbartholomew wants to merge 1 commit into
mainfrom
fix/get-entity-observations-traversal

cdbartholomew commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cdbartholomew commented Jun 11, 2026

Summary

What changes

Cost model

Test plan

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant