Skip to content

fix(api): get_entity returns linked observations instead of empty list#2154

Open
cdbartholomew wants to merge 1 commit into
mainfrom
fix/get-entity-observations-traversal
Open

fix(api): get_entity returns linked observations instead of empty list#2154
cdbartholomew wants to merge 1 commit into
mainfrom
fix/get-entity-observations-traversal

Conversation

@cdbartholomew

Copy link
Copy Markdown
Contributor

Summary

MemoryEngine.get_entity() returns "observations": [] hardcoded, so GET /v1/default/banks/{bank_id}/entities/{entity_id} always reports zero observations regardless of how many actually reference the entity. The response schema (EntityDetailResponse.observations: list[EntityObservationResponse]) and the example in EntityObservation's model config both advertise the field. The deprecated regenerate_entity_observations endpoint at the same path reinforces that this field was once user-visible. The engine method simply never followed the traversal needed to populate it.

The traversal already exists in the reverse direction. _entity_rows_for_units_sql (memory_engine.py) walks observation → source_memory_idsunit_entities of the source memories to inherit entities on recall results. The _create_memory_links placeholder in consolidation/consolidator.py documents this transitive design explicitly:

Observations do NOT get any memory_links copied from their source facts. Instead, retrieval uses source_memory_ids to traverse: observation → source_memory_ids → unit_entities

The entity-detail path simply never asked the inverse question — "given an entity, which observations mention it?" That's what this PR adds.

What changes

  • Replaces the stub list in get_entity() with a real query, dispatched through a new _observations_for_entity_sql() helper that mirrors the dialect split of _entity_rows_for_units_sql.
  • PG path: GIN-indexed array overlap on memory_units.source_memory_ids. The supporting index already exists (a2b3c4d5e6f8_add_gin_index_source_memory_ids).
  • Oracle path: joins through the observation_sources junction table, uses FETCH FIRST ... ROWS ONLY per the existing Oracle convention in ops_oracle.py.
  • Both branches UNION a direct unit_entities path so an observation that happens to carry its own entity row (rare, e.g. test fixtures, manual inserts) isn't dropped.
  • New max_observations: int = 50 parameter, hard-capped at 200. Hot entities (thousands of mentions) can't make a single panel render pull unbounded rows.
  • Ordered by mentioned_at DESC NULLS LAST so the freshest material surfaces first.
  • HTTP wrapper at api/http.py already iterates entity["observations"] and maps to EntityObservationResponse(text=obs.text, mentioned_at=obs.mentioned_at) — no API contract change, no OpenAPI regen needed.

Cost model

Same query shape as the existing recall-side traversal, traversed in the opposite direction:

  • Typical entity (≤50 mentions): ~5 ms warm cache, cheaper than a recall-side inheritance lookup.
  • Hot entity (~2K+ mentions, deep observation history): ~20–80 ms warm with LIMIT 50, driven by top-N sort over the candidate observation set. Well within budget for a click-driven endpoint.
  • No new index needed; the GIN on source_memory_ids and the partial idx_memory_units_observation_date carry the work.

Test plan

New tests/test_get_entity_observations.py:

  • Observation linked via source_memory_ids inheritance surfaces under the entity.
  • Observation with a direct unit_entities row surfaces too (UNION branch).
  • Observation unrelated to the entity does NOT surface.
  • Results ordered by mentioned_at DESC.
  • max_observations cap is honored end-to-end.
  • Missing entity_id still returns None.

Local verification:

  • All 4 new tests pass against PG.
  • ruff check clean.
  • Adjacent observation/entity test files run without regressions (34 passed; 2 pre-existing env-related errors confirmed identical on plain main).
  • CI green.

Notes

  • The _create_memory_links placeholder remains intentionally a no-op — the design decision to avoid duplicating entity rows for observations is preserved. This PR only restores read-time access to the data that's already present in the graph.
  • Could be paired in a follow-up with a small EntityDetailPanel enhancement in hindsight-control-plane/src/components/entities-view.tsx to render the now-populated observations field; left out of this PR to keep scope tight.

`MemoryEngine.get_entity()` returned `"observations": []` hardcoded, so
the entity-detail endpoint (`GET /v1/default/banks/{bank_id}/entities/
{entity_id}`) always reported zero observations regardless of how many
actually referenced the entity.

The traversal needed already existed in the reverse direction. On a
recall result, `_entity_rows_for_units_sql` walks observation →
`source_memory_ids` → `unit_entities` of the source memories to surface
inherited entities. The `_create_memory_links` placeholder in
`consolidation/consolidator.py` documents this transitive design and is
explicit that observations don't carry direct `unit_entities` rows. The
entity-detail path simply never asked the inverse question — "given an
entity, which observations mention it?"

This patch:

- Replaces the stub list with a real query in `get_entity()`, dispatched
  through a new `_observations_for_entity_sql()` helper.
- PG branch uses GIN-indexed array overlap on `memory_units.source_memory_ids`
  (`a2b3c4d5e6f8_add_gin_index_source_memory_ids` already provides the
  supporting index).
- Oracle branch joins through the `observation_sources` junction table
  and uses `FETCH FIRST ... ROWS ONLY` per dialect convention.
- Both branches UNION a direct `unit_entities` path so the rare
  observation that does carry its own entity row is not dropped.
- Adds `max_observations: int = 50` with a hard cap of 200 so a panel
  render can't accidentally pull thousands for a hot entity.
- Ordered by `mentioned_at DESC NULLS LAST` so the freshest
  observations surface first.

The HTTP layer at `api/http.py` already iterates `entity["observations"]`
and maps to `EntityObservationResponse(text, mentioned_at)` — no API
contract change.

Test coverage in `test_get_entity_observations.py`:
- Inherited observation (linked via `source_memory_ids`) surfaces.
- Direct observation (rare `unit_entities` row on the observation
  itself) also surfaces.
- Observations unrelated to the entity do not surface.
- Ordering: most recent first.
- `max_observations` cap is honored end-to-end.
- Missing entity_id still returns None.

Closes the gap that turned the entity-detail endpoint into a no-op for
any consumer that listed observations under an entity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant