fix(openclaw): strip runtime metadata from memory content by de1tydev · Pull Request #1968 · vectorize-io/hindsight

de1tydev · 2026-06-04T04:10:12Z

Depends on #1921. This branch is stacked on the retain-context PR; after #1921 merges, this PR should shrink to the runtime metadata cleanup commit.

Problem

OpenClaw/Feishu runtime identifiers can appear in the message text used as retain/recall input, for example [message_id: om_x...], standalone om_/ou_/oc_ values, or sender prefixes of the form ou_x...: user text. If passed through to Hindsight, those opaque IDs can pollute recall queries, temporal retrieval, retained transcripts, and later extracted memories.

Root Cause

The OpenClaw Hindsight plugin stripped metadata envelopes and memory tags, but it did not consistently strip inline runtime message IDs or opaque sender prefixes before composing recall queries or retained transcript content. It also still supported prepending sender/channel/provider context into retained content, which made routing metadata part of the semantic transcript.

Solution

Add stripRuntimeEnvelope() for Feishu/OpenClaw runtime IDs and sender prefixes.
Apply the cleanup in recall query extraction, prior-context recall composition, text retain, and structured retain paths.
Stop prepending sender/channel/provider context into retained transcript content; keep that information in retain metadata/context instead.
Update retainContext guidance and deprecate includeSenderContext as a content-writing option.
Add tests for recall cleanup, retained transcript cleanup, metadata preservation, and the no-context-in-content contract.
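
The cleanup described above can be sketched as follows. This is a minimal illustration, not the code from this PR: only the name stripRuntimeEnvelope() comes from the description, while the regular expressions, the assumed ID shape (om_/ou_/oc_ followed by at least 8 alphanumeric characters), and the buildRetainPayload helper are hypothetical.

```typescript
// Hypothetical patterns; the plugin's actual regexes are not shown in this PR text.
// Assumption: runtime IDs look like om_/ou_/oc_ plus 8 or more alphanumerics.
const BRACKETED_MESSAGE_ID = /\[message_id:\s*[^\]\s]+\s*\]/g;
const SENDER_PREFIX = /^[ \t]*ou_[A-Za-z0-9]{8,}:[ \t]*/gm;
const STANDALONE_ID = /\b(?:om|ou|oc)_[A-Za-z0-9]{8,}\b/g;

// Remove inline runtime IDs and opaque sender prefixes, then tidy whitespace.
function stripRuntimeEnvelope(text: string): string {
  return text
    .replace(BRACKETED_MESSAGE_ID, "")
    .replace(SENDER_PREFIX, "")
    .replace(STANDALONE_ID, "")
    .replace(/[ \t]{2,}/g, " ") // collapse gaps left behind by removed IDs
    .split("\n")
    .map((line) => line.trim())
    .join("\n")
    .trim();
}

// Hypothetical payload shape: routing data travels as metadata,
// and is never prepended to the semantic transcript content.
interface RetainPayload {
  content: string;
  metadata: Record<string, string>;
}

function buildRetainPayload(
  rawText: string,
  routing: Record<string, string>,
): RetainPayload {
  return {
    content: stripRuntimeEnvelope(rawText),
    metadata: { ...routing },
  };
}
```

In this sketch the sender ID survives only in metadata, so a recall query or retained transcript built from content never sees it. The length threshold is a guard against stripping ordinary words that happen to contain om_, ou_, or oc_; the real plugin may use a different rule.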

Validation

npm test -- --run src/index.test.ts in hindsight-integrations/openclaw (8 files, 257 tests)
npm run build in hindsight-integrations/openclaw
git diff --check 4667d84742efdb599bc93ae2c523ff5217924acd...HEAD
repository pre-commit hooks during commit

…ata misattribution

Hindsight's fact extraction LLM was misinterpreting routing identifiers (sender open_id, bank ID, channel, provider) as semantic actors, project names, or organizations. After many conversation turns, the bank name (e.g. saber-prod) would override the actual project being discussed (e.g. x-power-cli).

This adds interpretation guidance via the retain API 'context' field:

- New DEFAULT_RETAIN_CONTEXT constant explains that [context] block sender/channel/provider are routing identifiers, not human names
- Bank IDs, session keys, agent IDs, thread IDs, and tags are also marked as operational routing identifiers, not project names
- Assistant-role first-person statements are attributed to the AI
- Context is passed through the full chain: buildRetainRequest → scopeClient.retain → Hindsight SDK API
- RetainQueue persists and flushes context correctly
- Backfill CLI also passes context
- New 'retainContext' config option allows customization

includeSenderContext behavior is unchanged; the [context] block remains in transcript content, but extraction LLM now knows how to interpret it.

7 files changed, 97 insertions(+).
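
A rough sketch of how the 'context' field described in this commit message might be filled in. The names DEFAULT_RETAIN_CONTEXT, buildRetainRequest, and retainContext come from the commit message; the type shapes, the function signature, and the guidance wording below are assumptions, since the actual constant text and SDK types are not shown here.

```typescript
// Hypothetical request and config shapes; the real plugin and SDK types may differ.
interface RetainRequest {
  content: string;
  context?: string;
}

interface PluginConfig {
  retainContext?: string; // custom guidance that overrides the default
}

// Placeholder wording only; the real DEFAULT_RETAIN_CONTEXT text is not in this PR page.
const DEFAULT_RETAIN_CONTEXT =
  "Sender, channel, and provider values are routing identifiers, not human names. " +
  "Bank IDs, session keys, agent IDs, thread IDs, and tags are operational " +
  "routing identifiers, not project names. " +
  "First-person statements in assistant-role messages refer to the AI.";

// Attach interpretation guidance to every retain request,
// preferring the user-configured value when one is set.
function buildRetainRequest(
  transcript: string,
  config: PluginConfig = {},
): RetainRequest {
  return {
    content: transcript,
    context: config.retainContext ?? DEFAULT_RETAIN_CONTEXT,
  };
}
```

Calling buildRetainRequest(transcript) with no config falls back to the default guidance, while passing { retainContext: "..." } replaces it wholesale. Whether the real option replaces or extends the default is not stated in the commit message.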


de1tydev added 4 commits June 4, 2026 11:26

fix(openclaw): remove platform-specific examples from DEFAULT_RETAIN_CONTEXT

d91349a

test(openclaw): harden retain context handling

4667d84

fix(openclaw): strip runtime metadata from memory content

c9c7f21

de1tydev marked this pull request as ready for review June 4, 2026 06:20

de1tydev wants to merge 4 commits into vectorize-io:main from de1tydev:fix/openclaw-runtime-metadata
