
fix(openclaw): strip runtime metadata from memory content #1968

Open
de1tydev wants to merge 4 commits into vectorize-io:main from de1tydev:fix/openclaw-runtime-metadata
@de1tydev de1tydev commented Jun 4, 2026

Contributor

Depends on #1921. This branch is stacked on the retain-context PR; after #1921 merges, this PR should shrink to the runtime metadata cleanup commit.

Problem

OpenClaw/Feishu runtime identifiers can appear in message text that is used as retain/recall input, for example [message_id: om_x...], standalone om_/ou_/oc_ values, or sender prefixes of the form ou_x...: user text. If passed through to Hindsight, those opaque IDs can pollute recall queries, temporal retrieval, retained transcripts, and later extracted memories.

Root Cause

The OpenClaw Hindsight plugin stripped metadata envelopes and memory tags, but it did not consistently strip inline runtime message IDs or opaque sender prefixes before composing recall queries or retained transcript content. It also still supported prepending sender/channel/provider context into retained content, which made routing metadata part of the semantic transcript.

Solution

  • Add stripRuntimeEnvelope() for Feishu/OpenClaw runtime IDs and sender prefixes.
  • Apply the cleanup in recall query extraction, prior-context recall composition, text retain, and structured retain paths.
  • Stop prepending sender/channel/provider context into retained transcript content; keep that information in retain metadata/context instead.
  • Update retainContext guidance and deprecate includeSenderContext as a content-writing option.
  • Add tests for recall cleanup, retained transcript cleanup, metadata preservation, and the no-context-in-content contract.
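
For reviewers, here is a minimal sketch of the cleanup described in the list above. It is not the code in this PR. The regular expressions, the assumed ID shape (om_/ou_/oc_ followed by letters, digits, or underscores), and the toRetainPayload helper with its RetainPayload type are hypothetical and only illustrate the contract: runtime IDs leave the content, routing details stay in metadata.

```typescript
// Hypothetical sketch: the real stripRuntimeEnvelope() in this PR may differ.
// Assumed ID shape: "om_", "ou_", or "oc_" followed by letters, digits, or "_".
const BRACKETED_MESSAGE_ID = /\[message_id:\s*[^\]]*\]/g;
const SENDER_PREFIX = /^[ \t]*ou_[A-Za-z0-9_]+:[ \t]*/gm;
const STANDALONE_ID = /\b(?:om|ou|oc)_[A-Za-z0-9_]+\b/g;

function stripRuntimeEnvelope(text: string): string {
  return text
    .replace(BRACKETED_MESSAGE_ID, "") // drop "[message_id: om_...]" markers
    .replace(SENDER_PREFIX, "")        // drop "ou_...: " at the start of a line
    .replace(STANDALONE_ID, "")        // drop any remaining bare runtime IDs
    .replace(/[ \t]{2,}/g, " ")        // collapse gaps left by the removals
    .trim();
}

// Hypothetical payload shape: routing details go to metadata, not content.
interface RetainPayload {
  content: string;
  metadata: Record<string, string>;
}

function toRetainPayload(rawText: string, sender: string, channel: string): RetainPayload {
  return {
    content: stripRuntimeEnvelope(rawText),
    metadata: { sender, channel },
  };
}
```

With these assumed patterns, the sender ID is still available to callers through metadata.sender, while the transcript that reaches fact extraction contains only the user's words.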

Validation

  • npm test -- --run src/index.test.ts in hindsight-integrations/openclaw (8 files, 257 tests)
  • npm run build in hindsight-integrations/openclaw
  • git diff --check 4667d84742efdb599bc93ae2c523ff5217924acd...HEAD
  • repository pre-commit hooks during commit

de1tydev added 4 commits June 4, 2026 11:26
…ata misattribution

Hindsight's fact extraction LLM was misinterpreting routing identifiers
(sender open_id, bank ID, channel, provider) as semantic actors, project
names, or organizations. After many conversation turns, the bank name
(e.g. saber-prod) would override the actual project being discussed
(e.g. x-power-cli).

This adds interpretation guidance via the retain API 'context' field:
- New DEFAULT_RETAIN_CONTEXT constant explains that [context] block
  sender/channel/provider are routing identifiers, not human names
- Bank IDs, session keys, agent IDs, thread IDs, and tags are also
  marked as operational routing identifiers, not project names
- Assistant-role first-person statements are attributed to the AI
- Context is passed through the full chain: buildRetainRequest →
  scopeClient.retain → Hindsight SDK API
- RetainQueue persists and flushes context correctly
- Backfill CLI also passes context
- New 'retainContext' config option allows customization

includeSenderContext behavior is unchanged; the [context] block remains
in transcript content, but extraction LLM now knows how to interpret it.
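
A minimal sketch of the context pass-through this commit describes, for illustration only. The identifiers echo the commit message, but the PluginConfig and RetainRequest shapes, the wording of the default string, and the in-memory queue are assumptions rather than the plugin's actual code.

```typescript
// Hypothetical sketch: every shape and the default wording are assumptions.
const DEFAULT_RETAIN_CONTEXT =
  "Sender, channel, provider, bank ID, session key, agent ID, thread ID, and tags " +
  "are operational routing identifiers, not human names or project names. " +
  "First-person statements in assistant-role messages refer to the AI.";

interface PluginConfig {
  retainContext?: string; // optional override of the default guidance
}

interface RetainRequest {
  content: string;
  context: string;
}

// Fall back to the default guidance when no retainContext is configured.
function buildRetainRequest(content: string, config: PluginConfig): RetainRequest {
  return { content, context: config.retainContext ?? DEFAULT_RETAIN_CONTEXT };
}

// In-memory stand-in for a retain queue: context is stored with each item
// and handed to the sender unchanged on flush.
class RetainQueue {
  private items: RetainRequest[] = [];

  enqueue(req: RetainRequest): void {
    this.items.push(req);
  }

  // Sends every pending request in order and returns how many were sent.
  flush(send: (req: RetainRequest) => void): number {
    const pending = this.items;
    this.items = [];
    pending.forEach((req) => send(req));
    return pending.length;
  }
}
```

The point of the sketch is that context travels with each request as its own field, so a queued retain keeps the same interpretation guidance it was built with.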

7 files changed, 97 insertions(+).
@de1tydev de1tydev marked this pull request as ready for review June 4, 2026 06:20