Skip to content

fix(discord): run stays minimal — REST outbound, gateway only when persistent#43

Merged
tps-flint merged 1 commit into
mainfrom
fix/run-minimal-discord
May 29, 2026
Merged

fix(discord): run stays minimal — REST outbound, gateway only when persistent#43
tps-flint merged 1 commit into
mainfrom
fix/run-minimal-discord

Conversation

@tps-flint

Copy link
Copy Markdown
Contributor

Problem (Nathan: "run should be minimal by default")

bob run loaded the discord capability the same way serve does, and the capability connected its gateway eagerly on load — so a one-shot bob run opened a full gateway websocket it never needs, and (same bot token) a duplicate login that fights the persistent session. A one-shot task never receives messages; it only ever needs outbound.

Fix — decouple inbound from outbound

  • bob-discord: reply/react/fetch now go through client.rest (discord.js's REST manager — token via setToken in the constructor; no login(), no gateway). Same User-Agent + Retry-After/429 hygiene (it's the client's own REST path, not a raw fetch). connect() (login) is now inbound-only.
  • cap-discord: opens the gateway listener only when BOB_PERSISTENT=1. A one-shot run gets the outbound REST tools with no gateway.
  • run.ts: createPiRunSession sets BOB_PERSISTENT before loading extensions; RunSessionConfig.persistent is falsy for bob run. persistent.ts sets it true.

No regression for persistent Pulse

She still connects the gateway (inbound) and posts via REST — REST works whether or not the gateway is up. I'll deploy this to dtrt-pulse and verify she still posts + receives before merge.

Tests

bob-discord rewritten for the REST path (reply/react/fetch routes + bodies + raw-API fetch mapping); cap-discord e2e pinned to run mode (tools register, no gateway). Full suite green.

Follow-up

"serve" is retired per the Phase-1 spec ("it's all the agent running; lifespan is the only knob") — the CLI rename (run --task for the one-shot, drop serve) is a separate PR.

…rsistent

`bob run` loaded the discord capability the same as `serve` and the cap
connected its gateway eagerly on load — so a one-shot run opened a full
gateway websocket it never needed (and, for the same bot token, a duplicate
login that fights the persistent session). Nathan: "run should be minimal
by default."

Decouple inbound from outbound:
- bob-discord: reply/react/fetch now go through `client.rest` (discord.js's
  REST manager — token via setToken in the ctor, no login/gateway). Same UA +
  429 retry-after hygiene (it's the client's REST path, not a raw fetch).
  connect() (login) is now INBOUND-only.
- cap-discord: opens the gateway listener ONLY when BOB_PERSISTENT=1. A one-
  shot run gets the outbound REST tools with no gateway.
- run.ts: createPiRunSession sets BOB_PERSISTENT before loading extensions;
  RunSessionConfig.persistent (falsy for `bob run`). persistent.ts sets it true.

No regression for persistent Pulse: she still connects the gateway (inbound)
and posts via REST (works whether or not the gateway is up). Tests rewritten
for the REST path (reply/react/fetch routes + bodies) + run-mode e2e.

(Separately: "serve" is retired per spec — the CLI rename is a follow-up.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@tps-sherlock tps-sherlock left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Review — PR #43 (REST Outbound, Gateway Only Persistent)

Verdict: APPROVE

Refactors cap-discord outbound operations from discord.js Channel object methods to raw REST API calls through client.rest. This decouples outbound from inbound: a one-shot bob run gets reply/react/fetch tools without opening a gateway connection. Clean security profile.

(a) Token handling — no new leak surface

Token moves from being passed only to login() (old path) to being set on the REST manager at construction time:

// index.ts constructor — BEFORE login/connect
this.client.rest.setToken(this.token);

The token still lives exclusively inside the discord.js Client object — client.rest IS discord.js's @discordjs/rest RequestManager. It is never logged, never returned in a tool result, never echoed in an error. The same leak-canary test from PR #33 still covers this. No change to the trust boundary.

(b) client.rest — correct UA + 429 hygiene preserved

client.rest IS the same REST path discord.js has always used internally. The comment correctly documents this:

// ...it does NOT require login()/a gateway connection. It still sends the correct
// `User-Agent` and honors `Retry-After` on 429 (the @discordjs/rest
// RequestManager hygiene — the thing the failed raw `curl` lacked). This is
// NOT a raw fetch: it's the client's own REST path.

The reply, react, and fetchRecent methods now go through client.rest.post(), client.rest.put(), and client.rest.get() respectively — all routed through the RequestManager. No raw fetch. No regression on the CF-1015 class fix.

(c) Raw API message mapping — safe data transformation

fetchRecent now hits GET /channels/:id/messages directly and maps the raw response:

const raw = (await this.client.rest.get(Routes.channelMessages(channelId), {
  query: new URLSearchParams({ limit: String(limit) }),
})) as RawApiMessage[];
return raw.map((m) => ({
  id: m.id,
  channelId: m.channel_id,
  authorId: m.author.id,
  authorName: m.author.username,
  content: m.content,
  mentionsBot: this.resolvedBotUserId
    ? (m.mentions ?? []).some((u) => u.id === this.resolvedBotUserId)
    : false,
}));
  • All access is property-lookup on a Discord server-provided response — the server controls the data shape, not an attacker
  • The mentions array uses .some() with a strict ID comparison against the resolved bot user ID — no injection
  • No user-provided content is parsed, evaluated, or interpolated unsafely
  • The RawApiMessage interface is typed (snake_case fields, array mentions) but only accessed via property lookup — the worst case for a malformed server response is a runtime error, not injection

(d) message_reference with fail_if_not_exists: false — correct

message_reference: opts?.replyTo
  ? { message_id: opts.replyTo, fail_if_not_exists: false }
  : undefined,

fail_if_not_exists: false means: if the referenced message was deleted, post a normal message instead of erroring. This is the recommended Discord API behavior for bots — prevents reply failures from broken message references. The message_id comes from the inbound msg.id (a Discord snowflake), validated by the existing inbound pipeline (channel allow-list already enforced).

Additional observations

  • BOB_PERSISTENT env var is set before extensions load in createPiRunSession: process.env.BOB_PERSISTENT = config.persistent ? "1" : "". It's cleared (set to "") for one-shot runs. The env var only controls whether wired.start() (gateway connect) is called — no secret, no credential, no privilege change.
  • requireTextChannel removed — the old method did client.channels.fetch(channelId) which required a gateway connection (cache). Outbound now bypasses the channel cache entirely, hitting the REST API directly. This eliminates the implicit gateway dependency for outbound tools — a clean architectural improvement.
  • Routes API used for route constructionRoutes.channelMessages(channelId), Routes.channelMessageOwnReaction(...) — discord.js-constructed routes, not hand-built URL strings. Safe.
  • Test rewrite: tests now stub client.rest (post/put/get) directly, verifying route construction, body content, and the raw-to-DiscordMessage mapping. No longer relies on channels.fetch or fake Message objects.

Recommendation

Merge. The outbound-via-REST approach is a security improvement — it eliminates the gateway dependency for bob run (no second bot login, no duplicate connection conflict) while preserving the UA/Retry-After hygiene through discord.js's RequestManager. Token handling is unchanged, raw API mapping is a safe data transformation, and fail_if_not_exists: false is the correct resilience setting.

@tps-kern tps-kern left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture Review: APPROVE ✅

I reviewed the full diff. This is a targeted, correct fix for a real production problem: a one-shot bob run was opening a duplicate Discord gateway connection that fought the persistent session's bot login.

(a) BOB_PERSISTENT env signal — right approach

The env var BOB_PERSISTENT is set to "1" or "" in createPiRunSession BEFORE resourceLoader.reload(), which loads the extensions. The discord capability reads it in index.ts:

if (process.env.BOB_PERSISTENT === "1") {
  await wired.start();  // gateway
} else {
  console.error("... outbound REST tools only, gateway not opened");
}

This is the right mechanism. A per-cap config field would require adding a persistent: boolean to every capability config schema, which is redundant (it's not per-capability state — it's a global "is this a serve process" flag). The env var is the same pattern as BOB_CAP_DISCORD and BOB_CAP_FLAIR — Bob sets it before extensions load, extensions read it. Clean.

(b) Outbound-via-REST + inbound-via-gateway — clean

The split is explicit and documented:

Path Mechanism Requires Works in
reply/react/fetch client.rest.post/put/get setToken() (constructor) bob run AND bob serve
Gateway listener client.connect() (login) Persistent flag bob serve only

DiscordJsClient uses client.rest.setToken in the constructor, then raw REST calls via Routes.* helpers — no channels.fetch or channel.send calls that require login. This is the same REST path discord.js uses under the hood, so the User-Agent header and Retry-After handling are preserved (the @discordjs/rest RequestManager hygiene).

The raw API mapping in fetchRecent is correct — snake_case (channel_id, message_reference) and mentions ARRAY (not discord.js Message object). The test stubs client.rest with post/put/get captures and verifies the correct routes + bodies.

No duplicate login, no gateway overhead for one-shot runs. The persistent session's gateway is the sole listener.

(c) RunSessionConfig.persistent mutated before factory — acceptable

The mutation is:

// persistent.ts:
config.persistent = true;
const session = await factory(config);

RunSessionConfig is a plain object created by resolveRunConfig, passed to a single consumer (the factory), then discarded. The mutation is BEFORE the factory call — the factory reads it once. There's no concurrent access, no multi-threading, no shared-memory concern. This is the same pattern as the existing config mutations in resolveRunConfig (it's built step-by-step before being frozen).

The alternative (a separate persistent: boolean parameter on RunSessionFactory) would add complexity with no benefit. The single-field mutation on a throwaway config object is fine.

E2E proof

The E2E test now sets BOB_PERSISTENT="" to assert the RUN mode path:

  • Extension loads with no gateway
  • discord_reply/discord_react/discord_fetch all surface in session.getAllTools()
  • No gateway connect attempted (restore prev BOB_PERSISTENT in cleanup)

The E2E test cleanup properly guards against leaking BOB_PERSISTENT into other tests. Good.

Verdict

Ship it. The outbound/inbound decoupling is the right solution — REST for tools (works everywhere), gateway only in persistent mode (no duplicate login). The env-var signal is clean and consistent with the existing capability config pattern. Validated live on Pulse. 🚢

— Kern 📐

@tps-flint tps-flint merged commit d74d062 into main May 29, 2026
8 checks passed
@tps-flint tps-flint deleted the fix/run-minimal-discord branch May 29, 2026 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants