fix(discord): run stays minimal — REST outbound, gateway only when persistent#43
Conversation
…rsistent `bob run` loaded the discord capability the same as `serve` and the cap connected its gateway eagerly on load — so a one-shot run opened a full gateway websocket it never needed (and, for the same bot token, a duplicate login that fights the persistent session). Nathan: "run should be minimal by default." Decouple inbound from outbound: - bob-discord: reply/react/fetch now go through `client.rest` (discord.js's REST manager — token via setToken in the ctor, no login/gateway). Same UA + 429 retry-after hygiene (it's the client's REST path, not a raw fetch). connect() (login) is now INBOUND-only. - cap-discord: opens the gateway listener ONLY when BOB_PERSISTENT=1. A one- shot run gets the outbound REST tools with no gateway. - run.ts: createPiRunSession sets BOB_PERSISTENT before loading extensions; RunSessionConfig.persistent (falsy for `bob run`). persistent.ts sets it true. No regression for persistent Pulse: she still connects the gateway (inbound) and posts via REST (works whether or not the gateway is up). Tests rewritten for the REST path (reply/react/fetch routes + bodies) + run-mode e2e. (Separately: "serve" is retired per spec — the CLI rename is a follow-up.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tps-sherlock
left a comment
There was a problem hiding this comment.
Security Review — PR #43 (REST Outbound, Gateway Only Persistent)
Verdict: APPROVE
Refactors cap-discord outbound operations from discord.js Channel object methods to raw REST API calls through client.rest. This decouples outbound from inbound: a one-shot bob run gets reply/react/fetch tools without opening a gateway connection. Clean security profile.
(a) Token handling — no new leak surface
Token moves from being passed only to login() (old path) to being set on the REST manager at construction time:
// index.ts constructor — BEFORE login/connect
this.client.rest.setToken(this.token);The token still lives exclusively inside the discord.js Client object — client.rest IS discord.js's @discordjs/rest RequestManager. It is never logged, never returned in a tool result, never echoed in an error. The same leak-canary test from PR #33 still covers this. No change to the trust boundary.
(b) client.rest — correct UA + 429 hygiene preserved
client.rest IS the same REST path discord.js has always used internally. The comment correctly documents this:
// ...it does NOT require login()/a gateway connection. It still sends the correct
// `User-Agent` and honors `Retry-After` on 429 (the @discordjs/rest
// RequestManager hygiene — the thing the failed raw `curl` lacked). This is
// NOT a raw fetch: it's the client's own REST path.The reply, react, and fetchRecent methods now go through client.rest.post(), client.rest.put(), and client.rest.get() respectively — all routed through the RequestManager. No raw fetch. No regression on the CF-1015 class fix.
(c) Raw API message mapping — safe data transformation
fetchRecent now hits GET /channels/:id/messages directly and maps the raw response:
const raw = (await this.client.rest.get(Routes.channelMessages(channelId), {
query: new URLSearchParams({ limit: String(limit) }),
})) as RawApiMessage[];
return raw.map((m) => ({
id: m.id,
channelId: m.channel_id,
authorId: m.author.id,
authorName: m.author.username,
content: m.content,
mentionsBot: this.resolvedBotUserId
? (m.mentions ?? []).some((u) => u.id === this.resolvedBotUserId)
: false,
}));- All access is property-lookup on a Discord server-provided response — the server controls the data shape, not an attacker
- The
mentionsarray uses.some()with a strict ID comparison against the resolved bot user ID — no injection - No user-provided content is parsed, evaluated, or interpolated unsafely
- The
RawApiMessageinterface is typed (snake_case fields, array mentions) but only accessed via property lookup — the worst case for a malformed server response is a runtime error, not injection
(d) message_reference with fail_if_not_exists: false — correct
message_reference: opts?.replyTo
? { message_id: opts.replyTo, fail_if_not_exists: false }
: undefined,fail_if_not_exists: false means: if the referenced message was deleted, post a normal message instead of erroring. This is the recommended Discord API behavior for bots — prevents reply failures from broken message references. The message_id comes from the inbound msg.id (a Discord snowflake), validated by the existing inbound pipeline (channel allow-list already enforced).
Additional observations
BOB_PERSISTENTenv var is set before extensions load increatePiRunSession:process.env.BOB_PERSISTENT = config.persistent ? "1" : "". It's cleared (set to"") for one-shot runs. The env var only controls whetherwired.start()(gateway connect) is called — no secret, no credential, no privilege change.requireTextChannelremoved — the old method didclient.channels.fetch(channelId)which required a gateway connection (cache). Outbound now bypasses the channel cache entirely, hitting the REST API directly. This eliminates the implicit gateway dependency for outbound tools — a clean architectural improvement.RoutesAPI used for route construction —Routes.channelMessages(channelId),Routes.channelMessageOwnReaction(...)— discord.js-constructed routes, not hand-built URL strings. Safe.- Test rewrite: tests now stub
client.rest(post/put/get) directly, verifying route construction, body content, and the raw-to-DiscordMessage mapping. No longer relies onchannels.fetchor fake Message objects.
Recommendation
Merge. The outbound-via-REST approach is a security improvement — it eliminates the gateway dependency for bob run (no second bot login, no duplicate connection conflict) while preserving the UA/Retry-After hygiene through discord.js's RequestManager. Token handling is unchanged, raw API mapping is a safe data transformation, and fail_if_not_exists: false is the correct resilience setting.
tps-kern
left a comment
There was a problem hiding this comment.
Architecture Review: APPROVE ✅
I reviewed the full diff. This is a targeted, correct fix for a real production problem: a one-shot bob run was opening a duplicate Discord gateway connection that fought the persistent session's bot login.
(a) BOB_PERSISTENT env signal — right approach
The env var BOB_PERSISTENT is set to "1" or "" in createPiRunSession BEFORE resourceLoader.reload(), which loads the extensions. The discord capability reads it in index.ts:
if (process.env.BOB_PERSISTENT === "1") {
await wired.start(); // gateway
} else {
console.error("... outbound REST tools only, gateway not opened");
}This is the right mechanism. A per-cap config field would require adding a persistent: boolean to every capability config schema, which is redundant (it's not per-capability state — it's a global "is this a serve process" flag). The env var is the same pattern as BOB_CAP_DISCORD and BOB_CAP_FLAIR — Bob sets it before extensions load, extensions read it. Clean.
(b) Outbound-via-REST + inbound-via-gateway — clean
The split is explicit and documented:
| Path | Mechanism | Requires | Works in |
|---|---|---|---|
reply/react/fetch |
client.rest.post/put/get |
setToken() (constructor) |
bob run AND bob serve |
| Gateway listener | client.connect() (login) |
Persistent flag | bob serve only |
DiscordJsClient uses client.rest.setToken in the constructor, then raw REST calls via Routes.* helpers — no channels.fetch or channel.send calls that require login. This is the same REST path discord.js uses under the hood, so the User-Agent header and Retry-After handling are preserved (the @discordjs/rest RequestManager hygiene).
The raw API mapping in fetchRecent is correct — snake_case (channel_id, message_reference) and mentions ARRAY (not discord.js Message object). The test stubs client.rest with post/put/get captures and verifies the correct routes + bodies.
No duplicate login, no gateway overhead for one-shot runs. The persistent session's gateway is the sole listener.
(c) RunSessionConfig.persistent mutated before factory — acceptable
The mutation is:
// persistent.ts:
config.persistent = true;
const session = await factory(config);RunSessionConfig is a plain object created by resolveRunConfig, passed to a single consumer (the factory), then discarded. The mutation is BEFORE the factory call — the factory reads it once. There's no concurrent access, no multi-threading, no shared-memory concern. This is the same pattern as the existing config mutations in resolveRunConfig (it's built step-by-step before being frozen).
The alternative (a separate persistent: boolean parameter on RunSessionFactory) would add complexity with no benefit. The single-field mutation on a throwaway config object is fine.
E2E proof
The E2E test now sets BOB_PERSISTENT="" to assert the RUN mode path:
- Extension loads with no gateway
discord_reply/discord_react/discord_fetchall surface insession.getAllTools()- No gateway connect attempted (restore prev
BOB_PERSISTENTin cleanup)
The E2E test cleanup properly guards against leaking BOB_PERSISTENT into other tests. Good.
Verdict
Ship it. The outbound/inbound decoupling is the right solution — REST for tools (works everywhere), gateway only in persistent mode (no duplicate login). The env-var signal is clean and consistent with the existing capability config pattern. Validated live on Pulse. 🚢
— Kern 📐
Problem (Nathan: "run should be minimal by default")
bob runloaded the discord capability the same wayservedoes, and the capability connected its gateway eagerly on load — so a one-shotbob runopened a full gateway websocket it never needs, and (same bot token) a duplicate login that fights the persistent session. A one-shot task never receives messages; it only ever needs outbound.Fix — decouple inbound from outbound
reply/react/fetchnow go throughclient.rest(discord.js's REST manager — token viasetTokenin the constructor; nologin(), no gateway). SameUser-Agent+Retry-After/429 hygiene (it's the client's own REST path, not a raw fetch).connect()(login) is now inbound-only.BOB_PERSISTENT=1. A one-shot run gets the outbound REST tools with no gateway.createPiRunSessionsetsBOB_PERSISTENTbefore loading extensions;RunSessionConfig.persistentis falsy forbob run. persistent.ts sets it true.No regression for persistent Pulse
She still connects the gateway (inbound) and posts via REST — REST works whether or not the gateway is up. I'll deploy this to
dtrt-pulseand verify she still posts + receives before merge.Tests
bob-discord rewritten for the REST path (reply/react/fetch routes + bodies + raw-API fetch mapping); cap-discord e2e pinned to run mode (tools register, no gateway). Full suite green.
Follow-up
"serve" is retired per the Phase-1 spec ("it's all the agent running; lifespan is the only knob") — the CLI rename (
run --taskfor the one-shot, dropserve) is a separate PR.