
Handle session reconnecting for remote server executor#11482

Draft
MaggieShan wants to merge 1 commit into master from maggs/remote-server-executor-handle-reconnection

@MaggieShan
Contributor

Description

  • Context: https://warpdev.slack.com/archives/C0AMRA82ZKL/p1779143889865059
  • We're seeing a lot of client errors from the run command request failing when the session is disconnected. This PR ensures that we don't send any run command requests unless the remote server client is actually connected.
    • We now hold a RwLock<Option<Arc<RemoteServerClient>>> so that we can clear the RemoteServerClient on disconnection and set it again on reconnection.
    • We add a SessionReconnecting event to cover
  • We're getting flooded with client errors from sending the run command request after the session has already disconnected. I don’t think this is an actual user-facing error, but we should avoid sending the request when we know the remote server is disconnected (https://warp.metabaseapp.com/question/42606-daily-client-request-error-rate-by-operation-error-type).
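A minimal sketch of the gating pattern described above, assuming a synchronous client. RemoteServerExecutor, set_client, clear_client, the run_command signature, and the error string are hypothetical stand-ins, not the PR's actual code; only the RwLock<Option<Arc<RemoteServerClient>>> shape comes from the description. The Arc is cloned under a short read lock so the lock isn't held while a request is in flight.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the real RemoteServerClient.
pub struct RemoteServerClient {
    host: String,
}

impl RemoteServerClient {
    /// Hypothetical request method; a real client would send over the session.
    pub fn run_command(&self, cmd: &str) -> Result<String, String> {
        Ok(format!("{}: ran `{}`", self.host, cmd))
    }
}

/// Hypothetical executor holding the client behind RwLock<Option<Arc<...>>>.
#[derive(Default)]
pub struct RemoteServerExecutor {
    client: RwLock<Option<Arc<RemoteServerClient>>>,
}

impl RemoteServerExecutor {
    /// On (re)connection: install the connected client.
    pub fn set_client(&self, client: Arc<RemoteServerClient>) {
        *self.client.write().unwrap() = Some(client);
    }

    /// On disconnection or while reconnecting: drop the client so no
    /// requests are sent until a new one is installed.
    pub fn clear_client(&self) {
        *self.client.write().unwrap() = None;
    }

    /// Sends the request only if a client is currently connected.
    pub fn run_command(&self, cmd: &str) -> Result<String, String> {
        // Clone the Option<Arc<_>> and release the read lock immediately.
        let client = self.client.read().unwrap().clone();
        match client {
            Some(client) => client.run_command(cmd),
            None => Err("remote server disconnected; request not sent".to_string()),
        }
    }
}

fn main() {
    let executor = RemoteServerExecutor::default();
    // No client yet: the request is refused locally instead of being sent.
    assert!(executor.run_command("ls").is_err());

    executor.set_client(Arc::new(RemoteServerClient { host: "devbox".into() }));
    println!("{:?}", executor.run_command("ls"));

    executor.clear_client();
    assert!(executor.run_command("ls").is_err());
}
```

Returning a local error when the slot is None is what keeps a disconnected session from producing a failed network request in the first place.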

Testing

Verified locally that on disconnection the run command request is not sent and the client error is not logged.

  • I have manually tested my changes locally with ./script/run

Agent Mode

  • Warp Agent Mode - This PR was created via Warp's AI Agent Mode

@cla-bot added the cla-signed label May 21, 2026
ATERCATES pushed a commit to ATERCATES/warp that referenced this pull request May 22, 2026
The control-plane reader task previously exited silently when stdout
EOF'd unexpectedly (i.e. the underlying SSH transport died without
emitting a tmux %exit). Detection was deferred to the heartbeat path,
which uses a 30s interval and a 10s timeout, leaving up to ~40s of "dead but
appears alive" state during which send_command would write to a
half-closed pipe and time out at RESPONSE_TIMEOUT (10s) per call
instead of failing fast.
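The ~40s figure is the worst case, assuming the transport dies just after a successful heartbeat: the next heartbeat is only sent a full interval later, and then it still has to wait out its timeout before the failure is noticed.

```latex
t_{\text{detect}} \le T_{\text{interval}} + T_{\text{timeout}} = 30\,\text{s} + 10\,\text{s} = 40\,\text{s}
```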

This commit:
- Adds signal_master_died(stdin, events_tx) helper that idempotently
  takes the stdin and emits MasterDied. Idempotent because the first
  caller takes the Option<ChildStdin>; subsequent callers see None
  and no-op, preventing duplicate events when reader and heartbeat
  race to detect.
- Reader task now distinguishes clean exit (via ControlEvent::Exit)
  from unexpected exit (stdout EOF, parse error). On unexpected exit
  it invokes signal_master_died, but only when closing was not
  initiated by us (avoids spurious MasterDied on graceful close).
- Heartbeat path routes its existing MasterDied emits through the
  same helper so the stdin is also closed there, making subsequent
  send_command calls fail immediately with "stdin closed" instead
  of waiting RESPONSE_TIMEOUT.
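A minimal sketch of the three changes above, under stated assumptions: the stdin slot is made generic over any writer (the commit uses Option<ChildStdin>) so it runs without spawning ssh/tmux, events go over a std mpsc channel, and Event, ReaderExit, on_reader_exit, and StdinSlot are hypothetical names. signal_master_died, send_command, MasterDied, and the "stdin closed" message come from the commit message, but the signatures here are guesses.

```rust
use std::io::Write;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical event type; only MasterDied matters here.
#[derive(Debug, PartialEq)]
pub enum Event {
    MasterDied,
}

/// Shared stdin slot (the commit holds Arc<Mutex<Option<ChildStdin>>>).
type StdinSlot<W> = Arc<Mutex<Option<W>>>;

/// Idempotent: the first caller takes (and drops, i.e. closes) the writer
/// and emits MasterDied; later callers find None and do nothing.
fn signal_master_died<W>(stdin: &StdinSlot<W>, events_tx: &Sender<Event>) {
    if stdin.lock().unwrap().take().is_some() {
        let _ = events_tx.send(Event::MasterDied);
    }
}

/// Fails immediately once stdin is gone instead of waiting on a response timeout.
fn send_command<W: Write>(stdin: &StdinSlot<W>, cmd: &str) -> Result<(), String> {
    let mut guard = stdin.lock().unwrap();
    let w = guard.as_mut().ok_or_else(|| "stdin closed".to_string())?;
    writeln!(w, "{cmd}").map_err(|e| e.to_string())
}

/// Hypothetical classification of how the reader loop ended.
enum ReaderExit {
    Clean,      // saw ControlEvent::Exit (tmux %exit)
    Unexpected, // stdout EOF or parse error
}

/// Only an unexpected exit that we did not initiate signals MasterDied.
fn on_reader_exit<W>(exit: ReaderExit, closing: &AtomicBool, stdin: &StdinSlot<W>, events_tx: &Sender<Event>) {
    if matches!(exit, ReaderExit::Unexpected) && !closing.load(Ordering::SeqCst) {
        signal_master_died(stdin, events_tx);
    }
}

fn main() {
    let (tx, rx) = channel();
    let stdin: StdinSlot<Vec<u8>> = Arc::new(Mutex::new(Some(Vec::new())));
    let closing = AtomicBool::new(false);

    assert!(send_command(&stdin, "list-windows").is_ok());

    // A clean %exit does not emit MasterDied.
    on_reader_exit(ReaderExit::Clean, &closing, &stdin, &tx);
    assert_eq!(rx.try_iter().count(), 0);

    // Reader hits EOF and the heartbeat races to detect: only one event is emitted.
    on_reader_exit(ReaderExit::Unexpected, &closing, &stdin, &tx);
    signal_master_died(&stdin, &tx);
    assert_eq!(rx.try_iter().count(), 1);

    // Subsequent sends fail fast.
    assert_eq!(send_command(&stdin, "list-windows"), Err("stdin closed".to_string()));
}
```

Taking the writer out of the Option is what makes the helper idempotent and the later send_command calls fail fast: both depend on the same None.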

Inspired by warpdotdev#11217 (SSH disconnect detection in tmux
parser) and warpdotdev#11482 (fail-fast pattern for remote_server executor),
but minimal: leverages the existing Arc<Mutex<Option<ChildStdin>>>
indirection instead of adopting RwLock<Option<Arc<...>>>.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>