
Keep unauthenticated WebSocket alive during daemon receive loop#2020

Open
smeinecke wants to merge 4 commits into AsamK:master from smeinecke:fix/unauthenticated-socket-keepalive


@smeinecke
Contributor

Related to #2018 and #2019.

After #2019 fixes sender key re-distribution, a ~6s delay per group message remains because the unauthenticated (sealed sender) WebSocket connection is torn down ~10s after each send and re-established from scratch for the next message.

Root cause

SignalWebSocket.DelayedDisconnectThread disconnects when keepAliveTokens is empty after disconnectTimeout. The authenticated socket avoids this because ReceiveHelper registers a "receive" keep-alive token for the lifetime of the receive loop. The unauthenticated socket had no token registered, so every send started a 10s countdown to disconnect:

[unidentified:1636243480] Connected      ← send completes
[unidentified:1636243480] Disconnecting  ← 10s later (from last request initiation)
[unidentified:255738755]  Connecting     ← next message, new ~6s wait
[unidentified:255738755]  Connected
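The token check described above can be modelled roughly as follows. This is a minimal sketch, not the actual SignalWebSocket code: the class `KeepAliveModel`, the method `checkDisconnect`, and the explicit clock parameter are assumptions made for illustration, and the real DelayedDisconnectThread runs on its own timer rather than being polled.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only; names and structure are hypothetical,
// not taken from the real SignalWebSocket implementation.
final class KeepAliveModel {
    private final Set<String> keepAliveTokens = ConcurrentHashMap.newKeySet();
    private final long disconnectTimeoutMillis;
    private long lastRequestMillis;
    private boolean connected;

    KeepAliveModel(long disconnectTimeoutMillis) {
        this.disconnectTimeoutMillis = disconnectTimeoutMillis;
    }

    // A send connects the socket (if needed) and restarts the idle countdown.
    synchronized void request(long nowMillis) {
        connected = true;
        lastRequestMillis = nowMillis;
    }

    void registerKeepAliveToken(String token) {
        keepAliveTokens.add(token);
    }

    void removeKeepAliveToken(String token) {
        keepAliveTokens.remove(token);
    }

    // Stand-in for the delayed-disconnect check: the socket is dropped only
    // when it has been idle for the full timeout AND no token is registered.
    synchronized void checkDisconnect(long nowMillis) {
        boolean idleTooLong = nowMillis - lastRequestMillis >= disconnectTimeoutMillis;
        if (connected && keepAliveTokens.isEmpty() && idleTooLong) {
            connected = false;
        }
    }

    synchronized boolean isConnected() {
        return connected;
    }
}
```

With a 10s timeout and no token, the model disconnects 10s after the last request; with a "receive" token registered it stays connected until the token is removed.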

Fix

Register a "receive" keep-alive token on the unauthenticated socket alongside the authenticated one in ReceiveHelper, and remove it in the same finally block. This aborts the DelayedDisconnectThread and keeps the unidentified connection alive for the lifetime of the daemon receive loop — matching the behaviour of Signal mobile clients.

Result

After the fix, the unidentified connection is established once and reused across all subsequent group message sends with no reconnection delay.

The unauthenticated (sealed sender) socket had no keep-alive token
registered, causing SignalWebSocket's DelayedDisconnectThread to tear
down the connection ~10s after each send. Every subsequent group message
then had to re-establish a fresh TLS connection (~6s delay).

The authenticated socket avoids this by registering a "receive" keep-alive
token for the lifetime of the receive loop. Apply the same pattern to the
unauthenticated socket: register the token alongside the authenticated one
and remove it in the same finally block.

This keeps the unidentified connection alive in daemon mode, matching the
behaviour of Signal mobile clients.
@AsamK
Owner

AsamK commented Apr 15, 2026

Thanks. I already increased the timeout to 30s to match the current Android setting, which should already help a bit.
6s to establish the unidentified connection is very long; it takes more like half a second for me. Do you have some kind of special network setup that could cause this?
The keep alive token wasn't set for the unidentified connection because it's not used for receiving messages. In Signal-Android the keep alive is set during calls (which could be added to signal-cli) and when the app is in the foreground (a concept that doesn't really exist in signal-cli). Maybe in signal-cli the keep alive should be set while a jsonrpc tcp/socket connection is open? Then keep alive would also work for a signal-cli daemon started with --receive-mode=manual

@smeinecke
Contributor Author

I've found the problem - it's IPv6 related. As soon as I disabled IPv6 in the container the 6s delay disappeared.

The setup is a Podman container on a host that has IPv6 connectivity. The container's internal network is IPv4-only (10.89.x.x/24), but Podman doesn't fully isolate IPv6 - the container still gets a link-local IPv6 address and can resolve AAAA records.

This triggers Happy Eyeballs: libsignal races IPv4 and IPv6 connections, but IPv6 packets are silently dropped somewhere in the NAT/routing path rather than immediately refused, causing a ~6s stall before the IPv4 attempt wins.

The fix on my end was adding --sysctl net.ipv6.conf.all.disable_ipv6=1 to the container, which forces IPv4-only and eliminates the delay entirely.

The increased 30s timeout will definitely help as a safety net. If it's not too much trouble, it might also be worth considering whether libsignal/signal-cli could detect and report a slow Happy Eyeballs fallback (e.g. log a warning if the winning connection family differed from the first attempt). That would make this class of issue easier to diagnose for others. I have not yet checked whether that is possible.
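To make the suggested diagnostic concrete, here is a rough sketch of the decision logic only. Everything in it is hypothetical: `FallbackDiagnostics`, `Attempt`, and `slowFallbackWarning` are illustrative names, and it assumes the connection layer can report per-attempt address family and timing, which has not been verified for libsignal.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper; assumes per-attempt timing data is available,
// which libsignal may or may not expose.
final class FallbackDiagnostics {
    enum Family { IPV4, IPV6 }

    static final class Attempt {
        final Family family;
        final long startMillis;
        final long endMillis;
        final boolean succeeded;

        Attempt(Family family, long startMillis, long endMillis, boolean succeeded) {
            this.family = family;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
            this.succeeded = succeeded;
        }
    }

    // Returns a warning when the winning attempt used a different address
    // family than the first attempt and connecting took at least thresholdMillis.
    static Optional<String> slowFallbackWarning(List<Attempt> attempts, long thresholdMillis) {
        Attempt first = null;
        Attempt winner = null;
        for (Attempt a : attempts) {
            if (first == null || a.startMillis < first.startMillis) {
                first = a;
            }
            if (a.succeeded && (winner == null || a.endMillis < winner.endMillis)) {
                winner = a;
            }
        }
        if (first == null || winner == null || winner.family == first.family) {
            return Optional.empty();
        }
        long elapsed = winner.endMillis - first.startMillis;
        if (elapsed < thresholdMillis) {
            return Optional.empty();
        }
        return Optional.of("connected via " + winner.family + " after " + elapsed
                + " ms; first attempt used " + first.family);
    }
}
```

The threshold keeps an ordinary, fast fallback from producing log noise; only a fallback that also cost noticeable time is reported.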

Keep the unidentified socket alive while a JSON-RPC connection is open (including stdio mode) and while a D-Bus object is exported, instead of for the lifetime of the receive loop. The receive loop does not use the unauthenticated socket, so keeping it alive there was semantically wrong.

This also covers --receive-mode=manual, where no receive loop runs but clients still send messages.
@smeinecke
Contributor Author

I reworked the approach as suggested:

The keep-alive token on the unauthenticated socket is now registered when a JSON-RPC client connection opens and removed when it closes (single-account and multi-account mode, including stdio). For D-Bus mode (as I use this for my implementation) the token is registered in DbusSignalImpl.initObjects() and removed in close(), covering the lifetime of the exported D-Bus object. This also applies to --receive-mode=manual since the keep-alive is no longer tied to the receive loop at all.
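The connection-scoped lifetime described above can be sketched as follows. This illustrates the pattern and is not the code in this PR: `UnidentifiedSocketStub`, `ClientConnectionLease`, and the per-connection token naming are assumptions, chosen so that several simultaneous clients each hold their own token.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the unauthenticated socket's token bookkeeping.
final class UnidentifiedSocketStub {
    private final Set<String> keepAliveTokens = ConcurrentHashMap.newKeySet();

    void registerKeepAliveToken(String token) {
        keepAliveTokens.add(token);
    }

    void removeKeepAliveToken(String token) {
        keepAliveTokens.remove(token);
    }

    // True while at least one token is held, i.e. no idle disconnect is due.
    boolean isKeptAlive() {
        return !keepAliveTokens.isEmpty();
    }
}

// Ties one keep-alive token to the lifetime of one client connection.
final class ClientConnectionLease implements AutoCloseable {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final UnidentifiedSocketStub socket;
    private final String token;

    ClientConnectionLease(UnidentifiedSocketStub socket) {
        this.socket = socket;
        // A unique token per connection, so closing one client does not
        // release the keep-alive held on behalf of another.
        this.token = "jsonrpc-" + NEXT_ID.incrementAndGet();
        socket.registerKeepAliveToken(token);
    }

    @Override
    public void close() {
        socket.removeKeepAliveToken(token);
    }
}
```

Wrapping each client connection in a try-with-resources block around such a lease removes the token even when the connection handler throws.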

ReceiveHelper no longer registers a keep-alive token on the unauthenticated socket - as you noted, it isn't used for receiving.
