fix: relay reconnection, stale subscriptions, and UAT testnet switch #902

Open
dangershony wants to merge 3 commits into main from fix/relay-reconnect-and-testnet-switch

@dangershony

Member

Problem

Three related issues causing UAT test failures:

1. Relay messages lost on refresh

When a Nostr relay disconnected, the app never reconnected because ReconnectTimeout was null. Subsequent refresh calls sent REQ messages to dead WebSocket connections, where they failed silently. Repeated refresh calls also hit stale subscription state:

  • MonitoringEoseReceivedOnSubscription used TryAdd, which returned false on repeat calls and kept the stale relay tracking sets
  • TryAddEoseAction used TryAdd, which dropped the new callback, so EOSE responses triggered the old (already-returned) callback
  • Event stream handlers captured closures from previous calls and routed events to stale message lists

This caused investor processes to never see founder approval signatures until restart.
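
The TryAdd behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming the callbacks are kept in a ConcurrentDictionary keyed by subscription name; the eoseActions map, the "sub-1" key, and the log list are hypothetical stand-ins, not the actual Angor.Shared types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the subscription-name -> EOSE callback map.
var eoseActions = new ConcurrentDictionary<string, Action>();
var log = new List<string>();

// First refresh registers its callback; TryAdd succeeds.
bool firstAdded = eoseActions.TryAdd("sub-1", () => log.Add("first"));

// Second refresh: the key already exists, so TryAdd returns false
// and the first callback stays registered.
bool secondAdded = eoseActions.TryAdd("sub-1", () => log.Add("second"));
eoseActions["sub-1"]();   // invokes the stale "first" callback

// Assigning through the indexer replaces the entry with the latest callback.
eoseActions["sub-1"] = () => log.Add("second");
eoseActions["sub-1"]();   // invokes "second"

Console.WriteLine(string.Join(",", log)); // prints "first,second"
```

Replacing the entry on every call is the simplest way to make repeated refreshes behave the same, at the cost of discarding whatever an in-flight earlier call had registered.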

2. Automation server thread safety

  • ConfirmNetworkSwitchAsync accessed UI-bound properties from a background thread (crash)
  • ConfirmWipeData is async void, so the wipe endpoint returned before wallets were actually cleared
  • SeedGroups was read before pending RebuildSeedGroups UI jobs had flushed

3. UAT tests assume Angornet

The default network changed to Mainnet, so after WipeDataAsync the persisted network resets to Mainnet. Tests that create wallets without first switching to Angornet hit the mainnet indexer and fail silently.

Fixes

Relay layer (Angor.Shared)

  • NostrCommunicationFactory: Set ReconnectTimeout to 30s so dead relays auto-reconnect
  • NostrCommunicationFactory: MonitoringEoseReceivedOnSubscription now replaces stale relay tracking sets
  • RelaySubscriptionsHandling: TryAddEoseAction now replaces the callback on repeat calls
  • RelaySubscriptionsHandling: Added DisposeLocalSubscription, which tears down the local event handler without sending a Nostr CLOSE
  • SignService: LookupAllInvestmentMessagesAsync disposes and recreates event handlers on each call
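
The dispose-and-recreate pattern for local handlers can be sketched as follows. This is an illustration only: a plain multicast delegate stands in for the relay event stream, and relayEvents, handlers, and Lookup are hypothetical names rather than the real SignService or RelaySubscriptionsHandling members.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the relay event stream (starts with a no-op handler).
Action<string> relayEvents = _ => { };
// Subscription key -> currently attached local handler.
var handlers = new Dictionary<string, Action<string>>();

// Each call returns a fresh message list fed by a fresh handler.
List<string> Lookup(string key)
{
    var messages = new List<string>();

    // Detach the previous local handler for this key, if any.
    // Nothing is sent to the relay; only the local handler goes away.
    if (handlers.Remove(key, out var previous))
        relayEvents -= previous;

    Action<string> handler = e => messages.Add(e);
    handlers[key] = handler;
    relayEvents += handler;
    return messages;
}

var first = Lookup("sub");
relayEvents("a");            // delivered to the first list
var second = Lookup("sub");  // replaces the handler for the same key
relayEvents("b");            // delivered only to the second list

var firstText = string.Join(",", first);
var secondText = string.Join(",", second);
Console.WriteLine($"first=[{firstText}] second=[{secondText}]"); // prints "first=[a] second=[b]"
```

Without the detach step, both handlers would stay attached and the event "b" would also land in the first, already-returned list.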

Automation layer (App)

  • AutomationServer: Dispatch ConfirmNetworkSwitchAsync to UI thread
  • AutomationServer: Poll-wait for WalletContext.Wallets to clear after wipe
  • AutomationFlows: Flush UI jobs before reading SeedGroups after wallet creation
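
To illustrate why an async void method forces the caller to poll, here is a minimal sketch using only BCL types. The wallets dictionary, ConfirmWipe, and WaitUntilAsync are hypothetical stand-ins for WalletContext.Wallets, ConfirmWipeData, and whatever wait helper the AutomationServer actually uses; the delays and timeout are arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical stand-in for the wallet collection cleared by the wipe.
var wallets = new ConcurrentDictionary<string, bool>();
wallets["wallet-1"] = true;

// async void: the caller receives no Task, so completion cannot be awaited.
async void ConfirmWipe()
{
    await Task.Delay(200);   // simulated wipe work
    wallets.Clear();
}

// Poll until the condition holds or the timeout expires.
async Task<bool> WaitUntilAsync(Func<bool> condition, TimeSpan timeout)
{
    var clock = Stopwatch.StartNew();
    while (!condition())
    {
        if (clock.Elapsed > timeout) return false;
        await Task.Delay(25);
    }
    return true;
}

ConfirmWipe();                                 // returns at its first await
bool clearedImmediately = wallets.IsEmpty;     // wipe has not finished yet
bool clearedAfterWait = await WaitUntilAsync(() => wallets.IsEmpty, TimeSpan.FromSeconds(5));

Console.WriteLine($"{clearedImmediately} {clearedAfterWait}");
```

Changing the method to return Task would let callers await it directly; polling is the less invasive option when the signature has to stay as it is.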

UAT tests

  • Added a SwitchNetworkAsync call that switches to Angornet after every wipe call, across all 7 test files

Test Results

| Test | Before | After |
| --- | --- | --- |
| CreateProjectTest | Pass | Pass |
| MultiFundClaimAndRecoverTest | Hung | Pass (7 min) |
| SendFundsTest | Failed | Pass (6 min) |
| WalletRecoveryTest | Hung | Pass (49s) |
| WipeDataRecoveryTest | Failed | Pass (63s) |
| MultiInvestClaimAndRecoverTest | Hung | Progresses fully, transient indexer lag at claim step |

- Enable auto-reconnect on relay disconnect (ReconnectTimeout = 30s)
- Replace stale EOSE tracking sets on repeated refresh calls
- Dispose and recreate local event handlers on each refresh so the
  current callback receives events instead of a stale closure
- Add DisposeLocalSubscription to tear down the local handler without
  sending a Nostr CLOSE (the same subscription key is re-used immediately)
- Dispatch ConfirmNetworkSwitchAsync to UI thread (thread affinity)
- Poll-wait for WalletContext to clear after wipe (async void completion)
- Flush UI jobs before reading SeedGroups after wallet creation
- Improve diagnostic error message for wallet ID lookup failures

Default network is now Mainnet. All UAT tests must switch to Angornet
after WipeDataAsync/WipeDataWithRecoveryPurgeAsync before creating
wallets, otherwise wallet creation hits the mainnet indexer and fails.