fix(validator): serialize hotkey contract writes to prevent nonce collisions by anderdc · Pull Request #457 · entrius/allways

anderdc · 2026-06-08T22:25:39Z

What

Share one write lock across the forward-loop contract client and the axon-handler contract client, held from nonce-fetch through inclusion, so the validator hotkey's nonce sequence can't collide.

Why

The validator signs contract writes with the same hotkey over two separate substrate connections — the forward loop (self.contract_client on self.subtensor: confirm/timeout/extensions) and the axon handlers (self.axon_contract_client on self.axon_subtensor: vote_reserve/vote_activate).

create_signed_extrinsic auto-fetches the nonce via AccountNonceApi.account_nonce — the best-block nonce, which doesn't count pending pool txs. Within a block window every fetch returns the same number. With two uncoordinated signers on one account, both can grab nonce N; one lands, the other is rejected and the tx-pool bans it (1012 Transaction is temporarily banned).
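The collision can be reproduced in a toy model. The sketch below is not the substrate tx-pool; `ToyChain` and its methods are hypothetical names that only mimic the two properties described above: the nonce read ignores pending transactions, and a second transaction reusing a pooled nonce is rejected.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChain:
    """Toy model (hypothetical): the best-block nonce advances only on inclusion."""

    best_block_nonce: int = 0
    pool: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def account_nonce(self) -> int:
        # Pending pool transactions are ignored, as with the best-block nonce.
        return self.best_block_nonce

    def submit(self, signer: str, nonce: int) -> bool:
        # Reject a stale nonce or one that is already waiting in the pool.
        in_pool = any(n == nonce for _, n in self.pool)
        if in_pool or nonce < self.best_block_nonce:
            self.rejected.append((signer, nonce))
            return False
        self.pool.append((signer, nonce))
        return True

    def produce_block(self) -> None:
        # Including the pooled transactions advances the best-block nonce.
        self.best_block_nonce += len(self.pool)
        self.pool.clear()


chain = ToyChain()

# Uncoordinated signers: both fetch before either transaction is included.
n_forward = chain.account_nonce()
n_axon = chain.account_nonce()
ok_forward = chain.submit("forward", n_forward)
ok_axon = chain.submit("axon", n_axon)
chain.produce_block()

# Coordinated signers: each fetch happens after the previous inclusion.
n1 = chain.account_nonce()
ok1 = chain.submit("forward", n1)
chain.produce_block()
n2 = chain.account_nonce()
ok2 = chain.submit("axon", n2)
chain.produce_block()

print(n_forward, n_axon, ok_forward, ok_axon)  # 0 0 True False
print(n1, n2, ok1, ok2)  # 1 2 True True
```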

During a halt this became constant: the axon side flooded vote_reserve attempts, which contended for and advanced the nonce and starved the forward loop's confirm_swap votes. Delivered swaps blew past timeout_block and got slashed (e.g. swaps 3728/3729). This contention is not halt-specific; sustained reserve volume can reproduce it.

A dedicated subtensor node does not fix this — the node returns the same best-block nonce to both connections; it detects the duplicate, it doesn't prevent it. The fix is client-side coordination.

How

AllwaysContractClient takes an optional shared write_lock; exec_contract_raw holds it across nonce-fetch → submit → (wait_for_inclusion) so the best-block nonce advances before the next signer composes.
neurons/validator.py creates one write_lock and passes it to both clients.
Reads stay on their per-connection recv locks, so they remain parallel — no change to axon responsiveness.
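The wiring above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchContractClient`, `FakeChain`, `fetch_nonce`, and `submit_and_wait` are hypothetical stand-ins, and `contextlib.nullcontext` is assumed as the no-op used when no lock is passed.

```python
import threading
import time
from contextlib import nullcontext


class FakeChain:
    """Hypothetical chain stub: a write fails if its nonce is stale."""

    def __init__(self):
        self._nonce = 0
        self._guard = threading.Lock()
        self.included = []

    def fetch_nonce(self):
        with self._guard:
            return self._nonce

    def submit_and_wait(self, signer, call, nonce):
        time.sleep(0.001)  # widen the race window between fetch and inclusion
        with self._guard:
            if nonce != self._nonce:
                raise RuntimeError(f"nonce collision for {signer}: {nonce}")
            self.included.append((signer, call, nonce))
            self._nonce += 1


class SketchContractClient:
    """Hypothetical stand-in for AllwaysContractClient, not the real class."""

    def __init__(self, name, chain, write_lock=None):
        self.name = name
        self.chain = chain
        # Without a shared lock, fall back to a no-op context manager.
        self.write_lock = write_lock if write_lock is not None else nullcontext()

    def exec_contract_raw(self, call):
        # Hold the lock from nonce fetch through inclusion.
        with self.write_lock:
            nonce = self.chain.fetch_nonce()
            self.chain.submit_and_wait(self.name, call, nonce)
            return nonce


chain = FakeChain()
write_lock = threading.Lock()  # one lock, passed to both clients
forward = SketchContractClient("forward", chain, write_lock)
axon = SketchContractClient("axon", chain, write_lock)

errors = []


def worker(client, call, count):
    for _ in range(count):
        try:
            client.exec_contract_raw(call)
        except RuntimeError as exc:
            errors.append(exc)


threads = [
    threading.Thread(target=worker, args=(forward, "confirm_swap", 20)),
    threading.Thread(target=worker, args=(axon, "vote_reserve", 20)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

nonces = [n for _, _, n in chain.included]
```

With the shared lock, the 40 writes take nonces 0 through 39 in order and `errors` stays empty. Constructing the two clients without `write_lock` in this sketch lets both threads fetch the same nonce, which surfaces as `RuntimeError` entries.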

Lock order is axon_lock → write_lock (axon writes) and write_lock → main lock (forward writes); no path takes write_lock → axon_lock, so no cycle.
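One way to sanity-check the ordering claim is to treat it as a directed graph and look for a cycle. The sketch below is illustrative only; the edge list is transcribed from the sentence above, and `has_cycle` is a hypothetical helper, not part of the repository.

```python
# An edge "A -> B" means some code path acquires B while already holding A.
LOCK_ORDER = {
    "axon_lock": ["write_lock"],   # axon writes
    "write_lock": ["main_lock"],   # forward writes
    "main_lock": [],
}


def has_cycle(graph):
    """Depth-first search; a back edge to an in-progress node means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {node: WHITE for node in graph}

    def visit(node):
        state[node] = GREY
        for nxt in graph.get(node, []):
            if state.get(nxt, WHITE) == GREY:
                return True
            if state.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[node] == WHITE and visit(node) for node in graph)


# Hypothetical bad ordering: a path that takes write_lock -> axon_lock.
BAD_ORDER = {**LOCK_ORDER, "write_lock": ["main_lock", "axon_lock"]}

print(has_cycle(LOCK_ORDER))  # False
print(has_cycle(BAD_ORDER))   # True
```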

Scope / tradeoff

Small, contained safeguard: core swap logic is untouched; only the low-level submit path gains a lock. Writes now serialize across connections (~1 write/block with wait-for-inclusion), a ceiling that is far above current load. The planned force_batch single-writer flush is the throughput follow-up; this lands first as the safeguard.

Tests

TestWriteLockSerialization: shared vs private lock wiring, and that the lock is held during submit but not during the account read (reads stay parallel).
Full suite: 692 passed; ruff check + ruff format clean.
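The two properties the tests check can be illustrated with a small probe. This is a sketch under assumptions, not the contents of TestWriteLockSerialization: `ProbeClient`, `_read_account`, and `_submit` are hypothetical names, and the probe simply records whether the lock is held at each step.

```python
import threading


class ProbeClient:
    """Hypothetical probe: records whether write_lock is held at each step."""

    def __init__(self, write_lock=None):
        # A client without a shared lock gets its own private one.
        self.write_lock = write_lock if write_lock is not None else threading.Lock()
        self.observed = {}

    def _read_account(self):
        self.observed["read"] = self.write_lock.locked()

    def _submit(self):
        self.observed["submit"] = self.write_lock.locked()

    def exec_contract_raw(self):
        self._read_account()   # pre-flight read stays outside the lock
        with self.write_lock:  # held for the submit path only
            self._submit()


shared = threading.Lock()
forward = ProbeClient(shared)
axon = ProbeClient(shared)
private = ProbeClient()

forward.exec_contract_raw()

print(forward.observed)                        # {'read': False, 'submit': True}
print(forward.write_lock is axon.write_lock)   # True
print(private.write_lock is shared)            # False
```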

…lisions The forward loop and axon handlers both sign contract writes with the validator hotkey over separate substrate connections. Each auto-fetches the best-block nonce independently, so two concurrent writes can grab the same nonce; one lands and the other is rejected and pool-banned (1012). A reserve flood (e.g. during a halt) made this constant and starved confirm/timeout votes until swaps blew past their deadline. Share one write lock across both clients, held across nonce-fetch -> submit -> inclusion so the nonce advances before the next signer composes. Reads stay on their per-connection locks and remain parallel.

…lisions

The validator signs contract writes with the same hotkey over two separate substrate connections: the forward loop (contract_client / self.subtensor) and the axon handlers (axon_contract_client / axon_subtensor). Both call create_signed_extrinsic, which auto-fetches the nonce via AccountNonceApi (the best-block nonce, which does not count pending pool txs). When both clients race within the same block window they fetch the same nonce N, one tx lands and the other is banned (1012 Transaction is temporarily banned), starving the forward loop and causing delivered swaps to be slashed.

Add an optional write_lock parameter to AllwaysContractClient.__init__. exec_contract_raw acquires the lock across nonce-fetch + submit + inclusion, so the best-block nonce is guaranteed to advance before the sibling client composes its next extrinsic. The pre-flight balance read is intentionally left outside the lock so reads remain parallel.

In neurons/validator.py, create one threading.Lock as self._write_lock and pass it to both contract_client and axon_contract_client at construction.

Lock ordering: axon_lock -> write_lock (axon handlers) and write_lock -> substrate_lock (forward loop). No path takes write_lock -> axon_lock, so no deadlock cycle.

Backward compat: write_lock defaults to None; omitting it produces a _NullContext no-op so existing call sites and tests are unaffected.

New tests/test_write_lock_serialization.py (6 tests) verifies wiring (shared lock stored on both clients), lock held during submit, no error without write_lock, and balance read fires before write_lock is acquired.

Closes entrius#457

anderdc mentioned this pull request Jun 8, 2026

fix(validator): fast-reject reservations while halted #458

Merged

jaso0n0818 mentioned this pull request Jun 9, 2026

fix(validator): serialize hotkey contract writes to prevent nonce collisions #465

Closed

anderdc wants to merge 1 commit into test from fix/serialize-contract-writes-nonce
