fix(validator): serialize hotkey contract writes to prevent nonce collisions #465
Closed
jaso0n0818 wants to merge 1 commit into
…lisions

The validator signs contract writes with the same hotkey over two separate substrate connections: the forward loop (contract_client / self.subtensor) and the axon handlers (axon_contract_client / axon_subtensor). Both call create_signed_extrinsic, which auto-fetches the nonce via AccountNonceApi. That is the best-block nonce, which does not count pending pool txs. When both clients race within the same block window they fetch the same nonce N, one tx lands and the other is banned (1012 Transaction is temporarily banned), starving the forward loop and causing delivered swaps to be slashed.

Add an optional write_lock parameter to AllwaysContractClient.__init__. exec_contract_raw acquires the lock across nonce-fetch + submit + inclusion, so the best-block nonce is guaranteed to advance before the sibling client composes its next extrinsic. The pre-flight balance read is intentionally left outside the lock so reads remain parallel.

In neurons/validator.py, create one threading.Lock as self._write_lock and pass it to both contract_client and axon_contract_client at construction.

Lock ordering: axon_lock -> write_lock (axon handlers) and write_lock -> substrate_lock (forward loop). No path takes write_lock -> axon_lock, so no deadlock cycle.

Backward compat: write_lock defaults to None; omitting it produces a _NullContext no-op so existing call sites and tests are unaffected.

New tests/test_write_lock_serialization.py (6 tests) verifies wiring (shared lock stored on both clients), lock held during submit, no error without write_lock, and balance read fires before write_lock is acquired.

Closes entrius#457
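The race described above can be illustrated with a toy model. The sketch below is not the validator's code: ToyChain, racy_writes, and serialized_writes are hypothetical names, and the pool's rejection rule is a simplification that only reuses the error text quoted in this PR. It assumes just what the description states, namely that the fetched nonce reflects the best block and ignores pending pool txs.

```python
class ToyChain:
    """Hypothetical toy chain: the account nonce advances only at block inclusion."""

    def __init__(self):
        self.best_block_nonce = 0
        self.pool = []  # pending (nonce, label) pairs, not yet included

    def account_nonce(self):
        # Best-block nonce: pending pool txs are deliberately not counted.
        return self.best_block_nonce

    def submit(self, nonce, label):
        # Simplified rule: a second pending tx reusing a nonce is rejected.
        if any(pending == nonce for pending, _ in self.pool):
            raise RuntimeError(
                f"1012 Transaction is temporarily banned (nonce {nonce}, {label})"
            )
        self.pool.append((nonce, label))

    def produce_block(self):
        # Inclusion: every pending tx lands and the nonce advances.
        self.best_block_nonce += len(self.pool)
        self.pool.clear()


def racy_writes(chain):
    """Both writers fetch the nonce before either tx is included."""
    forward_nonce = chain.account_nonce()
    axon_nonce = chain.account_nonce()  # same block window, same nonce
    chain.submit(forward_nonce, "forward")
    try:
        chain.submit(axon_nonce, "axon")
    except RuntimeError as exc:
        return str(exc)
    return None


def serialized_writes(chain):
    """Each writer completes fetch + submit + inclusion before the next starts."""
    used = []
    for label in ("forward", "axon"):
        nonce = chain.account_nonce()
        chain.submit(nonce, label)
        chain.produce_block()  # wait for inclusion before the next writer
        used.append(nonce)
    return used


if __name__ == "__main__":
    print(racy_writes(ToyChain()))
    print(serialized_writes(ToyChain()))
```

In this model the unserialized path fails on the second submit, while the serialized path hands out consecutive nonces because the second fetch happens only after inclusion.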
Author
Rebased on latest |
Problem
The validator signs contract writes with the same hotkey over two separate substrate connections: the forward loop (contract_client on self.subtensor) and the axon handlers (axon_contract_client on axon_subtensor).
Both call create_signed_extrinsic, which auto-fetches the nonce via AccountNonceApi.account_nonce. This is the best-block nonce, which does not count pending pool txs. Within the same block window both clients fetch nonce N; one tx lands and the other is banned (error 1012 Transaction is temporarily banned), starving the forward loop and causing delivered swaps to be slashed. Closes #457.

Fix
allways/contract_client.py
Add an optional write_lock: threading.Lock parameter to AllwaysContractClient.__init__. Inside exec_contract_raw, hold the lock across nonce-fetch + submit + inclusion. The pre-flight balance read is intentionally left outside the lock so reads remain parallel.
A _NullContext no-op is used when no write_lock is passed (backward-compatible).
neurons/validator.py
Create one shared self._write_lock = threading.Lock() and pass it to both contract_client and axon_contract_client at construction. Lock order: axon_lock → write_lock (axon writes) and write_lock → substrate_lock (forward writes), so there is no cycle.

Tests
New tests/test_write_lock_serialization.py (6 tests):
- The shared write_lock is stored on both clients
- A client constructed without write_lock completes exec_contract_raw normally
- write_lock is held when substrate_call(submit_extrinsic) runs
- The balance read (substrate_call for account info) fires before the write_lock is acquired

Closes #457
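The locking scheme described under Fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: SketchContractClient and FakeChain are hypothetical stand-ins, read_balance and submit_and_wait are invented helper names, and contextlib.nullcontext takes the place of the _NullContext no-op mentioned above.

```python
import threading
from contextlib import nullcontext


class FakeChain:
    """Hypothetical chain stub: a write succeeds only with the current nonce."""

    def __init__(self):
        self._nonce = 0
        self._state = threading.Lock()  # protects the toy state only

    def read_balance(self):
        return 1_000  # placeholder value for the pre-flight read

    def account_nonce(self):
        with self._state:
            return self._nonce

    def submit_and_wait(self, nonce, payload):
        # Stands in for submit + inclusion; a stale nonce is rejected.
        with self._state:
            if nonce != self._nonce:
                raise RuntimeError(f"stale nonce {nonce} for {payload}")
            self._nonce += 1


class SketchContractClient:
    """Hypothetical stand-in for a contract client with an optional write lock."""

    def __init__(self, chain, write_lock=None):
        self.chain = chain
        # No lock passed: fall back to a no-op context manager.
        self._write_guard = write_lock if write_lock is not None else nullcontext()

    def exec_contract_raw(self, payload):
        self.chain.read_balance()  # pre-flight read stays outside the lock
        with self._write_guard:
            # Nonce fetch, submit, and inclusion all happen under the lock.
            nonce = self.chain.account_nonce()
            self.chain.submit_and_wait(nonce, payload)
        return nonce


def run_demo(writes_per_client=20):
    chain = FakeChain()
    write_lock = threading.Lock()  # one lock shared by both clients
    clients = [SketchContractClient(chain, write_lock) for _ in range(2)]
    used, errors = [], []

    def worker(client):
        try:
            for i in range(writes_per_client):
                used.append(client.exec_contract_raw(f"call-{i}"))
        except RuntimeError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(c,)) for c in clients]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(used), errors


if __name__ == "__main__":
    nonces, errors = run_demo()
    print(len(nonces), "writes,", len(errors), "errors")
```

Because both clients share one lock, no thread can fetch a nonce while another is between fetch and inclusion, so every write in the demo sees a fresh nonce. A client built without a lock still works in single-threaded use, which mirrors the backward-compatible default.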