queue tuning by Smattr · Pull Request #326 · Smattr/rumur

Smattr · 2026-05-16T21:42:51Z

No description provided.

The queue of pending states is stored in per-thread linked-lists of “queue nodes” that are each a 4K page worth of state pointers: ┌A─────────┬──────┐ ┌B─────────┬──────┐ 0: │ s[…] │ next ├─►│ s[…] │ next │ └──────────┴──────┘ └──────────┴──────┘ ┌C─────────┬──────┐ 1: │ s[…] │ next │ └──────────┴──────┘ ┌D─────────┬──────┐ ┌E─────────┬──────┐ ┌F─────────┬──────┐ 2: │ s[…] │ next ├─►│ s[…] │ next ├─►│ s[…] │ next │ └──────────┴──────┘ └──────────┴──────┘ └──────────┴──────┘ … When the head of a thread’s queue reaches the `next` pointer in the head queue node, the next dequeue (1) pops the entire queue node and then (2) retries dequeue. E.g. pop of thread 2’s head queue node: ┌A─────────┬──────┐ ┌B─────────┬──────┐ 0: │ s[…] │ next ├─►│ s[…] │ next │ └──────────┴──────┘ └──────────┴──────┘ ┌C─────────┬──────┐ 1: │ s[…] │ next │ └──────────┴──────┘ ┌E─────────┬──────┐ ┌F─────────┬──────┐ 2: │ s[…] │ next ├─►│ s[…] │ next │ └──────────┴──────┘ └──────────┴──────┘ … This pop is done through a CAS into the queue end pointers. Following this, `ends` was unconditionally set to the old value. However in the success case the most up to date value we know of is `new`. The effect of this was that successfully popping a queue node always did an unnecessary extra loop: 1. CAS into the queue end pointers 2. `ends = old` 3. retry 4. find `ends != ends_check` ┐ 5. re-read queue end pointers ├─ unnecessary extra work 6. retry ┘ Surprisingly this suboptimal logic seems to have been present unnoticed since the first implementation of this feature in 2153f1f. This was only discovered during debugging an orthogonal optimisation to queue management in development. This change appears to have no significant effect on some representative models. So apparently this extra work was not a significant overhead.

In some speculative work, I had an initial misunderstanding that the pointer stored in the queue tail was the next slot to push to. In reality it is the slot _before_ the next slot to push to. For this reason, a pointer to `queue_node.next` can never be stored to the tail. Some debugging assertions around this would have saved me a lot of time, so lets add them.

In some speculative work, I had a misunderstanding that led to constructing a malformed queue. The queue involved a single queue node but with the head in advance of the tail: ┌──────┬──────┐ │ head │ tail │ └───┬──┴───┬──┘ └──────┼───┐ └─┐ │ ▼ ▼ ┌──────────┬──────┐ │ s[…] │ next ├─►null └──────────┴──────┘ This situation cannot occur in the code as it is at the present point. And the problem with this situation is that the logic for detecting when the queue is exhausted no longer works. `head` only ever advances and will never become equal to `tail`. This commit introduces some stronger assertions that hopefully would catch mistakes like this.

Smattr added 3 commits May 16, 2026 14:40

Smattr merged commit 1253711 into main May 16, 2026
22 checks passed

Smattr deleted the smattr/a1c1734b-2ac0-4b47-bedc-4502c5b03222 branch May 16, 2026 23:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

queue tuning#326

queue tuning#326
Smattr merged 3 commits into
mainfrom
smattr/a1c1734b-2ac0-4b47-bedc-4502c5b03222

Smattr commented May 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Smattr commented May 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant