queue tuning#326
Merged
Merged
Conversation
The queue of pending states is stored in per-thread linked-lists of
“queue nodes” that are each a 4K page worth of state pointers:
┌A─────────┬──────┐ ┌B─────────┬──────┐
0: │ s[…] │ next ├─►│ s[…] │ next │
└──────────┴──────┘ └──────────┴──────┘
┌C─────────┬──────┐
1: │ s[…] │ next │
└──────────┴──────┘
┌D─────────┬──────┐ ┌E─────────┬──────┐ ┌F─────────┬──────┐
2: │ s[…] │ next ├─►│ s[…] │ next ├─►│ s[…] │ next │
└──────────┴──────┘ └──────────┴──────┘ └──────────┴──────┘
…
When the head of a thread’s queue reaches the `next` pointer in the head
queue node, the next dequeue (1) pops the entire queue node and then (2)
retries dequeue. E.g. pop of thread 2’s head queue node:
┌A─────────┬──────┐ ┌B─────────┬──────┐
0: │ s[…] │ next ├─►│ s[…] │ next │
└──────────┴──────┘ └──────────┴──────┘
┌C─────────┬──────┐
1: │ s[…] │ next │
└──────────┴──────┘
┌E─────────┬──────┐ ┌F─────────┬──────┐
2: │ s[…] │ next ├─►│ s[…] │ next │
└──────────┴──────┘ └──────────┴──────┘
…
This pop is done through a CAS into the queue end pointers. Following
this, `ends` was unconditionally set to the old value. However in the
success case the most up to date value we know of is `new`.
The effect of this was that successfully popping a queue node always did
an unnecessary extra loop:
1. CAS into the queue end pointers
2. `ends = old`
3. retry
4. find `ends != ends_check` ┐
5. re-read queue end pointers ├─ unnecessary extra work
6. retry ┘
Surprisingly this suboptimal logic seems to have been present unnoticed
since the first implementation of this feature in
2153f1f. This was only discovered
during debugging an orthogonal optimisation to queue management in
development.
This change appears to have no significant effect on some representative
models. So apparently this extra work was not a significant overhead.
In some speculative work, I had an initial misunderstanding that the pointer stored in the queue tail was the next slot to push to. In reality it is the slot _before_ the next slot to push to. For this reason, a pointer to `queue_node.next` can never be stored to the tail. Some debugging assertions around this would have saved me a lot of time, so lets add them.
In some speculative work, I had a misunderstanding that led to
constructing a malformed queue. The queue involved a single queue node
but with the head in advance of the tail:
┌──────┬──────┐
│ head │ tail │
└───┬──┴───┬──┘
└──────┼───┐
└─┐ │
▼ ▼
┌──────────┬──────┐
│ s[…] │ next ├─►null
└──────────┴──────┘
This situation cannot occur in the code as it is at the present point.
And the problem with this situation is that the logic for detecting when
the queue is exhausted no longer works. `head` only ever advances and
will never become equal to `tail`.
This commit introduces some stronger assertions that hopefully would
catch mistakes like this.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.