
Fix uring teardown #143

Merged
hbirth merged 2 commits into DDNStorage:redfs-rhel9_4-427.42.1 from hbirth:redfs-rhel9_4-427.42.1
Apr 13, 2026

@hbirth
Collaborator

@hbirth hbirth commented Apr 10, 2026

Cherry-picked from redfs-ubuntu-noble.

hbirth added 2 commits April 10, 2026 09:38
Signed-off-by: Horst Birthelmer <[email protected]>
(cherry picked from commit ad21e5a)
Fix uninterruptible sleep (D state) hangs during FUSE filesystem
teardown when using io_uring. The issue manifests as processes stuck
waiting for requests that are never completed, particularly force
requests such as FUSE_FLUSH, and requests created after
fuse_abort_conn() has already finished.

If, on daemon exit, io_uring_try_cancel_requests() runs and calls
fuse_uring_cancel(), which tears down the entries via
fuse_uring_entry_teardown() before fuse_abort_conn() runs, then we end
up in fuse_uring_abort() with queue_refs == 0 and the queues are never
stopped.

If the queues were stopped, all new requests would be rejected. Since
they are not stopped, every new call gets stuck.

Signed-off-by: Horst Birthelmer <[email protected]>
(cherry picked from commit 9550b4d)
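The ordering problem described in the commit message can be illustrated with a minimal userspace C model. This is a sketch under stated assumptions, not the kernel code: `struct model_ring`, `model_cancel()`, `model_abort_old()`, `model_abort_fixed()` and `model_new_request_rejected()` are hypothetical stand-ins for the ring, `fuse_uring_cancel()`/`fuse_uring_entry_teardown()`, `fuse_uring_abort()` and request submission. The model also assumes the fix amounts to stopping the queues even when `queue_refs` has already reached 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the teardown race.
 * Only the ordering of events is modeled; none of this is the
 * real FUSE kernel code. */
struct model_ring {
    int queue_refs;      /* stands in for the ring's queue_refs counter */
    bool queues_stopped; /* once true, new requests are rejected */
};

/* Models the io_uring cancel path (fuse_uring_cancel() ->
 * fuse_uring_entry_teardown()): drops one reference per entry. */
static void model_cancel(struct model_ring *r, int entries)
{
    r->queue_refs -= entries;
    if (r->queue_refs < 0)
        r->queue_refs = 0;
}

/* Models fuse_uring_abort() before the fix: the queues are only
 * stopped while references remain. */
static void model_abort_old(struct model_ring *r)
{
    if (r->queue_refs > 0)
        r->queues_stopped = true;
}

/* Models the assumed intent of the fix: stop the queues regardless
 * of queue_refs. */
static void model_abort_fixed(struct model_ring *r)
{
    r->queues_stopped = true;
}

/* True if a new request fails fast instead of being queued and
 * waiting forever (the D-state hang). */
static bool model_new_request_rejected(const struct model_ring *r)
{
    return r->queues_stopped;
}
```

With `queue_refs` starting at 2 and both entries torn down by the cancel path first, `model_abort_old()` leaves the queues running, so a new request is not rejected and would wait indefinitely; `model_abort_fixed()` stops them and the request fails fast.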
@hbirth hbirth requested a review from bsbernd April 10, 2026 09:20
@bsbernd
Collaborator

bsbernd commented Apr 10, 2026

@hbirth I need to look into the 9.4 branch tomorrow, because 9.4 does not have this F_CANCEL, and I therefore added a workaround to that kernel version. I need to verify it for races (we have a Jira ticket for a related issue on 9.4).

@hbirth
Collaborator Author

hbirth commented Apr 10, 2026

So the case where queue_refs == 0 cannot occur? Then I have to check, too.

@hbirth
Collaborator Author

hbirth commented Apr 11, 2026

To me, this looks like a slightly better way to handle the problem at hand.
If IO_URING_F_CANCEL is not defined, we never call fuse_uring_cancel(), so queue_refs never drops to 0.
This part of the fix is technically not needed, since the case cannot occur, but it does no harm either.
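The argument about kernels without `IO_URING_F_CANCEL` can be sketched the same way. `MODEL_HAS_F_CANCEL`, `model_refs_after_daemon_exit()` and `model_old_abort_stops_queues()` are hypothetical illustrations, not kernel symbols, and the real 9.4 backport may detect the missing cancel support differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch: 1 models a kernel whose io_uring supports
 * IO_URING_F_CANCEL, 0 models RHEL 9.4, which (per the discussion)
 * lacks it. Not a real kernel config symbol. */
#ifndef MODEL_HAS_F_CANCEL
#define MODEL_HAS_F_CANCEL 0
#endif

/* Returns queue_refs after the daemon exits but before
 * fuse_abort_conn() runs. Only the cancel path (fuse_uring_cancel())
 * tears entries down early and drops references. */
static int model_refs_after_daemon_exit(int queue_refs, int entries)
{
#if MODEL_HAS_F_CANCEL
    queue_refs -= entries;   /* cancel path drops one ref per entry */
#else
    (void)entries;           /* no cancel path: refs are untouched */
#endif
    return queue_refs;
}

/* The pre-fix abort path only stops the queues while refs remain. */
static bool model_old_abort_stops_queues(int queue_refs)
{
    return queue_refs > 0;
}
```

Built with the default `MODEL_HAS_F_CANCEL 0`, `queue_refs` keeps its initial value, so even the pre-fix abort path still stops the queues. That is why the change is harmless on 9.4 but not strictly required there.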

@bsbernd
Collaborator

bsbernd commented Apr 11, 2026

@hbirth I had to look up what I had done (commit 14ba960). The issue with it is in-flight registration SQEs when either path into fuse_abort_conn() is taken: these registration SQEs will not be released. There would be a way to handle it through locks and another ref count, but given that 9.4 is deprecated, we had better leave it in its current state.

@hbirth hbirth merged commit d6048c3 into DDNStorage:redfs-rhel9_4-427.42.1 Apr 13, 2026
2 checks passed