Fix uring teardown #143
Signed-off-by: Horst Birthelmer <[email protected]> (cherry picked from commit ad21e5a)
Fix uninterruptible sleep (D state) hangs during FUSE filesystem teardown when using io_uring. The issue manifests as processes stuck waiting for requests that are never completed. It particularly affects force requests such as FUSE_FLUSH, and requests created after fuse_abort_conn() has already finished.

If, on daemon exit, io_uring_try_cancel_requests() runs before fuse_abort_conn(), it calls fuse_uring_cancel(), which tears down the entries via fuse_uring_entry_teardown(). We then end up in fuse_uring_abort() with queue_refs == 0, and the queues are never stopped. Stopped queues would reject all new requests; since the queues are not stopped, every new call gets stuck.

Signed-off-by: Horst Birthelmer <[email protected]> (cherry picked from commit 9550b4d)
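To make the ordering problem easier to follow, here is a rough user-space toy model of it. This is not the kernel code and not necessarily what this patch does: every name (`toy_ring`, `toy_cancel_entries`, `toy_abort_refs_only`, `toy_abort_always_stop`, `toy_submit`) is hypothetical, and the unconditional stop is only one assumed shape a fix could take.

```c
#include <stdbool.h>
#include <errno.h>

/* Toy model only: all names here are hypothetical, none are kernel symbols. */
struct toy_ring {
	int  queue_refs;  /* ring entries that still hold a reference */
	bool stopped;     /* once set, new requests are rejected */
};

/* Models the cancel path releasing every entry before abort runs. */
static void toy_cancel_entries(struct toy_ring *r)
{
	r->queue_refs = 0;
}

/* Abort that stops the queues only while references remain.
 * If cancel already ran, queue_refs is 0 and nothing gets stopped. */
static void toy_abort_refs_only(struct toy_ring *r)
{
	if (r->queue_refs > 0)
		r->stopped = true;
}

/* Assumed alternative: mark the queues stopped regardless of queue_refs. */
static void toy_abort_always_stop(struct toy_ring *r)
{
	r->stopped = true;
}

/* Returns 0 if the request is accepted (and would then wait for a
 * completion), or -ENOTCONN if it is rejected right away. */
static int toy_submit(const struct toy_ring *r)
{
	return r->stopped ? -ENOTCONN : 0;
}
```

In this model, running `toy_cancel_entries()` followed by `toy_abort_refs_only()` leaves `stopped` unset, so `toy_submit()` still accepts a request that nobody will ever complete. With `toy_abort_always_stop()` the same request is rejected instead of waiting.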
@hbirth I need to look into the 9.4 branch tomorrow. 9.4 does not have this F_CANCEL, so I had added a workaround to that kernel version. I need to verify it for races (we have a Jira for a related issue on 9.4).
So the case where queue_refs == 0 cannot occur? I have to check, too, then.
To me this looks like a slightly better case for the problem at hand.
@hbirth I had to look up what I had done (commit 14ba960). The issue with it is in-flight registration SQEs: whichever way fuse_abort_conn() gets called, these registration SQEs will not be released. There would be a way to handle it through locks and another ref count, but given that 9.4 is deprecated, we had better leave it in its current state.
cherry pick from redfs-ubuntu-noble