From 1d6610099bd7fc159626a38e60a3c84343ff67f7 Mon Sep 17 00:00:00 2001 From: Hanna Czenczek Date: Fri, 2 Jan 2026 16:32:46 +0100 Subject: [PATCH] block/nfs: Do not enter coroutine from CB MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The reasoning I gave for why it would be safe to call aio_co_wake() despite holding the mutex was wrong: It is true that the current request will not re-acquire the mutex, but a subsequent request in the same coroutine can. Because the mutex is a non-coroutine mutex, this will result in a deadlock. Therefore, we must either not enter the coroutine here (only scheduling it), or release the mutex around aio_co_wake(). I opt for the former, as it is the behavior prior to the offending commit, and so seems safe to do. Fixes: deb35c129b859b9bec70fd42f856a0b7c1dc6e61 ("nfs: Run co BH CB in the coroutine’s AioContext") Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2622#note_2965097035 Cc: qemu-stable@nongnu.org Signed-off-by: Hanna Czenczek Message-ID: <20260102153246.154207-1-hreitz@redhat.com> Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- block/nfs.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/block/nfs.c b/block/nfs.c index 1d3a34a30c..b78f4f86e8 100644 --- a/block/nfs.c +++ b/block/nfs.c @@ -249,14 +249,15 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data, } /* - * Safe to call: nfs_service(), which called us, is only run from the FD - * handlers, never from the request coroutine. The request coroutine in - * turn will yield unconditionally. - * No need to release the lock, even if we directly enter the coroutine, as - * the lock is never re-taken after yielding. (Note: If we do enter the - * coroutine, @task will probably be dangling once aio_co_wake() returns.) + * Using aio_co_wake() here could re-enter the coroutine directly, while we + * still hold the mutex. The current request will not attempt to re-take + * the mutex, so that is fine; but if the same coroutine then goes on to + * submit another request, that new request will try to re-take the mutex, + * resulting in a deadlock. + * To prevent that, only schedule the coroutine so it will be entered later, + * with the mutex released. */ - aio_co_wake(task->co); + aio_co_schedule(qemu_coroutine_get_aio_context(task->co), task->co); } static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, int64_t offset, @@ -716,8 +717,8 @@ nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data, if (task->ret < 0) { error_report("NFS Error: %s", nfs_get_error(nfs)); } - /* Safe to call, see nfs_co_generic_cb() */ - aio_co_wake(task->co); + /* Must not use aio_co_wake(), see nfs_co_generic_cb() */ + aio_co_schedule(qemu_coroutine_get_aio_context(task->co), task->co); } static int64_t coroutine_fn nfs_co_get_allocated_file_size(BlockDriverState *bs)