
multifd: Fix hang if send thread errors during sync

When a send thread encounters an error (as is the case with yank),
it sets multifd_send_state->exiting and the other threads exit too.
This races with multifd_send_sync_main(), which then hangs at
qemu_sem_wait(&p->sem_sync) (line 647) because it waits for
threads that have already exited.

Fix this by kicking the semaphores when exiting the send threads.

I encountered this hang when stress testing the colo unit test,
though I was unable to write a migration test to reliably hit this.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260302-colo_unit_test_multifd-v11-18-d653fb3b1d80@web.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
master
Lukas Straub 4 weeks ago
committed by Fabiano Rosas
parent
commit
37c74b1dbb

7
migration/multifd.c

@@ -772,9 +772,14 @@ out:
         assert(local_err);
         trace_multifd_send_error(p->id);
         multifd_send_error_propagate(local_err);
-        multifd_send_kick_main(p);
     }
 
+    /*
+     * Always kick the main thread: The main thread might wait on this thread
+     * while another thread encounters an error and signals this thread to exit.
+     */
+    multifd_send_kick_main(p);
+
     rcu_unregister_thread();
     trace_multifd_send_thread_end(p->id, p->packets_sent);
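
The pattern behind the change can be illustrated outside QEMU. The following is a minimal sketch using POSIX threads and semaphores, not the actual multifd code: `exiting`, `sem_work`, `sem_sync`, `send_thread()` and `run_sync_demo()` are hypothetical stand-ins chosen for this example. It assumes that a worker which is told to quit may still have a sync request pending, so the kick to the main thread is posted unconditionally on the exit path.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

#define N_WORKERS 4

/* Hypothetical stand-ins for the multifd send state; not QEMU's structures. */
static atomic_bool exiting;
static sem_t sem_work[N_WORKERS]; /* main or peer -> worker: wake up */
static sem_t sem_sync[N_WORKERS]; /* worker -> main: kick */

struct worker {
    int id;
    bool fail; /* simulate a send error on the first request */
};

static void *send_thread(void *opaque)
{
    struct worker *w = opaque;

    for (;;) {
        sem_wait(&sem_work[w->id]);
        if (atomic_load(&exiting)) {
            break; /* told to quit, possibly with a sync request pending */
        }
        if (w->fail) {
            atomic_store(&exiting, true);
            for (int i = 0; i < N_WORKERS; i++) {
                sem_post(&sem_work[i]); /* wake peers so they see the flag */
            }
            break;
        }
        sem_post(&sem_sync[w->id]); /* normal sync acknowledgement */
    }

    /*
     * Always kick main on exit. If only the failing worker did this, a
     * peer leaving through the first break would leave main blocked.
     */
    sem_post(&sem_sync[w->id]);
    return NULL;
}

/* Returns 1 if a worker reported an error during the sync, 0 otherwise. */
int run_sync_demo(int failing_id)
{
    pthread_t tid[N_WORKERS];
    struct worker w[N_WORKERS];

    assert(failing_id < N_WORKERS);
    atomic_store(&exiting, false);
    for (int i = 0; i < N_WORKERS; i++) {
        sem_init(&sem_work[i], 0, 0);
        sem_init(&sem_sync[i], 0, 0);
        w[i] = (struct worker){ .id = i, .fail = (i == failing_id) };
        pthread_create(&tid[i], NULL, send_thread, &w[i]);
    }

    for (int i = 0; i < N_WORKERS; i++) {
        sem_post(&sem_work[i]); /* request a sync from every worker */
    }
    for (int i = 0; i < N_WORKERS; i++) {
        sem_wait(&sem_sync[i]); /* blocks forever if a worker never kicks */
    }
    int error_seen = atomic_load(&exiting) ? 1 : 0;

    /* Shut down: set the flag, wake every worker, and reap it. */
    atomic_store(&exiting, true);
    for (int i = 0; i < N_WORKERS; i++) {
        sem_post(&sem_work[i]);
        pthread_join(tid[i], NULL);
    }
    for (int i = 0; i < N_WORKERS; i++) {
        sem_destroy(&sem_work[i]);
        sem_destroy(&sem_sync[i]);
    }
    return error_seen;
}
```

Calling `run_sync_demo(-1)` runs a sync with no failing worker, and `run_sync_demo(2)` makes worker 2 fail. Both calls return because every exit path posts `sem_sync`. Moving the final `sem_post()` into the `w->fail` branch models the pre-fix behaviour, where the main thread can block on a worker that exited after a peer's error. The sketch needs unnamed POSIX semaphores (for example on Linux, built with `-pthread`).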
