Browse Source
When migration connection is broken, the QEMU and libvirtd(8) process on the source side receive TCP connection reset notification. QEMU sets the migration status to FAILED and proceeds to migration_cleanup(). Meanwhile, Libvirtd(8) sends a QMP command to migrate_set_capabilities(). The migration_cleanup() and qmp_migrate_set_capabilities() calls race with each other. When the latter is invoked first, since the migration is not running (FAILED), migration capabilities are reset to false, so during migration_cleanup() the QEMU process crashes with assertion failure. Introduce a new migration status FAILING and use it as an interim status when an error occurs. Once migration_cleanup() is done, it sets the migration status to FAILED. This helps to avoid the above race condition and ensuing failure. Interim status FAILING is set wherever the execution moves towards migration_cleanup(): - postcopy_start() - migration_thread() - migration_cleanup() - multifd_send_setup() - bg_migration_thread() - migration_completion() - migration_detect_error() - bg_migration_completion() - multifd_send_error_propagate() - migration_connect_error_propagate() The migration status finally moves to FAILED and reports an appropriate error to the user. Interim status FAILING is _NOT_ set in the following routines because they do not follow the migration_cleanup() path to the FAILED state: - cpr_exec_cb() - qemu_savevm_state() - postcopy_listen_thread() - process_incoming_migration_co() - multifd_recv_terminate_threads() - migration_channel_process_incoming() Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Link: https://lore.kernel.org/qemu-devel/20260224102547.226087-1-ppandit@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>master
committed by
Fabiano Rosas
5 changed files with 32 additions and 19 deletions
Loading…
Reference in new issue