riscv/qemu - qemu - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Lukas Straub	161e603975	colo: Reuse the return path from migration on primary and secondary side Use the return-path capability with colo and reuse the opened return path file on both primary and secondary side. This fixes a crash in colo where migration_cancel() races with colo closing s->rp_state.from_dst_file. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260302-colo_unit_test_multifd-v11-21-d653fb3b1d80@web.de Signed-off-by: Fabiano Rosas <farosas@suse.de>	4 weeks ago
Lukas Straub	b2cbd3d0f9	colo: Use file lock in primary_vm_do_failover() Take the file lock since s->to_dst_file and s->rp_state.from_dst_file may be changed in the migration thread. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260302-colo_unit_test_multifd-v11-19-d653fb3b1d80@web.de Signed-off-by: Fabiano Rosas <farosas@suse.de>	4 weeks ago
Lukas Straub	9d13dd0a50	colo: Do not hold the BQL while receiving ram state. We only receive ram into the colo cache here and don't touch anything else, so the BQL is not needed here. Move cpu_synchronize_all_states() downwards, before we apply the received checkpoint. It turns out that qemu_system_reset() already calls it for us. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260302-colo_unit_test_multifd-v11-12-d653fb3b1d80@web.de Signed-off-by: Fabiano Rosas <farosas@suse.de>	4 weeks ago
Lukas Straub	604492a991	colo: Hold the BQL while sending ram state qemu_savevm_state_complete_precopy() requires that BQL is held. This fixes a crash when running with TCG accel. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260302-colo_unit_test_multifd-v11-11-d653fb3b1d80@web.de Signed-off-by: Fabiano Rosas <farosas@suse.de>	4 weeks ago
Lukas Straub	32603ba372	colo: Fix crash during device vmstate load With colo we load device vmstate during each checkpoint, on top of a vm that was already running. Some devices expect a reset before loading vmstate on such a previously running vm. This fixes a crash when using COLO with Q35 machine. The reset adds 10-20ms overhead to the checkpointing proces in my testing. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260302-colo_unit_test_multifd-v11-10-d653fb3b1d80@web.de Signed-off-by: Fabiano Rosas <farosas@suse.de>	4 weeks ago
Lukas Straub	64df66fe8c	Call colo_release_ram_cache() after multifd threads terminate The multifd threads still may access the colo cache, so release it only after they terminate. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260302-colo_unit_test_multifd-v11-9-d653fb3b1d80@web.de Signed-off-by: Fabiano Rosas <farosas@suse.de>	4 weeks ago
Lukas Straub	3639d94e16	colo: Replace migration_incoming_colo_enabled() with migrate_colo() Since `121ccedc2b` migration: block incoming colo when capability is disabled x-colo capability needs to be always enabled on the incoming side. So migration_incoming_colo_enabled() and migrate_colo() are equivalent with migrate_colo() being easier to reason about since it is always true during the whole migration. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260302-colo_unit_test_multifd-v11-4-d653fb3b1d80@web.de Signed-off-by: Fabiano Rosas <farosas@suse.de>	4 weeks ago
Peter Xu	32ed27e5b9	migration/colo/xen: Use generic helpers in qemu_save_device_state() Use qemu_savevm_state_non_iterable*() helpers for saving device states, rather than walking the vmstate handlers on its own. Non-iterables can be either early_setup devices, or otherwise. Note that QEMU only has one early_setup device currently, which is virtio-mem, and I highly doubt if it is used in either COLO or Xen users.. However this step is still better needed to provide full coverage of all non-iterable vmstates. When at it, allow it to report errors. Cc: David Woodhouse <dwmw2@infradead.org> Cc: Paul Durrant <paul@xen.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-25-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 months ago
Peter Xu	c3bc01feb8	migration: Introduce qemu_savevm_state_end() Introduce a helper to end a migration stream. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-15-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 months ago
Peter Xu	2602618b3f	migration/colo: Send device states without copying buffer We can safely use the async version of put buffer here because the qemufile will be flushed right away. Suggested-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-13-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 months ago
Peter Xu	8c4d280054	migration/colo: Document qemu_fflush(fb) COLO caches all device states in a buffer channel `fb'. Add some comments explaining the flush, that (1) it's the `fb' not the main channel, (2) on what it updates. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-10-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 months ago
Peter Xu	47377604dc	migration/colo: Use the RAM iterable helper directly qemu_savevm_state_complete_precopy() has a weird parameter called "iterable_only". It's needed because COLO saves device states in advance. To make dropping that weird parameter easier, let COLO directly use the RAM iterator helper instead, which should make the code easier to read too. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-9-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 months ago
Peter Xu	93b974cbc1	migration: Remove call to send switchover start event in colo/savevm COLO (in case of periodically checkpointing) already have switchover happened before hand. This switchover_start feature never applies to COLO. Savevm for snapshot doesn't have switchover phase and VM is stopped for the whole process. Remove both. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-7-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 months ago
Peter Xu	ac4702a746	migration/colo: Unwrap qemu_savevm_live_state() It's only used in COLO path and only contains two calls. Unwrap the function. It paves way for further reduce special COLO paths on sync. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Lukas Straub <lukasstraub2@web.de> Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-6-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 months ago
Arun Menon	3f9d6e77b0	migration: Update qemu_file_get_return_path() docs and remove dead checks The documentation of qemu_file_get_return_path() states that it can return NULL on failure. However, a review of the current implementation reveals that it is guaranteed that it will always succeed and will never return NULL. As a result, the NULL checks post calling the function become redundant. This commit updates the documentation for the function and removes all NULL checks throughout the migration code. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-12-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	6 months ago
Arun Menon	deecb4e713	migration: push Error **errp into qemu_loadvm_state_main() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state_main() must report an error in errp, in case of failure. Set errp explicitly if it is NULL in case of failure in the out section. This will be removed in the subsequent patch when all of the calls are converted to passing errp. The error message in the default case of qemu_loadvm_state_main() has the word "savevm". This is removed because it can confuse the user while reading destination side error logs. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-9-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	6 months ago
Arun Menon	84279d5dc1	migration: push Error **errp into qemu_load_device_state() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_load_device_state() must report an error in errp, in case of failure. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-8-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	6 months ago
Vladimir Sementsov-Ogievskiy	fe6a74f365	migration: qemu_file_set_blocking(): add errp parameter qemu_file_set_blocking() is a wrapper on qio_channel_set_blocking(), so let's passthrough the errp. Note the migration should not be using &error_abort in these calls, however, this is done to expedite the API conversion. The original code would have eventually ended up calling either qemu_socket_set_nonblock which would asset on Linux, or g_unix_set_fd_nonblocking which would propagate errors. We never saw asserts in practice, and conceptually they should not happen, but ideally this code will be later adapted to remove use of &error_abort. Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Akihiko Odaki	da926a8b9d	migration/colo: Replace QemuSemaphore with QemuEvent colo_exit_sem and colo_incoming_sem represent one-shot events so they can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250529-event-v5-8-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	10 months ago
Maciej S. Szmigiero	4e55cb3cde	migration: Add MIG_CMD_SWITCHOVER_START and its load handler This QEMU_VM_COMMAND sub-command and its switchover_start SaveVMHandler is used to mark the switchover point in main migration stream. It can be used to inform the destination that all pre-switchover main migration stream data has been sent/received so it can start to process post-switchover data that it might have received via other migration channels like the multifd ones. Add also the relevant MigrationState bit stream compatibility property and its hw_compat entry. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhang Chen <zhangckid@gmail.com> # for the COLO part Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/311be6da85fc7e49a7598684d80aa631778dcbce.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	1 year ago
Peter Xu	8597af7615	migration/block: Rewrite disk activation This patch proposes a flag to maintain disk activation status globally. It mostly rewrites disk activation mgmt for QEMU, including COLO and QMP command xen_save_devices_state. Backgrounds =========== We have two problems on disk activations, one resolved, one not. Problem 1: disk activation recover (for switchover interruptions) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When migration is either cancelled or failed during switchover, especially when after the disks are inactivated, QEMU needs to remember re-activate the disks again before vm starts. It used to be done separately in two paths: one in qmp_migrate_cancel(), the other one in the failure path of migration_completion(). It used to be fixed in different commits, all over the places in QEMU. So these are the relevant changes I saw, I'm not sure if it's complete list: - In 2016, commit `fe904ea824` ("migration: regain control of images when migration fails to complete") - In 2017, commit `1d2acc3162` ("migration: re-active images while migration been canceled after inactive them") - In 2023, commit `6dab4c93ec` ("migration: Attempt disk reactivation in more failure scenarios") Now since we have a slightly better picture maybe we can unify the reactivation in a single path. One side benefit of doing so is, we can move the disk operation outside QMP command "migrate_cancel". It's possible that in the future we may want to make "migrate_cancel" be OOB-compatible, while that requires the command doesn't need BQL in the first place. This will already do that and make migrate_cancel command lightweight. Problem 2: disk invalidation on top of invalidated disks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is an unresolved bug for current QEMU. Link in "Resolves:" at the end. It turns out besides the src switchover phase (problem 1 above), QEMU also needs to remember block activation on destination. Consider two continuous migration in a row, where the VM was always paused. In that scenario, the disks are not activated even until migration completed in the 1st round. When the 2nd round starts, if QEMU doesn't know the status of the disks, it needs to try inactivate the disk again. Here the issue is the block layer API bdrv_inactivate_all() will crash a QEMU if invoked on already inactive disks for the 2nd migration. For detail, see the bug link at the end. Implementation ============== This patch proposes to maintain disk activation with a global flag, so we know: - If we used to inactivate disks for migration, but migration got cancelled, or failed, QEMU will know it should reactivate the disks. - On incoming side, if the disks are never activated but then another migration is triggered, QEMU should be able to tell that inactivate is not needed for the 2nd migration. We used to have disk_inactive, but it only solves the 1st issue, not the 2nd. Also, it's done in completely separate paths so it's extremely hard to follow either how the flag changes, or the duration that the flag is valid, and when we will reactivate the disks. Convert the existing disk_inactive flag into that global flag (also invert its naming), and maintain the disk activation status for the whole lifecycle of qemu. That includes the incoming QEMU. Put both of the error cases of source migration (failure, cancelled) together into migration_iteration_finish(), which will be invoked for either of the scenario. So from that part QEMU should behave the same as before. However with such global maintenance on disk activation status, we not only cleanup quite a few temporary paths that we try to maintain the disk activation status (e.g. in postcopy code), meanwhile it fixes the crash for problem 2 in one shot. For freshly started QEMU, the flag is initialized to TRUE showing that the QEMU owns the disks by default. For incoming migrated QEMU, the flag will be initialized to FALSE once and for all showing that the dest QEMU doesn't own the disks until switchover. That is guaranteed by the "once" variable. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-7-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	1 year ago
Philippe Mathieu-Daudé	32cad1ffb8	include: Rename sysemu/ -> system/ Headers in include/sysemu/ are not only related to system emulation, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>	1 year ago
Peter Xu	e620b1e477	migration: Put thread names together with macros Keep migration thread names together, so it's easier to see a list of all possible migration threads. Still two functional changes below besides the macro defintions: - There's one dirty rate thread that we overlooked before, now we add that too and name it as "mig/dirtyrate" following the old rules. - The old name "mig/src/rp-thr" has "-thr" but it may not be useful if it's a thread name anyway, while "rp" can be slightly hard to read. Taking this chance to rename it to "mig/src/return", hopefully a better name. Reviewed-by: Fabiano Rosas <farosas@suse.de> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	1 year ago
Peter Xu	60ce47675d	migration: Rename thread debug names The postcopy thread names on dest QEMU are slightly confusing, partly I'll need to blame myself on `36f62f11e4` ("migration: Postcopy preemption preparation on channel creation"). E.g., "fault-fast" reads like a fast version of "fault-default", but it's actually the fast version of "postcopy/listen". Taking this chance, rename all the migration threads with proper rules. Considering we only have 15 chars usable, prefix all threads with "mig/", meanwhile identify src/dst threads properly this time. So now most thread names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except the bg-snapshot thread which doesn't have a direction. For multifd threads, making them "mig/{src\|dst}/{send\|recv}_%d". We used to have "live_migration" thread for a very long time, now it's called "mig/src/main". We may hope to have "mig/dst/main" soon but not yet. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 years ago
Li Zhijian	3dc27fac25	migration/colo: Tidy up bql_unlock() around bdrv_activate_all() Make the code more tight. Suggested-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> [fixed mangled author email address] Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 years ago
Li Zhijian	787ea49e80	migration/colo: make colo_incoming_co() return void Currently, it always returns 0, no need to check the return value at all. In addition, enter colo coroutine only if migration_incoming_colo_enabled() is true. Once the destination side enters the COLO* state, the COLO process will take over the remaining processes until COLO exits. Cc: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> [fixed mangled author email address] Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 years ago
Fabiano Rosas	eef0bae3a7	migration: Remove block migration The block migration has been considered obsolete since QEMU 8.2 in favor of the more flexible storage migration provided by the blockdev-mirror driver. Two releases have passed so now it's time to remove it. Deprecation commit `66db46ca83` ("migration: Deprecate block migration"). Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 years ago
Li Zhijian	2cc637f1ea	migration/colo: Fix bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed. bdrv_activate_all() should not be called from the coroutine context, move it to the QEMU thread colo_process_incoming_thread() with the bql_lock protected. The backtrace is as follows: #4 0x0000561af7948362 in bdrv_graph_rdlock_main_loop () at ../block/graph-lock.c:260 #5 0x0000561af7907a68 in graph_lockable_auto_lock_mainloop (x=0x7fd29810be7b) at /patch/to/qemu/include/block/graph-lock.h:259 #6 0x0000561af79167d1 in bdrv_activate_all (errp=0x7fd29810bed0) at ../block.c:6906 #7 0x0000561af762b4af in colo_incoming_co () at ../migration/colo.c:935 #8 0x0000561af7607e57 in process_incoming_migration_co (opaque=0x0) at ../migration/migration.c:793 #9 0x0000561af7adbeeb in coroutine_trampoline (i0=-106876144, i1=22042) at ../util/coroutine-ucontext.c:175 #10 0x00007fd2a5cf21c0 in () at /lib64/libc.so.6 Cc: qemu-stable@nongnu.org Cc: Fabiano Rosas <farosas@suse.de> Closes: https://gitlab.com/qemu-project/qemu/-/issues/2277 Fixes: `2b3912f135` ("block: Mark bdrv_first_blk() and bdrv_is_root_node() GRAPH_RDLOCK") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Tested-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240417025634.1014582-1-lizhijian@fujitsu.com Signed-off-by: Peter Xu <peterx@redhat.com>	2 years ago
Steve Sistare	7395127f23	migration: privatize colo interfaces Remove private migration interfaces from net/colo-compare.c and push them to migration/colo.c. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1710179338-294359-10-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2 years ago
Stefan Hajnoczi	a4a411fbaf	Replace "iothread lock" with "BQL" in comments The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL) in their names. Update the code comments to use "BQL" instead of "iothread lock". Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Message-id: 20240102153529.486531-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2 years ago
Stefan Hajnoczi	195801d700	system/cpus: rename qemu_mutex_lock_iothread() to bql_lock() The Big QEMU Lock (BQL) has many names and they are confusing. The actual QemuMutex variable is called qemu_global_mutex but it's commonly referred to as the BQL in discussions and some code comments. The locking APIs, however, are called qemu_mutex_lock_iothread() and qemu_mutex_unlock_iothread(). The "iothread" name is historic and comes from when the main thread was split into into KVM vcpu threads and the "iothread" (now called the main loop thread). I have contributed to the confusion myself by introducing a separate --object iothread, a separate concept unrelated to the BQL. The "iothread" name is no longer appropriate for the BQL. Rename the locking APIs to: - void bql_lock(void) - void bql_unlock(void) - bool bql_locked(void) There are more APIs with "iothread" in their names. Subsequent patches will rename them. There are also comments and documentation that will be updated in later patches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Fabiano Rosas <farosas@suse.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20240102153529.486531-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2 years ago
Juan Quintela	be07a0ed22	qemu-file: Make qemu_fflush() return errors This let us simplify code of this shape. qemu_fflush(f); int ret = qemu_file_get_error(f); if (ret) { return ret; } into: int ret = qemu_fflush(f); if (ret) { return ret; } I updated all callers where there is any error check. qemu_fclose() don't need to check for f->last_error because qemu_fflush() returns it at the beggining of the function. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231025091117.6342-13-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2 years ago
Vladimir Sementsov-Ogievskiy	d0a14a2ba0	migration: process_incoming_migration_co(): move colo part to colo Let's make better public interface for COLO: instead of colo_process_incoming_thread and not trivial logic around creating the thread let's make simple colo_incoming_co(), hiding implementation from generic code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515130640.46035-4-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	3 years ago
Vladimir Sementsov-Ogievskiy	dd42ce24a3	migration: split migration_incoming_co Originally, migration_incoming_co was introduced by `25d0c16f62` "migration: Switch to COLO process after finishing loadvm" to be able to enter from COLO code to one specific yield point, added by `25d0c16f62`. Later in `923709896b` "migration: poll the cm event for destination qemu" we reused this variable to wake the migration incoming coroutine from RDMA code. That was doubtful idea. Entering coroutines is a very fragile thing: you should be absolutely sure which yield point you are going to enter. I don't know how much is it safe to enter during qemu_loadvm_state() which I think what RDMA want to do. But for sure RDMA shouldn't enter the special COLO-related yield-point. As well, COLO code doesn't want to enter during qemu_loadvm_state(), it want to enter it's own specific yield-point. As well, when in `8e48ac9586` "COLO: Add block replication into colo process" we added bdrv_invalidate_cache_all() call (now it's called activate_all()) it became possible to enter the migration incoming coroutine during that call which is wrong too. So, let't make these things separate and disjoint: loadvm_co for RDMA, non-NULL during qemu_loadvm_state(), and colo_incoming_co for COLO, non-NULL only around specific yield. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515130640.46035-3-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	3 years ago
Vladimir Sementsov-Ogievskiy	51e47cf860	build: move COLO under CONFIG_REPLICATION We don't allow to use x-colo capability when replication is not configured. So, no reason to build COLO when replication is disabled, it's unusable in this case. Note also that the check in migrate_caps_check() is not the only restriction: some functions in migration/colo.c will just abort if called with not defined CONFIG_REPLICATION, for example: migration_iteration_finish() case MIGRATION_STATUS_COLO: migrate_start_colo_process() colo_process_checkpoint() abort() It could probably make sense to have possibility to enable COLO without REPLICATION, but this requires deeper audit of colo & replication code, which may be done later if needed. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Acked-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230428194928.1426370-4-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	3 years ago
Vladimir Sementsov-Ogievskiy	4332ffcd7b	colo: make colo_checkpoint_notify static and provide simpler API colo_checkpoint_notify() is mostly used in colo.c. Outside we use it once when x-checkpoint-delay migration parameter is set. So, let's simplify the external API to only that function - notify COLO that parameter was set. This make external API more robust and hides implementation details from external callers. Also this helps us to make COLO module optional in further patch (i.e. we are going to add possibility not build the COLO module). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Message-Id: <20230428194928.1426370-3-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	3 years ago
Juan Quintela	f94a858fa3	migration: Create migrate_checkpoint_delay() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	3 years ago
Juan Quintela	1f0776f1c0	migration: Create options.c We move there all capabilities helpers from migration.c. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Following David advise: - looked through the history, capabilities are newer than 2012, so we can remove that bit of the header. - This part is posterior to Anthony. Original Author is Orit. Once there, I put myself. Peter Xu also did quite a bit of work here. Anyone else wants/needs to be there? I didn't search too hard because nobody asked before to be added. What do you think?	3 years ago
Markus Armbruster	6f1e91f716	error: Drop superfluous #include "qapi/qmp/qerror.h" Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230207075115.1525-2-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>	3 years ago
Markus Armbruster	720a252c26	qapi migration: Elide redundant has_FOO in generated C The has_FOO for pointer-valued FOO are redundant, except for arrays. They are also a nuisance to work with. Recent commit "qapi: Start to elide redundant has_FOO in generated C" provided the means to elide them step by step. This is the step for qapi/migration.json. Said commit explains the transformation in more detail. The invariant violations mentioned there do not occur here. Cc: Juan Quintela <quintela@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20221104160712.3005652-17-armbru@redhat.com>	3 years ago
Daniel P. Berrangé	77ef2dc1c8	migration: remove the QEMUFileOps abstraction Now that all QEMUFile callbacks are removed, the entire concept can be deleted. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	4 years ago
Rao, Lei	9c5c8ff24e	COLO: Move some trace code behind qemu_mutex_unlock_iothread() There is no need to put some trace code in the critical section. So, moving it behind qemu_mutex_unlock_iothread() can reduce the lock time. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago
Zhang Chen	751fe4c608	migration/colo: Optimize COLO primary node start code path Optimize COLO primary start path from: MIGRATION_STATUS_XXX --> MIGRATION_STATUS_ACTIVE --> MIGRATION_STATUS_COLO --> MIGRATION_STATUS_COMPLETED To: MIGRATION_STATUS_XXX --> MIGRATION_STATUS_COLO --> MIGRATION_STATUS_COMPLETED No need to start primary COLO through "MIGRATION_STATUS_ACTIVE". Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago
Rao, Lei	795969ab1f	Fixed a QEMU hang when guest poweroff in COLO mode When the PVM guest poweroff, the COLO thread may wait a semaphore in colo_process_checkpoint().So, we should wake up the COLO thread before migration shutdown. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago
Zhang Chen	0e0f0479e2	migration/colo: More accurate update checkpoint time Previous operation(like vm_start and replication_start_all) will consume extra time before update the timer, so reduce time in this patch. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago
Rao, Lei	91fe9a8dbd	Reset the auto-converge counter at every checkpoint. if we don't reset the auto-converge counter, it will continue to run with COLO running, and eventually the system will hang due to the CPU throttle reaching DEFAULT_MIGRATE_MAX_CPU_THROTTLE. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago
Rao, Lei	2b9f6bf36c	Changed the last-mode to none of first start COLO When we first stated the COLO, the last-mode is as follows: { "execute": "query-colo-status" } {"return": {"last-mode": "primary", "mode": "primary", "reason": "none"}} The last-mode is unreasonable. After the patch, will be changed to the following: { "execute": "query-colo-status" } {"return": {"last-mode": "none", "mode": "primary", "reason": "none"}} Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago
Rao, Lei	04dd89169b	Removed the qemu_fclose() in colo_process_incoming_thread After the live migration, the related fd will be cleanup in migration_incoming_state_destroy(). So, the qemu_close() in colo_process_incoming_thread is not necessary. Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago
Rao, Lei	ac183dac96	colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff The GDB statck is as follows: Program terminated with signal SIGSEGV, Segmentation fault. 0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832 if (type->class->interfaces && [Current thread is 1 (Thread 0x7f756e97eb00 (LWP 1811577))] (gdb) bt 0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832 1 0x000055c8f2c3dd14 in object_dynamic_cast (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:763 2 0x000055c8f2c3ddce in object_dynamic_cast_assert (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel", file=0x55c8f2f73780 "migration/qemu-file-channel.c", line=117, func=0x55c8f2f73800 <__func__.18724> "channel_shutdown") at qom/object.c:786 3 0x000055c8f2bbc6ac in channel_shutdown (opaque=0x55c8f543ac00, rd=true, wr=true, errp=0x0) at migration/qemu-file-channel.c:117 4 0x000055c8f2bba56e in qemu_file_shutdown (f=0x7f7558070f50) at migration/qemu-file.c:67 5 0x000055c8f2ba5373 in migrate_fd_cancel (s=0x55c8f4ccf3f0) at migration/migration.c:1699 6 0x000055c8f2ba1992 in migration_shutdown () at migration/migration.c:187 7 0x000055c8f29a5b77 in main (argc=69, argv=0x7fff3e9e8c08, envp=0x7fff3e9e8e38) at vl.c:4512 The root cause is that we still want to shutdown the from_dst_file in migrate_fd_cancel() after qemu_close in colo_process_checkpoint(). So, we should set the s->rp_state.from_dst_file = NULL after qemu_close(). Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago
Rao, Lei	ae4c209935	Some minor optimizations for COLO Signed-off-by: Lei Rao <lei.rao@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	4 years ago

1 2 3

126 Commits (161e6039759a77debee9434ee55a086c8a85ed60)