The file migration/cpr.c uses vmstate_cpr_vfio_devices which is
declared in hw/vfio/vfio-cpr.h, not in hw/vfio/vfio-device.h.
Replace the include with the correct header file to avoid pulling in
unnecessary VFIO device declarations.
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260211171532.1556719-1-clg@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Use qemu_savevm_state_non_iterable*() helpers for saving device states,
rather than walking the vmstate handlers on its own.
Non-iterables can be either early_setup devices, or otherwise.
Note that QEMU only has one early_setup device currently, which is
virtio-mem, and I highly doubt if it is used in either COLO or Xen users..
However this step is still better needed to provide full coverage of all
non-iterable vmstates.
When at it, allow it to report errors.
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Paul Durrant <paul@xen.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-25-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
This function is used by both COLO and Xen. Simplify it with two changes:
- Remove checks on qemu_savevm_se_iterable(): this is not needed as
vmstate_save() also checks for "save_state() || vmsd" instead. Here,
save_setup() (or say, iterable states) should be mutual exclusive to
"save_state() || vmsd" [*].
- Remove migrate_error_propagate(): both of the users are not using live
migration framework, but raw vmstate operations. Error propagation is
only needed for query-migrate persistence.
[*] One tricky user is VFIO, who provided _both_ save_state() and
save_setup(). However VFIO mustn't have been used in these paths or it
means both COLO and Xen have ignored VFIO data instead (that is,
qemu_savevm_se_iterable() will return true for VFIO). Hence, this change is
safe.
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Paul Durrant <paul@xen.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-24-peterx@redhat.com
[commit msg: s/not needed for/only needed for]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Let the function report errors to upper layers. Out of three current
users, two of them already process the errors, except one outlier,
qemu_savevm_state_complete_precopy(), where we do it manually for now with
a comment for TODO.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-23-peterx@redhat.com
[add space in error_prepend string]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Cleanup bg_migration_thread() function on error handling. First of all,
early_fail is almost only used to say if BQL is taken. Since we already
have separate jumping labels, we don't really need it, hence removed.
Also, since local_err is around, making sure every failure path will set a
proper error string for the failure, then propagate to MigrationState.error.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-22-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Introduce this helper to detect if a SaveStateEntry is active.
Note that this helper can actually also be used in loadvm paths, but let's
stick with this name for now because we still use SaveStateEntry for the
shared structure that both savevm/loadvm uses, where this name still suites.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-21-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Split it into two smaller chunks:
- Dump of early_setup VMSDs
- Dump of save_setup() sections
They're mutual exclusive, hence we can run two loops and do them
sequentially. This will cause migration thread to loop one more time, but
it should be fine when migration just started and only do it once. It's
needed because we will need to reuse the early_vmsd helper later to
deduplicate code elsewhere.
QEMU almost sticks with qemu_savevm_state_XXX() to represent the dump of
vmstates's section XXX. With that in mind, this patch renamed the original
qemu_savevm_state_setup() to qemu_savevm_state_do_setup() instead.
So after this patch:
- qemu_savevm_state_non_iterable_early() dumps early_vmsds only,
- qemu_savevm_state_setup() dumps save_setup() sections only,
- qemu_savevm_state_do_setup() does all things needed during setup
phase (including migration SETUP notifies)
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-20-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We did two unnecessary error propagations in qemu_savevm_state_setup(), on
either propagate it to MigrationState*, or set qemufile with error.
Error propagation is not needed because:
- Two live migration callers ([bg_]migration_thread) will propagate error
if this function returned with an error.
- Save snapshot (qemu_savevm_state) doesn't need to persist error; it got
returned directly from save_snapshot().
QEMUFile set error is not needed because the callers always check for
errors explicitly.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-19-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Make it pass in MigrationState* instead of s->to_dst_file, so as to drop
the internal migrate_get_current().
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-18-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Split the function, making itself to be the helper to dump all non-iterable
device states (early_vmsd excluded). Move the precopy end logic out to the
two callers that need it.
With it, we can remove the in_postcopy parameter. Meanwhile, renaming the
function to be qemu_savevm_state_non_iterable(): we don't need the keyword
"complete" because non-iterable doesn't iterate anyway, and we don't need
precopy because we moved precopy specialties out.
NOTE: this patch introduced one new migrate_get_current() user; will be
removed in follow up patch.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-17-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Provide two smaller helpers to dump the vm desc. Preparing to move it out
and generalize device state dump.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-16-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Put buffer can be async as long as the flush happens before the buffer will
be recycled / reused. Do it for postcopy package data. Quick measurement
shows a small VM the time to push / flush the package shrinks from 91us to
38us.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-14-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We can safely use the async version of put buffer here because the qemufile
will be flushed right away.
Suggested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-13-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
All users of qemu_savevm_state_complete_precopy_non_iterable() process
return values. There's no need to set error on qemufile (which we likely
should remove gradually across the tree). Remove it for possible code
dedup to happen later.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-12-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Now after removing the special case in COLO, we can drop this parameter.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-11-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
COLO caches all device states in a buffer channel `fb'. Add some comments
explaining the flush, that (1) it's the `fb' not the main channel, (2) on
what it updates.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-10-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
qemu_savevm_state_complete_precopy() has a weird parameter called
"iterable_only". It's needed because COLO saves device states in advance.
To make dropping that weird parameter easier, let COLO directly use the RAM
iterator helper instead, which should make the code easier to read too.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-9-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
COLO (in case of periodically checkpointing) already have switchover
happened before hand. This switchover_start feature never applies to COLO.
Savevm for snapshot doesn't have switchover phase and VM is stopped for the
whole process.
Remove both.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-7-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
It's only used in COLO path and only contains two calls. Unwrap the
function. It paves way for further reduce special COLO paths on sync.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-6-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
It's neither accurate nor necessary. Use a proper helper to detect if it's
an iterable savevm state entry instead.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-5-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reduces duplication of the other path where we also send the same header.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-3-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Split qemu_savevm_state_header() into two parts. This paves way for a
reuse elsewhere.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/qemu-devel/20260127185254.3954634-2-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
All three events are shared between precopy and postcopy, rather than
precopy specific.
For example, both precopy and postcopy will go through a SETUP process.
Meanwhile, both FAILED and DONE notifiers will be notified for either
precopy or postcopy on completions / failures.
Rename them to make them match what they do, and shorter.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260126213614.3815900-6-peterx@redhat.com
[fixed-up entry in scsi-disk.c that got merged first]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Postcopy (in failure path) should share with precopy on disk reactivations.
Explicit activiation should used to be fine even if called twice, but after
26f65c01ed ("migration: Do not try to start VM if disk activation fails")
we may want to avoid it and always capture failure when reactivation
happens (even if we do not expect the failure to happen). Remove this
redundant call.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260126213614.3815900-5-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Devices may opt-in migration FAILED notifiers to be invoked when migration
fails. Currently, the notifications happen in migration_cleanup(). It is
normally fine, but maybe not ideal if there's dependency of the fallback
v.s. VM starts.
This patch moves the FAILED notification earlier, so that if the failure
happened during switchover, it'll notify before VM restart.
After walking over all existing FAILED notifier users, I got the conclusion
that this should also be a cleaner approach at least from design POV.
We have these notifier users, where the first two do not need to trap
FAILED:
|----------------------------+-------------------------------------+---------------------|
| device | handler | events needed |
|----------------------------+-------------------------------------+---------------------|
| gicv3 | kvm_arm_gicv3_notifier | DONE |
| vfio_iommufd / vfio_legacy | vfio_cpr_reboot_notifier | SETUP |
| cpr-exec | cpr_exec_notifier | FAILED, DONE |
| virtio-net | virtio_net_migration_state_notifier | SETUP, FAILED |
| vfio | vfio_migration_state_notifier | FAILED |
| vdpa | vdpa_net_migration_state_notifier | SETUP, FAILED |
| spice [*] | migration_state_notifier | SETUP, FAILED, DONE |
|----------------------------+-------------------------------------+---------------------|
For cpr-exec, it tries to cleanup some cpr-exec specific fd or env
variables. This should be fine either way, as long as before
migration_cleanup().
For virtio-net, we need to re-plug the primary device back to guest in the
failover mode. Likely benign.
VFIO needs to re-start the device if FAILED. IIUC it should do it before
vm_start(), if the VFIO device can be put into a STOPed state due to
migration, we should logically make it running again before vCPUs run.
VDPA will disable SVQ when migration is FAILED. Likely benign too, but
looks better if we can do it before resuming vCPUs.
For spice, we should rely on "spice_server_migrate_end(false)" to retake
the ownership. Benign, but looks more reasonable if the spice client does
it before VM runs again.
Note that this change may introduce slightly more downtime, if the
migration failed exactly at the switchover phase. But that's very rare,
and even if it happens, none of above expects a long delay, but a short
one, likely will be buried in the total downtime even if failed.
Cc: Cédric Le Goater <clg@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260126213614.3815900-4-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Migration notifiers will notify at any of three places: (1) SETUP
phase, (2) migration completes, (3) migration fails.
There's actually a special case for spice: one can refer to
b82fc321bf ("Postcopy+spice: Pass spice migration data earlier"). It
doesn't need another 4th event because in commit 9d9babf78d ("migration:
MigrationEvent for notifiers") we merged it together with the DONE event.
The merge makes some sense if we treat "switchover" of postcopy as "DONE",
however that also means for postcopy we'll notify DONE twice.. The other
one at the end of postcopy when migration_cleanup().
In reality, the current code base will also notify FAILED for postcopy
twice. It's because an (maybe accidental) change in commit
4af667f87c ("migration: notifier error checking").
First of all, we still need that notification when switchover as stated in
Dave's commit, however that's only needed for spice. To fix it, introduce
POSTCOPY_START event to differenciate it from DONE. Use that instead in
postcopy_start(). Then spice will need to capture this event too.
Then we remove the extra FAILED notification in postcopy_start().
If one wonder if other DONE users should also monitor POSTCOPY_START
event.. We have two more DONE users:
- kvm_arm_gicv3_notifier
- cpr_exec_notifier
Both of them do not need a notification for POSTCOPY_START, but only when
migration completed. Actually, both of them are used in CPR, which doesn't
support postcopy.
When at this, update the notifier transition graph in the comment, and move
it from migration_add_notifier() to be closer to where the enum is defined.
I didn't attach Fixes: because I am not aware of any real bug on such
double reporting. I'm wildly guessing the 2nd notify might be silently
ignored in many cases. However this is still worth fixing.
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/qemu-devel/20260126213614.3815900-3-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Make the synchronous calls evident by not hiding the call to
migration_channel_connect_outgoing() in the transport code. Have those
functions return and call the function at the upper level.
This helps with navigation: the transport code returns the ioc,
there's no need to look into them when browsing the code.
It also allows RDMA in the source side to use the same path as the
rest of the transports.
While here, document the async calls which are the exception.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-26-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
After cleanups, the qmp_migrate_finish function is now just a call to
migration_connect_outgoing(). Remove qmp_migrate_finish() and rename
the qmp_migrate_finish_cb callback.
This also allows the function's error handling to be removed as it now
receives &local_err like the rest of the callees of qmp_migrate().
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-25-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Move this CPR-specific code into a cpr file. While here, give the
functions more significant names.
This makes the new idea (after cpr-transfer) of having two parts to
qmp_migrate slightly more obvious: either wait for the hangup or
continue directly.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-24-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When setting a callback on a Glib source and giving it a data pointer,
it's natural to also provide the destructor for the data in question.
Since migrate_hup_add() already needs to clone the MigrationAddress
when setting the qmp_migrate_finish_cb callback, also pass the
qapi_free_MigrationAddress as the GDestroyNotify callback.
With this the address doesn't need to be freed at the callback body,
making the management of that memory slightly simpler.
Cc: Mark Kanda <mark.kanda@oracle.com>
Cc: Ben Chaney <bchaney@akamai.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-23-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The migrate_uri_parse function is responsible for converting the URI
string into a MigrationChannel for consumption by the rest of the
code. Move it to channel.c and add a wrapper that calls both URI and
channels parsing.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-22-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Encapsulate the MigrationChannelList parsing in a new
migrate_channels_parse() located at channel.c.
This also makes the memory management of the MigrationAddress more
uniform. Previously, half the parsing code (uri parsing) would
allocate memory for the address while the other half (channel parsing)
would instead pass the original QAPI object along. After this patch,
the MigrationAddress is always QAPI_CLONEd, so the callers can use
g_autoptr(MigrationAddress) in all cases.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-21-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Move the <transport>_connect_incoming|outgoing functions to channel.c.
It leaves migration.c to deal with the established connection only.
While here, sort the includes.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-20-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Move the code responsible for the various channels connection into
channel.c. This is all executed before the migration_thread and
process_incoming_migration_co are running, so it helps the reasoning
to have them out of migration.c.
migration_ioc_process_incoming becomes migration_channel_identify
which is more in line with what the function does.
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-19-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
To make it easier to follow the code, rename the functions that start
the migration thread and migration coroutine to contain the word
"start".
This will give new contributors the chance of seeing the word start
and reaching the actual migration code, instead of twists and turns of
qio_channel_add_watch and qio_task_run_in_thread.
Remove all other instances of "start" and use wording more suitable to
what the current migration stage is. The transport code such as
fd_start_migration_outgoing becomes fd_connect_outgoing, the early
setup code such as qemu_start_incoming_migration becomes
qemu_setup_incoming_migration and so on.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-18-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Rename migration_channel_connect to indicate this is the source
side. Future patches will do similar changes to the incoming side and
this will avoid inconsistencies in naming.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-17-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Leave migration_ioc_process_incoming to do only the channel
identification process and move the migration start into
channel.c. Both routines will be renamed in the next patches to better
reflect their usage.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-16-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Use the common connection paths for the incoming and outgoing sides of
rdma migration. This removes one usage of QEMUFile from rdma.c. It
also allows further unification of the connection code in next
patches.
Move the channels enum to channel.h so rdma.c can access it. The RDMA
channel is considered a CH_MAIN channel.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-15-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Centralize, on both sides of migration, the setting of the to_src_file
and from_dst_file QEMUFiles. This will clean up the interface with
channel.c and rdma.c, allowing those files to stop dealing with
QEMUFile themselves.
(multifd_recv_new_channel was changed to return bool+errp for
convenience)
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-14-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Simplify migration_channel_connect() and migration_connect() to not
take an error as input. Move the error handling into the paths that
generate the error.
To achieve this, call migration_connect_error_propagate() from
socket.c and tls.c, which are the async paths.
For the sync paths, the handling is done as normal by returning all
the way to qmp_migrate_finish(), except that now the sync paths don't
pass the error forward into migration_connect() anymore.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-13-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Whenever an error occurs between migrate_init() and the start of
migration_thread, do cleanup immediately.
This allows the special casing for resume to be removed from
migration_connect(), that check is now done at
migration_connect_error_propagate() which already had a case for
resume.
The cleanup at qmp_migrate_finish_cb can also be removed because it
will always be reached either via the error path at
qmp_migrate_finish->migration_connect_error_propagate or via the
migrate_cleanup_bh.
The yank_unregister_instance at qmp_migrate() is now replaced by the
one at migration_cleanup().
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-12-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Move the register_instance call to migrate_prepare() so it can be
paired with the unregister_instance at migration_cleanup(). Otherwise,
the cleanup cannot be run when cpr_state_save() fails because the
instance is registered only after it.
When resuming from a paused postcopy migration, migrate_prepare()
returns early, but migration_cleanup() doesn't run, so the yank will
remain paired.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-11-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Cover the CANCELLING state in migration_connect_error_propagate() and
use it to funnel errors from migrate_prepare() until the end of
migration_connect().
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-10-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
In the next patches migration_cleanup() will be used in qmp_migrate(),
which currently does not show an error message. Move the error
reporting out of migration_cleanup() to avoid duplicate messages.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-9-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Freeing the error at migration_connect() is redundant in the normal
migration case. The freeing already happened at migrate_init():
qmp_migrate()
-> migrate_prepare()
-> migrate_init()
-> qmp_migrate_finish()
-> *_start_outgoing_migration()
-> migration_channel_connect()
-> migration_connect()
For the resume case, migrate_prepare() returns early and doesn't reach
migrate_init(). Move the extra migrate_error_free() call to
migrate_prepare() along with the resume check.
Also change migrate_init() to use migrate_error_free(), so it's easier
to see where are the places the error gets freed.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-8-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>
cpr_set_incoming_mode() is only called on the target side, so
migrate_mode() on the source side is the same as s->parameters.mode.
Use the function to reduce explicit access to s->parameters, we have
options.c for that.
Cc: Mark Kanda <mark.kanda@oracle.com>
Cc: Ben Chaney <bchaney@akamai.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/qemu-devel/20260123141656.6765-7-farosas@suse.de
Signed-off-by: Fabiano Rosas <farosas@suse.de>