riscv/qemu - qemu - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Markus Armbruster	a4cda04107	error: error_free(NULL) is safe, drop unnecessary conditionals Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20251119130855.105479-5-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>	4 months ago
Paolo Bonzini	12e50722e4	block: rename block/aio-wait.h to qemu/aio-wait.h AIO_WAIT_WHILE is used even outside the block layer; move the header file out of block/ just like the implementation is in util/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Eric Blake	de252f7993	qio: Add QIONetListener API for using AioContext The user calling himself "John Doe" reported a deadlock when attempting to use qemu-storage-daemon to serve both a base file over NBD, and a qcow2 file with that NBD export as its backing file, from the same process, even though it worked just fine when there were two q-s-d processes. The bulk of the NBD server code properly uses coroutines to make progress in an event-driven manner, but the code for spawning a new coroutine at the point when listen(2) detects a new client was hard-coded to use the global GMainContext; in other words, the callback that triggers nbd_client_new to let the server start the negotiation sequence with the client requires the main loop to be making progress. However, the code for bdrv_open of a qcow2 image with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to ensure that the entire qcow2 backing chain is either fully loaded or rejected, without any side effects from the main loop causing unwanted changes to the disk being loaded (in short, an AioContext represents the set of actions that are known to be safe while handling block layer I/O, while excluding any other pending actions in the global main loop with potentially larger risk of unwanted side effects). This creates a classic case of deadlock: the server can't progress to the point of accept(2)ing the client to write to the NBD socket because the main loop is being starved until the AIO_WAIT_WHILE completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because it is blocked on the client coroutine stuck in a read() of the expected magic number from the server side of the socket. This patch adds a new API to allow clients to opt in to listening via an AioContext rather than a GMainContext. This will allow NBD to fix the deadlock by performing all actions during bdrv_open in the main loop AioContext. Technical debt warning: I would have loved to utilize a notify function with AioContext to guarantee that we don't finalize listener due to an object_unref if there is any callback still running (the way GSource does), but wiring up notify functions into AioContext is a bigger task that will be deferred to a later QEMU release. But for solving the NBD deadlock, it is sufficient to note that the QMP commands for enabling and disabling the NBD server are really the only points where we want to change the listener's callback. Furthermore, those commands are serviced in the main loop, which is the same AioContext that is also listening for connections. Since a thread cannot interrupt itself, we are ensured that at the point where we are changing the watch, there are no callbacks active. This is NOT as powerful as the GSource cross-thread safety, but sufficient for the needs of today. An upcoming patch will then add a unit test (kept separate to make it easier to rearrange the series to demonstrate the deadlock without this patch). Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169 Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20251113011625.878876-26-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	5 months ago
Eric Blake	cc0faf8273	qio: Prepare NetListener to use AioContext For ease of review, this patch adds an AioContext pointer to the QIONetListener struct, the code to trace it, and refactors listener->io_source to instead be an array of utility structs; but the aio_context pointer is always NULL until the next patch adds an API to set it. There should be no semantic change in this patch. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20251113011625.878876-25-eblake@redhat.com>	5 months ago
Eric Blake	ec59a65a4d	qio: Provide accessor around QIONetListener->sioc An upcoming patch needs to pass more than just sioc as the opaque pointer to an AioContext; but since our AioContext code in general (and its QIO Channel wrapper code) lacks a notify callback present with GSource, we do not have the trivial option of just g_malloc'ing a small struct to hold all that data coupled with a notify of g_free. Instead, the data pointer must outlive the registered handler; in fact, having the data pointer have the same lifetime as QIONetListener is adequate. But the cleanest way to stick such a helper struct in QIONetListener will be to rearrange internal struct members. And that in turn means that all existing code that currently directly accesses listener->nsioc and listener->sioc[] should instead go through accessor functions, to be immune to the upcoming struct layout changes. So this patch adds accessor methods qio_net_listener_nsioc() and qio_net_listener_sioc(), and puts them to use. While at it, notice that the pattern of grabbing an sioc from the listener only to turn around can call qio_channel_socket_get_local_address is common enough to also warrant the helper of qio_net_listener_get_local_address, and fix a copy-paste error in the corresponding documentation. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20251113011625.878876-24-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	5 months ago
Eric Blake	e685dd26c7	qio: Factor out helpers qio_net_listener_[un]watch The code had three similar repetitions of an iteration over one or all of nsiocs to set up a GSource, and likewise for teardown. Since an upcoming patch wants to tweak whether GSource or AioContext is used, it's better to consolidate that into one helper function for fewer places to edit later. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20251113011625.878876-22-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	5 months ago
Eric Blake	36769aeedf	qio: Minor optimization when callback function is unchanged In qemu-nbd and other NBD server setups where parallel clients are supported, it is common that the caller will re-register the same callback function as long as it has not reached its limit on simultaneous clients. In that case, there is no need to tear down and reinstall GSource watches in the GMainContext. In practice, all existing callers currently pass NULL for notify, and no caller ever changes context across calls (for async uses, either the caller consistently uses qio_net_listener_set_client_func_full with the same context, or the caller consistently uses only qio_net_listener_set_client_func which always uses the global context); but the time spent checking these two fields in addition to the more important func and data is still less than the savings of not churning through extra GSource manipulations when the result will be unchanged. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20251113011625.878876-21-eblake@redhat.com>	5 months ago
Eric Blake	9d86181874	qio: Protect NetListener callback with mutex Without a mutex, NetListener can run into this data race between a thread changing the async callback callback function to use when a client connects, and the thread servicing polling of the listening sockets: Thread 1: qio_net_listener_set_client_func(lstnr, f1, ...); => foreach sock: socket => object_ref(lstnr) => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref); Thread 2: poll() => event POLLIN on socket => ref(GSourceCallback) => if (lstnr->io_func) // while lstnr->io_func is f1 ...interrupt.. Thread 1: qio_net_listener_set_client_func(lstnr, f2, ...); => foreach sock: socket => g_source_unref(sock_src) => foreach sock: socket => object_ref(lstnr) => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref); Thread 2: => call lstnr->io_func(lstnr->io_data) // now sees f2 => return dispatch(sock) => unref(GSourceCallback) => destroy-notify => object_unref Found by inspection; I did not spend the time trying to add sleeps or execute under gdb to try and actually trigger the race in practice. This is a SEGFAULT waiting to happen if f2 can become NULL because thread 1 deregisters the user's callback while thread 2 is trying to service the callback. Other messes are also theoretically possible, such as running callback f1 with an opaque pointer that should only be passed to f2 (if the client code were to use more than just a binary choice between a single async function or NULL). Mitigating factor: if the code that modifies the QIONetListener can only be reached by the same thread that is executing the polling and async callbacks, then we are not in a two-thread race documented above (even though poll can see two clients trying to connect in the same window of time, any changes made to the listener by the first async callback will be completed before the thread moves on to the second client). However, QEMU is complex enough that this is hard to generically analyze. If QMP commands (like nbd-server-stop) are run in the main loop and the listener uses the main loop, things should be okay. But when a client uses an alternative GMainContext, or if servicing a QMP command hands off to a coroutine to avoid blocking, I am unable to state with certainty whether a given net listener can be modified by a thread different from the polling thread running callbacks. At any rate, it is worth having the API be robust. To ensure that modifying a NetListener can be safely done from any thread, add a mutex that guarantees atomicity to all members of a listener object related to callbacks. This problem has been present since QIONetListener was introduced. Note that this does NOT prevent the case of a second round of the user's old async callback being invoked with the old opaque data, even when the user has already tried to change the async callback during the first async callback; it is only about ensuring that there is no sharding (the eventual io_func(io_data) call that does get made will correspond to a particular combination that the user had requested at some point in time, and not be sharded to a combination that never existed in practice). In other words, this patch maintains the status quo that a user's async callback function already needs to be robust to parallel clients landing in the same window of poll servicing, even when only one client is desired, if that particular listener can be amended in a thread other than the one doing the polling. CC: qemu-stable@nongnu.org Fixes: `53047392` ("io: introduce a network socket listener API", v2.12.0) Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20251113011625.878876-20-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [eblake: minor commit message wording improvements] Signed-off-by: Eric Blake <eblake@redhat.com>	5 months ago
Eric Blake	b5676493a0	qio: Remember context of qio_net_listener_set_client_func_full io/net-listener.c has two modes of use: asynchronous (the user calls qio_net_listener_set_client_func to wake up the callback via the global GMainContext, or qio_net_listener_set_client_func_full to wake up the callback via the caller's own alternative GMainContext), and synchronous (the user calls qio_net_listener_wait_client which creates its own GMainContext and waits for the first client connection before returning, with no need for a user's callback). But commit `938c8b79` has a latent logic flaw: when qio_net_listener_wait_client finishes on its temporary context, it reverts all of the siocs back to the global GMainContext rather than the potentially non-NULL context they might have been originally registered with. Similarly, if the user creates a net-listener, adds initial addresses, registers an async callback with a non-default context (which ties to all siocs for the initial addresses), then adds more addresses with qio_net_listener_add, the siocs for later addresses are blindly placed in the global context, rather than sharing the context of the earlier ones. In practice, I don't think this has caused issues. As pointed out by the original commit, all async callers prior to that commit were already okay with the NULL default context; and the typical usage pattern is to first add ALL the addresses the listener will pay attention to before ever setting the async callback. Likewise, if a file uses only qio_net_listener_set_client_func instead of qio_net_listener_set_client_func_full, then it is never using a custom context, so later assignments of async callbacks will still be to the same global context as earlier ones. Meanwhile, any callers that want to do the sync operation to grab the first client are unlikely to register an async callback; altogether bypassing the question of whether later assignments of a GSource are being tied to a different context over time. I do note that chardev/char-socket.c is the only file that calls both qio_net_listener_wait_client (sync for a single client in tcp_chr_accept_server_sync), and qio_net_listener_set_client_func_full (several places, all with chr->gcontext, but sometimes with a NULL callback function during teardown). But as far as I can tell, the two uses are mutually exclusive, based on the is_waitconnect parameter to qmp_chardev_open_socket_server. That said, it is more robust to remember when an async callback function is tied to a non-default context, and have both the sync wait and any late address additions honor that same context. That way, the code will be robust even if a later user performs a sync wait for a specific client in the middle of servicing a longer-lived QIONetListener that has an async callback for all other clients. CC: qemu-stable@nongnu.org Fixes: `938c8b79` ("qio: store gsources for net listeners", v2.12.0) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20251113011625.878876-19-eblake@redhat.com>	5 months ago
Eric Blake	6e03d5cdc9	qio: Unwatch before notify in QIONetListener When changing the callback registered with QIONetListener, the code was calling notify on the old opaque data prior to actually removing the old GSource objects still pointing to that data. Similarly, during finalize, it called notify before tearing down the various GSource objects tied to the data. In practice, a grep of the QEMU code base found that every existing client of QIONetListener passes in a NULL notifier (the opaque data, if non-NULL, outlives the NetListener and so does not need cleanup when the NetListener is torn down), so this patch has no impact. And even if a caller had passed in a reference-counted object with a notifier of object_unref but kept its own reference on the data, then the early notify would merely reduce a refcount from (say) 2 to 1, but not free the object. However, it is a latent bug waiting to bite any future caller that passes in data where the notifier actually frees the object, because the GSource could then trigger a use-after-free if it loses the race on a last-minute client connection resulting in the data being passed to one final use of the async callback. Better is to delay the notify call until after all GSource that have been given a copy of the opaque data are torn down. CC: qemu-stable@nongnu.org Fixes: `530473924d` "io: introduce a network socket listener API", v2.12.0 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20251113011625.878876-18-eblake@redhat.com>	5 months ago
Eric Blake	59506e59e0	qio: Add trace points to net_listener Upcoming patches will adjust how net_listener watches for new client connections; adding trace points now makes it easier to debug that the changes work as intended. For example, adding --trace='qio_net_listener*' to the qemu-storage-daemon command line before --nbd-server will track when the server first starts listening for clients. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20251113011625.878876-17-eblake@redhat.com>	5 months ago
Peter Xu	1edf0df284	io: Add qio_channel_wait_cond() helper Add the helper to wait for QIO channel's IO availability in any context (coroutine, or non-coroutine). Use it tree-wide for three occurences. Cc: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Link: https://lore.kernel.org/r/20251022192612.2737648-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	5 months ago
Manish Mishra	84005f4a2b	io: flush zerocopy socket error queue on sendmsg failure due to ENOBUF The kernel allocates extra metadata SKBs in case of a zerocopy send, eventually used for zerocopy's notification mechanism. This metadata memory is accounted for in the OPTMEM limit. The kernel queues completion notifications on the socket error queue and this error queue is freed when userspace reads it. Usually, in the case of in-order processing, the kernel will batch the notifications and merge the metadata into a single SKB and free the rest. As a result, it never exceeds the OPTMEM limit. However, if there is any out-of-order processing or intermittent zerocopy failures, this error chain can grow significantly, exhausting the OPTMEM limit. As a result, all new sendmsg requests fail to allocate any new SKB, leading to an ENOBUF error. Depending on the amount of data queued before the flush (i.e., large live migration iterations), even large OPTMEM limits are prone to failure. To work around this, if we encounter an ENOBUF error with a zerocopy sendmsg, flush the error queue and retry once more. Co-authored-by: Manish Mishra <manish.mishra@nutanix.com> Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [DB: change TRUE/FALSE to true/false for 'bool' type; add more #ifdef QEMU_MSG_ZEROCOPY blocks] Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	5 months ago
Tejus GK	e52d822716	io: add a "blocking" field to QIOChannelSocket Add a 'blocking' boolean field to QIOChannelSocket to track whether the underlying socket is in blocking or non-blocking mode. Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	5 months ago
Philippe Mathieu-Daudé	989221c0c7	io/channel: Have read/write functions take void * buffer argument I/O channel read/write functions can operate on any area of memory, regardless of the content their represent. Do not restrict to array of char, use the void* type, which is also the type of the underlying iovec::iov_base field. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> [DB: also adapt test-crypto-tlssession.c func signatures] Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	5 months ago
Daniel P. Berrangé	b7a1f2ca45	io: fix use after free in websocket handshake code If the QIOChannelWebsock object is freed while it is waiting to complete a handshake, a GSource is leaked. This can lead to the callback firing later on and triggering a use-after-free in the use of the channel. This was observed in the VNC server with the following trace from valgrind: ==2523108== Invalid read of size 4 ==2523108== at 0x4054A24: vnc_disconnect_start (vnc.c:1296) ==2523108== by 0x4054A24: vnc_client_error (vnc.c:1392) ==2523108== by 0x4068A09: vncws_handshake_done (vnc-ws.c:105) ==2523108== by 0x44863B4: qio_task_complete (task.c:197) ==2523108== by 0x448343D: qio_channel_websock_handshake_io (channel-websock.c:588) ==2523108== by 0x6EDB862: UnknownInlinedFun (gmain.c:3398) ==2523108== by 0x6EDB862: g_main_context_dispatch_unlocked.lto_priv.0 (gmain.c:4249) ==2523108== by 0x6EDBAE4: g_main_context_dispatch (gmain.c:4237) ==2523108== by 0x45EC79F: glib_pollfds_poll (main-loop.c:287) ==2523108== by 0x45EC79F: os_host_main_loop_wait (main-loop.c:310) ==2523108== by 0x45EC79F: main_loop_wait (main-loop.c:589) ==2523108== by 0x423A56D: qemu_main_loop (runstate.c:835) ==2523108== by 0x454F300: qemu_default_main (main.c:37) ==2523108== by 0x73D6574: (below main) (libc_start_call_main.h:58) ==2523108== Address 0x57a6e0dc is 28 bytes inside a block of size 103,608 free'd ==2523108== at 0x5F2FE43: free (vg_replace_malloc.c:989) ==2523108== by 0x6EDC444: g_free (gmem.c:208) ==2523108== by 0x4053F23: vnc_update_client (vnc.c:1153) ==2523108== by 0x4053F23: vnc_refresh (vnc.c:3225) ==2523108== by 0x4042881: dpy_refresh (console.c:880) ==2523108== by 0x4042881: gui_update (console.c:90) ==2523108== by 0x45EFA1B: timerlist_run_timers.part.0 (qemu-timer.c:562) ==2523108== by 0x45EFC8F: timerlist_run_timers (qemu-timer.c:495) ==2523108== by 0x45EFC8F: qemu_clock_run_timers (qemu-timer.c:576) ==2523108== by 0x45EFC8F: qemu_clock_run_all_timers (qemu-timer.c:663) ==2523108== by 0x45EC765: main_loop_wait (main-loop.c:600) ==2523108== by 0x423A56D: qemu_main_loop (runstate.c:835) ==2523108== by 0x454F300: qemu_default_main (main.c:37) ==2523108== by 0x73D6574: (below main) (libc_start_call_main.h:58) ==2523108== Block was alloc'd at ==2523108== at 0x5F343F3: calloc (vg_replace_malloc.c:1675) ==2523108== by 0x6EE2F81: g_malloc0 (gmem.c:133) ==2523108== by 0x4057DA3: vnc_connect (vnc.c:3245) ==2523108== by 0x448591B: qio_net_listener_channel_func (net-listener.c:54) ==2523108== by 0x6EDB862: UnknownInlinedFun (gmain.c:3398) ==2523108== by 0x6EDB862: g_main_context_dispatch_unlocked.lto_priv.0 (gmain.c:4249) ==2523108== by 0x6EDBAE4: g_main_context_dispatch (gmain.c:4237) ==2523108== by 0x45EC79F: glib_pollfds_poll (main-loop.c:287) ==2523108== by 0x45EC79F: os_host_main_loop_wait (main-loop.c:310) ==2523108== by 0x45EC79F: main_loop_wait (main-loop.c:589) ==2523108== by 0x423A56D: qemu_main_loop (runstate.c:835) ==2523108== by 0x454F300: qemu_default_main (main.c:37) ==2523108== by 0x73D6574: (below main) (libc_start_call_main.h:58) ==2523108== The above can be reproduced by launching QEMU with $ qemu-system-x86_64 -vnc localhost:0,websocket=5700 and then repeatedly running: for i in {1..100}; do (echo -n "GET / HTTP/1.1" && sleep 0.05) \| nc -w 1 localhost 5700 & done CVE-2025-11234 Reported-by: Grant Millar \| Cylo <rid@cylo.io> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	6 months ago
Daniel P. Berrangé	322c3c4f3a	io: move websock resource release to close method The QIOChannelWebsock object releases all its resources in the finalize callback. This is later than desired, as callers expect to be able to call qio_channel_close() to fully close a channel and release resources related to I/O. The logic in the finalize method is at most a failsafe to handle cases where a consumer forgets to call qio_channel_close. This adds equivalent logic to the close method to release the resources, using g_clear_handle_id/g_clear_pointer to be robust against repeated invocations. The finalize method is tweaked so that the GSource is removed before releasing the underlying channel. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	6 months ago
Daniel P. Berrangé	2c147611cf	io: release active GSource in TLS channel finalizer While code is supposed to call qio_channel_close() before releasing the last reference on an QIOChannel, this is not guaranteed. QIOChannelFile and QIOChannelSocket both cleanup resources in their finalizer if the close operation was missed. This ensures the TLS channel will do the same failsafe cleanup. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	6 months ago
Daniel P. Berrangé	6a9e81b705	crypto: propagate Error object on premature termination The way that premature termination was handled in TLS connections was changed to handle an ordering problem during graceful shutdown in the migration code. Unfortunately one of the codepaths returned -1 to indicate an error condition, but failed to set the 'errp' parameter. This broke error handling in the qio_channel_tls_handshake function, as the QTask callback would no longer see that an error was raised. As a result, the client will go on to try to use the already closed TLS connection, resulting in misleading errors. This was evidenced in the I/O test 233 which showed changes such as -qemu-nbd: Certificate does not match the hostname localhost +qemu-nbd: Failed to read initial magic: Unable to read from socket: Connection reset by peer Fixes: `7e0c22d585` Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	6 months ago
Peter Xu	7e0c22d585	io/crypto: Move tls premature termination handling into QIO layer QCryptoTLSSession allows TLS premature termination in two cases, one of the case is when the channel shutdown() is invoked on READ side. It's possible the shutdown() happened after the read thread blocked at gnutls_record_recv(). In this case, we should allow the premature termination to happen. The problem is by the time qcrypto_tls_session_read() was invoked, tioc->shutdown may not have been set, so this may instead be treated as an error if there is concurrent shutdown() calls. To allow the flag to reflect the latest status of tioc->shutdown, move the check upper into the QIOChannel level, so as to read the flag only after QEMU gets an GNUTLS_E_PREMATURE_TERMINATION. When at it, introduce qio_channel_tls_allow_premature_termination() helper to make the condition checks easier to read. When doing so, change the qatomic_load_acquire() to qatomic_read(): here we don't need any ordering of memory accesses, but reading a flag. qatomic_read() would suffice because it guarantees fetching from memory. Nothing else we should need to order on memory access. This patch will fix a qemu qtest warning when running the preempt tls test, reporting premature termination: QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test --full -r /x86_64/migration/postcopy/preempt/tls/psk ... qemu-kvm: Cannot read from TLS channel: The TLS connection was non-properly terminated. ... In this specific case, the error was set by postcopy_preempt_thread, which normally will be concurrently shutdown()ed by the main thread. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250918203937.200833-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	6 months ago
Markus Armbruster	bcb536cabe	error: Kill @error_warn We added @error_warn some two years ago in commit `3ffef1a55c` (error: add global &error_warn destination). It has multiple issues: * error.h's big comment was not updated for it. * Function contracts were not updated for it. * ERRP_GUARD() is unaware of @error_warn, and fails to mask it from error_prepend() and such. These crash on @error_warn, as pointed out by Akihiko Odaki. All fixable. However, after more than two years, we had just of 15 uses, of which the last few patches removed seven as unclean or otherwise undesirable, adding back five elsewhere. I didn't look closely enough at the remaining seven to decide whether they are desirable or not. I don't think this feature earns its keep. Drop it. Thanks-to: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20250923091000.3180122-14-armbru@redhat.com> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>	6 months ago
Markus Armbruster	5bd58f04b8	util/oslib-win32: Do not treat null @errp as &error_warn qemu_socket_select() and its wrapper qemu_socket_unselect() treat a null @errp as &error_warn. This is wildly inappropriate. A caller passing null @errp specifies that errors are to be ignored. If warnings are wanted, the caller must pass &error_warn. Change callers to do that, and drop the inappropriate treatment of null @errp. This assumes that warnings are wanted. I'm not familiar with the calling code, so I can't say whether it will work when the socket is invalid, or WSAEventSelect() fails. If it doesn't, then this should be an error instead of a warning. Invalid socket might even be a programming error. These warnings were introduced in commit `f5fd677ae7` (win32/socket: introduce qemu_socket_select() helper). I considered reverting to silence, but Daniel Berrangé asked for the warnings to be preserved. Cc: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250923091000.3180122-9-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>	6 months ago
Vladimir Sementsov-Ogievskiy	6f607941b1	treewide: use qemu_set_blocking instead of g_unix_set_fd_nonblocking Instead of open-coded g_unix_set_fd_nonblocking() calls, use QEMU wrapper qemu_set_blocking(). Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> [DB: fix missing closing ) in tap-bsd.c, remove now unused GError var] Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Vladimir Sementsov-Ogievskiy	d14c8cc69d	io/channel-socket: rework qio_channel_socket_copy_fds() We want to switch from qemu_socket_set_block() to newer qemu_set_blocking(), which provides return status of operation, to handle errors. Still, we want to keep qio_channel_socket_readv() interface clean, as currently it allocate @fds only on success. So, in case of error, we should close all incoming fds and keep user's @fds untouched or zero. Let's make separate functions qio_channel_handle_fds() and qio_channel_cleanup_fds(), to achieve what we want. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Vladimir Sementsov-Ogievskiy	8cb17f9c36	util: drop qemu_socket_set_nonblock() Use common qemu_set_blocking() instead. Note that pre-patch the behavior of Win32 and Linux realizations are inconsistent: we ignore failure for Win32, and assert success for Linux. How do we convert the callers? 1. Most of callers call qemu_socket_set_nonblock() on a freshly created socket fd, in conditions when we may simply report an error. Seems correct switching to error handling both for Windows (pre-patch error is ignored) and Linux (pre-patch we assert success). Anyway, we normally don't expect errors in these cases. Still in tests let's use &error_abort for simplicity. What are exclusions? 2. hw/virtio/vhost-user.c - we are inside #ifdef CONFIG_LINUX, so no damage in switching to error handling from assertion. 3. io/channel-socket.c: here we convert both old calls to qemu_socket_set_nonblock() and qemu_socket_set_block() to one new call. Pre-patch we assert success for Linux in qemu_socket_set_nonblock(), and ignore all other errors here. So, for Windows switch is a bit dangerous: we may get new errors or crashes(when error_abort is passed) in cases where we have silently ignored the error before (was it correct in all such cases, if they were?) Still, there is no other way to stricter API than take this risk. 4. util/vhost-user-server - compiled only for Linux (see util/meson.build), so we are safe, switching from assertion to &error_abort. Note: In qga/channel-posix.c we use g_warning(), where g_printerr() would actually be a better choice. Still let's for now follow common style of qga, where g_warning() is commonly used to print such messages, and no call to g_printerr(). Converting everything to use g_printerr() should better be another series. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Vladimir Sementsov-Ogievskiy	1ed8903916	treewide: handle result of qio_channel_set_blocking() Currently, we just always pass NULL as errp argument. That doesn't look good. Some realizations of interface may actually report errors. Channel-socket realization actually either ignore or crash on errors, but we are going to straighten it out to always reporting an errp in further commits. So, convert all callers to either handle the error (where environment allows) or explicitly use &error_abort. Take also a chance to change the return value to more convenient bool (keeping also in mind, that underlying realizations may return -1 on failure, not -errno). Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> [DB: fix return type mismatch in TLS/websocket channel impls for qio_channel_set_blocking] Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Vladimir Sementsov-Ogievskiy	7bc2cbe330	migration/qemu-file: don't make incoming fds blocking again In migration we want to pass fd "as is", not changing its blocking status. The only current user of these fds is CPR state (through VMSTATE_FD), which of-course doesn't want to modify fds on target when source is still running and use these fds. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Daniel P. Berrangé	edea818371	io: add support for activating TLS thread safety workaround Add a QIO_CHANNEL_FEATURE_CONCURRENT_IO feature flag. If this is set on a QIOChannelTLS session object, the TLS session will be marked as requiring thread safety, which will activate the workaround for GNUTLS bug 1717 if needed. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/qemu-devel/20250718150514.2635338-3-berrange@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	9 months ago
Nir Soffer	e2e360db7a	io: Add helper for setting socket send buffer size Testing reading and writing from qemu-nbd using a unix domain socket shows that the platform default send buffer size is too low, leading to poor performance and hight cpu usage. Add a helper for setting socket send buffer size to be used in NBD code. It can also be used in other contexts. We don't need a helper for receive buffer size since it is not used with unix domain sockets. This is documented for Linux, and not documented for macOS. Failing to set the socket buffer size is not a fatal error, but the caller may want to warn about the failure. Signed-off-by: Nir Soffer <nirsof@gmail.com> Message-ID: <20250517201154.88456-2-nirsof@gmail.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	11 months ago
Juraj Marcin	0dc051aa85	io: Fix partial struct copy in qio_dns_resolver_lookup_sync_inet() Commit `aec21d3175` (qapi: Add InetSocketAddress member keep-alive) introduces the keep-alive flag, but this flag is not copied together with other options in qio_dns_resolver_lookup_sync_inet(). This patch fixes this issue and also prevents future ones by copying the entire structure first and only then overriding a few attributes that need to be different. Fixes: `aec21d3175` (qapi: Add InetSocketAddress member keep-alive) Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	10 months ago
Philippe Mathieu-Daudé	12d1a768bd	qom: Have class_init() take a const data argument Mechanical change using gsed, then style manually adapted to pass checkpatch.pl script. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250424194905.82506-4-philmd@linaro.org>	1 year ago
Fabiano Rosas	322d873b63	io: Add a read flag for relaxed EOF Add a read flag that can inform a channel that it's ok to receive an EOF at any moment. Channels that have some form of strict EOF tracking, such as TLS session termination, may choose to ignore EOF errors with the use of this flag. This is being added for compatibility with older migration streams that do not include a TLS termination step. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	1 year ago
Fabiano Rosas	a25b013019	io: Add flags argument to qio_channel_readv_full_all_eof We want to pass flags into qio_channel_tls_readv() but qio_channel_readv_full_all_eof() doesn't take a flags argument. No functional change. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	1 year ago
Fabiano Rosas	0b8a70d70f	crypto: Remove qcrypto_tls_session_get_handshake_status The correct way of calling qcrypto_tls_session_handshake() requires calling qcrypto_tls_session_get_handshake_status() right after it so there's no reason to have a separate method. Refactor qcrypto_tls_session_handshake() to inform the status in its own return value and alter the callers accordingly. No functional change. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	1 year ago
Fabiano Rosas	30ee88622e	io: tls: Add qio_channel_tls_bye Add a task dispatcher for gnutls_bye similar to the qio_channel_tls_handshake_task(). The gnutls_bye() call might be interrupted and so it needs to be rescheduled. The migration code will make use of this to help the migration destination identify a premature EOF. Once the session termination is in place, any EOF that happens before the source issued gnutls_bye() will be considered an error. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	1 year ago
Markus Armbruster	ef834aa2b2	qapi/crypto: Rename QCryptoHashAlgorithm to *Algo, and drop prefix QAPI's 'prefix' feature can make the connection between enumeration type and its constants less than obvious. It's best used with restraint. QCryptoHashAlgorithm has a 'prefix' that overrides the generated enumeration constants' prefix to QCRYPTO_HASH_ALG. We could simply drop 'prefix', but then the prefix becomes QCRYPTO_HASH_ALGORITHM, which is rather long. We could additionally rename the type to QCryptoHashAlg, but I think the abbreviation "alg" is less than clear. Rename the type to QCryptoHashAlgo instead. The prefix becomes to QCRYPTO_HASH_ALGO. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20240904111836.3273842-12-armbru@redhat.com> [Conflicts with merge commit `7bbadc60b5` resolved]	2 years ago
Daniel P. Berrangé	97f7bf113e	crypto: propagate errors from TLS session I/O callbacks GNUTLS doesn't know how to perform I/O on anything other than plain FDs, so the TLS session provides it with some I/O callbacks. The GNUTLS API design requires these callbacks to return a unix errno value, which means we're currently loosing the useful QEMU "Error" object. This changes the I/O callbacks in QEMU to stash the "Error" object in the QCryptoTLSSession class, and fetch it when seeing an I/O error returned from GNUTLS, thus preserving useful error messages. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2 years ago
Daniel P. Berrangé	57941c9c86	crypto: push error reporting into TLS session I/O APIs The current TLS session I/O APIs just return a synthetic errno value on error, which has been translated from a gnutls error value. This looses a large amount of valuable information that distinguishes different scenarios. Pushing population of the "Error *errp" object into the TLS session I/O APIs gives more detailed error information. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2 years ago
Anthony Harivel	95fa0c79a0	qio: add support for SO_PEERCRED for socket channel The function qio_channel_get_peercred() returns a pointer to the credentials of the peer process connected to this socket. This credentials structure is defined in <sys/socket.h> as follows: struct ucred { pid_t pid; /* Process ID of the sending process / uid_t uid; / User ID of the sending process / gid_t gid; / Group ID of the sending process */ }; The use of this function is possible only for connected AF_UNIX stream sockets and for AF_UNIX stream and datagram socket pairs. On platform other than Linux, the function return 0. Signed-off-by: Anthony Harivel <aharivel@redhat.com> Link: https://lore.kernel.org/r/20240522153453.1230389-2-aharivel@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2 years ago
Akihiko Odaki	7b1070a7e1	Revert "meson: Propagate gnutls dependency" This reverts commit `3eacf70bb5`. It was only needed because of duplicate objects caused by declare_dependency(link_whole: ...), and can be dropped now that meson.build specifies objects and dependencies separately for the internal dependencies. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-ID: <20240524-objects-v1-2-07cbbe96166b@daynix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2 years ago
Fabiano Rosas	46cec74c1b	io: Stop using qemu_open_old in channel-file We want to make use of the Error object to report fdset errors from qemu_open_internal() and passing the error pointer to qemu_open_old() would require changing all callers. Move the file channel to the new API instead. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2 years ago
Eric Blake	199e84de1c	qio: Inherit follow_coroutine_ctx across TLS Since qemu 8.2, the combination of NBD + TLS + iothread crashes on an assertion failure: qemu-kvm: ../io/channel.c:534: void qio_channel_restart_read(void *): Assertion `qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)' failed. It turns out that when we removed AioContext locking, we did so by having NBD tell its qio channels that it wanted to opt in to qio_channel_set_follow_coroutine_ctx(); but while we opted in on the main channel, we did not opt in on the TLS wrapper channel. qemu-iotests has coverage of NBD+iothread and NBD+TLS, but apparently no coverage of NBD+TLS+iothread, or we would have noticed this regression sooner. (I'll add that in the next patch) But while we could manually opt in to the TLS channel in nbd/server.c (a one-line change), it is more generic if all qio channels that wrap other channels inherit the follow status, in the same way that they inherit feature bits. CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Daniel P. Berrangé <berrange@redhat.com> CC: qemu-stable@nongnu.org Fixes: https://issues.redhat.com/browse/RHEL-34786 Fixes: `06e0f098` ("io: follow coroutine AioContext in qio_channel_yield()", v8.2.0) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20240518025246.791593-5-eblake@redhat.com>	2 years ago
Fabiano Rosas	4760cedc61	io: Introduce qio_channel_file_new_dupfd Add a new helper function for creating a QIOChannelFile channel with a duplicated file descriptor. This saves the calling code from having to do error checking on the dup() call. Suggested-by: "Daniel P. Berrangé" <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240311233335.17299-2-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2 years ago
Fabiano Rosas	61dec06082	migration/multifd: Don't fsync when closing QIOChannelFile Commit bc38feddeb ("io: fsync before closing a file channel") added a fsync/fdatasync at the closing point of the QIOChannelFile to ensure integrity of the migration stream in case of QEMU crash. The decision to do the sync at qio_channel_close() was not the best since that function runs in the main thread and the fsync can cause QEMU to hang for several minutes, depending on the migration size and disk speed. To fix the hang, remove the fsync from qio_channel_file_close(). At this moment, the migration code is the only user of the fsync and we're taking the tradeoff of not having a sync at all, leaving the responsibility to the upper layers. Fixes: bc38feddeb ("io: fsync before closing a file channel") Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240305195629.9922-1-farosas@suse.de Link: https://lore.kernel.org/r/20240305174332.2553-1-farosas@suse.de [peterx: add more comment to the qio_channel_close()] Signed-off-by: Peter Xu <peterx@redhat.com>	2 years ago
Fabiano Rosas	c05dfcb7f2	io: fsync before closing a file channel Make sure the data is flushed to disk before closing file channels. This is to ensure data is on disk and not lost in the event of a host crash. This is currently being implemented to affect the migration code when migrating to a file, but all QIOChannelFile users should benefit from the change. Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Acked-by: "Daniel P. Berrangé" <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2 years ago
Nikolay Borisov	0478b030fa	io: implement io_pwritev/preadv for QIOChannelFile The upcoming 'mapped-ram' feature will require qemu to write data to (and restore from) specific offsets of the migration file. Add a minimal implementation of pwritev/preadv and expose them via the io_pwritev and io_preadv interfaces. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-5-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2 years ago
Nikolay Borisov	f1cfe39418	io: Add generic pwritev/preadv interface Introduce basic pwritev/preadv support in the generic channel layer. Specific implementation will follow for the file channel as this is required in order to support migration streams with fixed location of each ram page. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-4-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2 years ago
Nikolay Borisov	401e311ff7	io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Add a generic QIOChannel feature SEEKABLE which would be used by the qemu_file* apis. For the time being this will be only implemented for file channels. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-3-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2 years ago
Daniel P. Berrangé	003f15369d	io: add trace event when cancelling TLS handshake Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2 years ago
Fabiano Rosas	9c636e0f96	io: Stop appending -listen to net listeners All callers of qio_net_listener_set_name() already add some sort of "listen" or "listener" suffix. For intance, we currently have "migration-socket-listener-listen" and "vnc-listen-listen" as ioc names. Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	3 years ago

1 2 3 4 5

235 Commits (1a1787ab6bbc3b329b81e14f25dfbc11e44ff2e0)