riscv/qemu - qemu - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Jon Kohler	ca61f91ef9	util/oslib-posix: increase memprealloc thread count to 32 Increase MAX_MEM_PREALLOC_THREAD_COUNT from 16 to 32. This was last touched in 2017 [1] and, since then, physical machine sizes and the VMs running on them have continued to grow, both on average and at the extremes. For very large VMs, using 16 threads to preallocate memory can be a non-trivial bottleneck during VM start-up and migration. Increasing this limit to 32 threads reduces the time taken for these operations. Test results from a quad-socket Intel 8490H (4x 60 cores) show a fairly linear gain of 50% with the 2x thread count increase:

| Idle guest w/ 2M HugePages | Start-up time |
|---|---|
| 240 vCPU, 7.5TB (16 threads) | 2m41.955s |
| 240 vCPU, 7.5TB (32 threads) | 1m19.404s |

Note: going above 32 threads appears to have diminishing returns, at the point where memory bandwidth and context-switching costs become the limiting factor to linear scaling. For posterity, on the same system as above:

- 32 threads: 1m19s
- 48 threads: 1m4s
- 64 threads: 59s
- 240 threads: 50s

Additional threads also become less interesting as the amount of memory to be preallocated gets smaller. Putting that all together, 32 threads appears to be a sane number with a solid speedup on fairly modern hardware. To go faster, we'd either need to improve the hardware (CPU/memory) itself or make `clear_pages_*()` on the kernel side more efficient. [1] `1e356fc14b` ("mem-prealloc: reduce large guest start-up and migration time.") Signed-off-by: Jon Kohler <jon@nutanix.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	5 months ago
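The commit only raises a cap. The underlying technique can still be sketched in C: split a memory range into page-aligned slices and fault each slice in from its own thread.

This is a minimal illustration under stated assumptions, not QEMU's util/oslib-posix.c code. `prealloc_area()`, `prealloc_worker()` and `PREALLOC_MAX_THREADS` are hypothetical names. Each page is touched by reading one byte and writing it back, so existing contents survive. The thread count is clamped both to the cap and to the number of pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical cap, mirroring the new MAX_MEM_PREALLOC_THREAD_COUNT value. */
#define PREALLOC_MAX_THREADS 32

struct prealloc_job {
    uint8_t *start;    /* first byte of this thread's slice */
    size_t len;        /* slice length in bytes */
    size_t page_size;  /* stride between touched bytes */
};

/* Read and write back one byte per page: faults the page in without
 * changing its contents. */
static void *prealloc_worker(void *opaque)
{
    struct prealloc_job *job = opaque;

    for (size_t off = 0; off < job->len; off += job->page_size) {
        volatile uint8_t *p = job->start + off;
        *p = *p;
    }
    return NULL;
}

/* Touch every page of [area, area + len) using at most `threads` threads,
 * clamped to PREALLOC_MAX_THREADS and to the number of pages.
 * Returns the number of threads used, or -1 on error. */
static int prealloc_area(void *area, size_t len, size_t page_size, int threads)
{
    pthread_t tids[PREALLOC_MAX_THREADS];
    struct prealloc_job jobs[PREALLOC_MAX_THREADS];
    size_t pages, base, extra, off = 0;

    if (threads < 1 || page_size == 0) {
        return -1;
    }
    pages = (len + page_size - 1) / page_size;
    if (pages == 0) {
        return 0;
    }
    if (threads > PREALLOC_MAX_THREADS) {
        threads = PREALLOC_MAX_THREADS;
    }
    if ((size_t)threads > pages) {
        threads = (int)pages;
    }
    base = pages / threads;   /* every thread gets at least this many pages */
    extra = pages % threads;  /* the first `extra` threads get one more */

    for (int i = 0; i < threads; i++) {
        size_t bytes = (base + ((size_t)i < extra)) * page_size;

        if (bytes > len - off) {
            bytes = len - off;  /* the final slice may end mid-page */
        }
        jobs[i] = (struct prealloc_job){ (uint8_t *)area + off, bytes, page_size };
        off += bytes;
        if (pthread_create(&tids[i], NULL, prealloc_worker, &jobs[i]) != 0) {
            while (i-- > 0) {
                pthread_join(tids[i], NULL);  /* reap threads already started */
            }
            return -1;
        }
    }
    for (int i = 0; i < threads; i++) {
        pthread_join(tids[i], NULL);
    }
    return threads;
}
```

Asking for 100 threads on a 64-page area uses 32. Asking for 8 threads on a 3-page area uses 3, since an idle thread would add only start-up cost.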
Daniel P. Berrangé	13bedeb212	util: fix interleaving of error prefixes The vreport() function will optionally emit a prefix for error messages, which is output to stderr incrementally. If two vreport() calls execute concurrently, the prefix output may interleave. To address this, a lock must be taken on 'stderr' while outputting errors. Reported-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	6 months ago
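Here is a minimal sketch of the locking idea, assuming POSIX `flockfile()`/`funlockfile()`. The prefix and the message are written by separate stdio calls, but both run while the stream lock is held, so a concurrent caller cannot insert output between them. `report_error()` is a hypothetical name, not QEMU's vreport().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical report_error(): emits "prog: " and the message in separate
 * stdio calls, but under one flockfile() so concurrent callers cannot
 * interleave their fragments on the same stream. */
static void report_error(FILE *out, const char *prog, const char *fmt, ...)
{
    va_list ap;

    flockfile(out);              /* take the stream's (recursive) lock */
    fprintf(out, "%s: ", prog);  /* prefix, written incrementally */
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);      /* message body */
    va_end(ap);
    fputc('\n', out);
    funlockfile(out);            /* other threads may write again */
}
```

The stdio lock is recursive, so the `fprintf()` calls made while it is held still work. In QEMU the stream would be stderr.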
Daniel P. Berrangé	6365e97c84	util: don't skip error prefixes when QMP is active The vreport() function will print to HMP if available, otherwise to stderr. In the event that vreport() is called during execution of a QMP command, it will print to stderr, but mistakenly omit the message prefixes (timestamp, guest name, program name). This new usage of monitor_is_cur_qmp() from vreport() requires that we add a stub to satisfy linking of non-system emulator binaries. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Daniel P. Berrangé	2eb00abcfe	util: fix interleaving of error & trace output The monitor_cur_hmp() function will acquire/release mutex locks, which will trigger trace probes, which can in turn trigger qemu_log() calls. vreport() calls monitor_cur() multiple times during its execution, both directly and indirectly via error_vprintf(). The result is that the prefix information printed by vreport() gets interleaved with qemu_log() output when run outside the context of an HMP command dispatcher. This can be seen with:

```text
$ qemu-system-x86_64 \
    -msg timestamp=on,guest-name=on \
    -display none \
    -object tls-creds-x509,id=f,dir=fish \
    -name fish \
    -d trace:qemu_mutex*
2025-09-10T16:30:42.514374Z qemu_mutex_unlock released mutex 0x560b0339b4c0 (/var/home/berrange/src/virt/qemu/include/qemu/lockable.h:56)
2025-09-10T16:30:42.514400Z qemu_mutex_lock waiting on mutex 0x560b033983e0 (/var/home/berrange/src/virt/qemu/include/qemu/lockable.h:56)
2025-09-10T16:30:42.514402Z qemu_mutex_locked taken mutex 0x560b033983e0 (/var/home/berrange/src/virt/qemu/include/qemu/lockable.h:56)
2025-09-10T16:30:42.514404Z qemu_mutex_unlock released mutex 0x560b033983e0 (/var/home/berrange/src/virt/qemu/include/qemu/lockable.h:56)
2025-09-10T16:30:42.516716Z qemu_mutex_lock waiting on mutex 0x560b03398560 (../monitor/monitor.c:91)
2025-09-10T16:30:42.516723Z qemu_mutex_locked taken mutex 0x560b03398560 (../monitor/monitor.c:91)
2025-09-10T16:30:42.516726Z qemu_mutex_unlock released mutex 0x560b03398560 (../monitor/monitor.c:96)
2025-09-10T16:30:42.516728Z qemu_mutex_lock waiting on mutex 0x560b03398560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842057Z qemu_mutex_locked taken mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842058Z qemu_mutex_unlock released mutex 0x564f5e401560 (../monitor/monitor.c:96)
2025-09-10T16:31:04.842055Z 2025-09-10T16:31:04.842060Z qemu_mutex_lock waiting on mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842061Z qemu_mutex_locked taken mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842062Z qemu_mutex_unlock released mutex 0x564f5e401560 (../monitor/monitor.c:96)
2025-09-10T16:31:04.842064Z qemu_mutex_lock waiting on mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842065Z qemu_mutex_locked taken mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842066Z qemu_mutex_unlock released mutex 0x564f5e401560 (../monitor/monitor.c:96)
fish 2025-09-10T16:31:04.842068Z qemu_mutex_lock waiting on mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842069Z qemu_mutex_locked taken mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842070Z qemu_mutex_unlock released mutex 0x564f5e401560 (../monitor/monitor.c:96)
2025-09-10T16:31:04.842072Z qemu_mutex_lock waiting on mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842097Z qemu_mutex_locked taken mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842099Z qemu_mutex_unlock released mutex 0x564f5e401560 (../monitor/monitor.c:96)
qemu-system-x86_64:2025-09-10T16:31:04.842100Z qemu_mutex_lock waiting on mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842102Z qemu_mutex_locked taken mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842103Z qemu_mutex_unlock released mutex 0x564f5e401560 (../monitor/monitor.c:96)
2025-09-10T16:31:04.842105Z qemu_mutex_lock waiting on mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842106Z qemu_mutex_locked taken mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842107Z qemu_mutex_unlock released mutex 0x564f5e401560 (../monitor/monitor.c:96)
Unable to access credentials fish/ca-cert.pem: No such file or directory2025-09-10T16:31:04.842109Z qemu_mutex_lock waiting on mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842110Z qemu_mutex_locked taken mutex 0x564f5e401560 (../monitor/monitor.c:91)
2025-09-10T16:31:04.842111Z qemu_mutex_unlock released mutex 0x564f5e401560 (../monitor/monitor.c:96)
```

To avoid this interleaving (and to reduce the huge number of mutex lock/unlock calls), we need to ensure that monitor_cur_is_hmp() is only called once at the start of vreport(), and that, if no HMP is present, no further monitor APIs are called. This implies error_[v]printf() cannot be called from vreport(). Instead we must introduce error_[v]printf_mon(), which accept a pre-acquired Monitor object. In some cases, however, fprintf can be called directly, as output will never be directed to the monitor. With this change, the same command produces:

```text
$ qemu-system-x86_64 \
    -msg timestamp=on,guest-name=on \
    -display none \
    -object tls-creds-x509,id=f,dir=fish \
    -name fish \
    -d trace:qemu_mutex*
2025-09-10T16:31:22.701691Z qemu_mutex_unlock released mutex 0x5626fd3b84c0 (/var/home/berrange/src/virt/qemu/include/qemu/lockable.h:56)
2025-09-10T16:31:22.701728Z qemu_mutex_lock waiting on mutex 0x5626fd3b53e0 (/var/home/berrange/src/virt/qemu/include/qemu/lockable.h:56)
2025-09-10T16:31:22.701730Z qemu_mutex_locked taken mutex 0x5626fd3b53e0 (/var/home/berrange/src/virt/qemu/include/qemu/lockable.h:56)
2025-09-10T16:31:22.701732Z qemu_mutex_unlock released mutex 0x5626fd3b53e0 (/var/home/berrange/src/virt/qemu/include/qemu/lockable.h:56)
2025-09-10T16:31:22.703989Z qemu_mutex_lock waiting on mutex 0x5626fd3b5560 (../monitor/monitor.c:91)
2025-09-10T16:31:22.703996Z qemu_mutex_locked taken mutex 0x5626fd3b5560 (../monitor/monitor.c:91)
2025-09-10T16:31:22.703999Z qemu_mutex_unlock released mutex 0x5626fd3b5560 (../monitor/monitor.c:96)
2025-09-10T16:31:22.704000Z fish qemu-system-x86_64: Unable to access credentials fish/ca-cert.pem: No such file or directory
```

Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Daniel P. Berrangé	a582a5784e	monitor: move error_vprintf back to error-report.c The current unit tests rely on monitor.o not being linked, such that the monitor stubs get linked instead. Since error_vprintf is in monitor.o this allows a stub error_vprintf impl to be used that calls g_test_message. This takes a different approach, with error_vprintf moving back to error-report.c such that it is always linked into the tests. The monitor_vprintf() stub is then changed to use g_test_message if QTEST_SILENT_ERRORS is set, otherwise it will return -1 and trigger error_vprintf to call vfprintf. The end result is functionally equivalent for the purposes of the unit tests. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	6 months ago
Daniel P. Berrangé	03eb6f411f	util/log: add missing error reporting in qemu_log_trylock_with_err One codepath that could return NULL failed to populate the errp object. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	3 months ago
Daniel P. Berrangé	338f63e5f0	util: avoid repeated prefix on incremental qemu_log calls There are three general patterns to QEMU log output: 1. Single complete message calls: `qemu_log("Some message\n");` 2. Direct use of fprintf: `FILE *f = qemu_log_trylock(); fprintf(f, "..."); fprintf(f, "..."); fprintf(f, "...\n"); qemu_log_unlock(f);` 3. Mixed use of qemu_log_trylock()/qemu_log(): `FILE *f = qemu_log_trylock(); qemu_log("...."); qemu_log("...."); qemu_log("....\n"); qemu_log_unlock(f);` When message prefixes are enabled, the timestamp will be unconditionally emitted for all qemu_log() calls. This works fine in the 1st case, and has no effect in the 2nd case. In the 3rd case, however, the timestamp is printed over and over, once per fragment. One can argue that pattern (3) is pointless, as it is functionally identical to (2) but with extra indirection and overhead. Nonetheless, we have a fair bit of code that does this. The qemu_log() call itself is nothing more than a wrapper which does pattern (2) with a single fprintf() call. One might question whether (2) should include the message prefix in the same way as (1), but there are scenarios where this could be inappropriate or unhelpful, such as CPU register dumps or linux-user strace output. This patch fixes the problem in pattern (3) by keeping track of the call depth of qemu_log_trylock() and only emitting the prefix when the starting depth was zero. In doing this, qemu_log_trylock_context() is also introduced as a variant of qemu_log_trylock() that emits the prefix. Callers doing batch output can thus choose whether a prefix is appropriate or not. Fixes: `012842c075` (log: make '-msg timestamp=on' apply to all qemu_log usage) Reported-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
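The depth-tracking fix can be sketched as follows. This is an illustration only. `log_file`, `log_depth` and the `sketch_log*()` functions are hypothetical stand-ins for QEMU's log state, qemu_log_trylock(), qemu_log_trylock_context() and qemu_log(). `"[ts] "` stands in for the real timestamp prefix. The prefix is written only when the per-thread depth goes from zero to one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static FILE *log_file;               /* hypothetical log destination, set by the caller */
static _Thread_local int log_depth;  /* per-thread nesting depth of the log lock */

/* Take the (recursive) stream lock and record one more nesting level. */
static FILE *sketch_log_trylock(void)
{
    flockfile(log_file);
    log_depth++;
    return log_file;
}

static void sketch_log_unlock(FILE *f)
{
    log_depth--;
    funlockfile(f);
}

/* Variant for batch output that wants a single prefix up front. */
static FILE *sketch_log_trylock_context(void)
{
    FILE *f = sketch_log_trylock();

    if (log_depth == 1) {   /* starting depth was zero */
        fputs("[ts] ", f);  /* stand-in for the timestamp prefix */
    }
    return f;
}

/* qemu_log()-style helper: prefixes only when not nested in another lock. */
static void sketch_log(const char *fmt, ...)
{
    FILE *f = sketch_log_trylock_context();
    va_list ap;

    va_start(ap, fmt);
    vfprintf(f, fmt, ap);
    va_end(ap);
    sketch_log_unlock(f);
}
```

With this scheme:

- Pattern (1) still gets one prefix per message.
- Pattern (3) under the context variant gets exactly one prefix for the whole batch.
- A caller that takes the plain lock gets none.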
Daniel P. Berrangé	215235d365	util: add API to fetch the current thread name This will be used to include the thread name in error reports in a later patch. It returns a const string stored in a thread local to avoid memory allocation when it is called repeatedly in a single thread. The thread name should be set at the very start of the thread execution, which is the case when using qemu_thread_create. This uses the official thread APIs for fetching thread names, so that it captures names of threads spawned by code in third-party libraries, not merely QEMU-spawned threads. This also addresses the gap from the previous patch for setting the name of the main thread. A constructor is used to initialize the 'namebuf' thread-local in the main thread only. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
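Here is a minimal sketch of returning the current thread name from a thread-local buffer. It assumes Linux with glibc, where `pthread_getname_np()` and `pthread_setname_np()` are available as GNU extensions and names are limited to 16 bytes. `sketch_thread_get_name()` and the `"unknown"` fallback are hypothetical, not QEMU's new API. Querying the OS rather than a QEMU-side table is what lets names set by third-party code show up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Linux limits thread names to 16 bytes including the NUL terminator. */
static _Thread_local char namebuf[16];

/* Hypothetical helper: fills a thread-local buffer, so repeated calls in
 * the same thread never allocate and always return the same pointer. */
static const char *sketch_thread_get_name(void)
{
    if (pthread_getname_np(pthread_self(), namebuf, sizeof(namebuf)) != 0) {
        strcpy(namebuf, "unknown");  /* hypothetical fallback text */
    }
    return namebuf;
}
```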
Daniel P. Berrangé	d74938d14c	util: set the name for the 'main' thread on Windows The default main thread name is undefined, so use a constructor to explicitly set it to 'main'. This constructor is marked to run early as the thread name is intended to be used in error reporting / logs which may be triggered very early in QEMU execution. This is only done on Windows platforms, because on Linux (and possibly other POSIX platforms) changing the main thread name has a side effect of changing the process name reported by tools like 'ps' which fetch from the file /proc/self/task/tid/comm, expecting it to be the binary name. The subsequent patch will address POSIX platforms in a different way. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	8 months ago
Daniel P. Berrangé	46255cc2be	util: expose qemu_thread_set_name The ability to set the thread name needs to be used in a number of places, so expose the current impls as public methods. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Daniel P. Berrangé	71d81b320d	util: fix race setting thread name on Win32 The call to set the thread name on Win32 platforms is done by the parent thread, after _beginthreadex() returns. At this point the new child thread is potentially already executing its start method. To ensure the thread name is guaranteed to be set before any "interesting" code starts executing, it must be done in the start method of the child thread itself. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2 months ago
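The Win32 fix moves naming into the child's start method. The same ordering can be sketched with POSIX threads, assuming Linux/glibc's `pthread_setname_np()`. A trampoline runs in the child, names the thread, and only then calls the real entry point. `create_named_thread()`, `named_trampoline()` and `record_own_name()` are hypothetical; this is not QEMU's qemu_thread_create().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical start-up record handed from parent to child. */
struct named_start {
    void *(*fn)(void *);
    void *arg;
    char name[16];  /* Linux thread-name limit, including the NUL */
};

/* Runs in the child: apply the name before any of fn's code executes,
 * so nothing "interesting" can observe the thread unnamed. */
static void *named_trampoline(void *opaque)
{
    struct named_start s = *(struct named_start *)opaque;

    free(opaque);
    pthread_setname_np(pthread_self(), s.name);
    return s.fn(s.arg);
}

/* Create a thread whose name is set by the child itself, rather than by
 * the parent after pthread_create() returns (which races with the child). */
static int create_named_thread(pthread_t *tid, const char *name,
                               void *(*fn)(void *), void *arg)
{
    struct named_start *s = malloc(sizeof(*s));
    int ret;

    if (s == NULL) {
        return ENOMEM;
    }
    s->fn = fn;
    s->arg = arg;
    strncpy(s->name, name, sizeof(s->name) - 1);
    s->name[sizeof(s->name) - 1] = '\0';
    ret = pthread_create(tid, NULL, named_trampoline, s);
    if (ret != 0) {
        free(s);  /* the child never started, so we still own s */
    }
    return ret;
}

/* Example thread body: records the name it sees on entry. */
static void *record_own_name(void *arg)
{
    pthread_getname_np(pthread_self(), arg, 16);
    return NULL;
}
```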
Daniel P. Berrangé	1b65aeed2a	system: unconditionally enable thread naming When thread naming was introduced years ago, it was disabled by default and put behind a command line flag: commit `8f480de0c9` Author: Dr. David Alan Gilbert <dgilbert@redhat.com> Date: Thu Jan 30 10:20:31 2014 +0000 Add 'debug-threads' suboption to --name This was done based on a concern that something might depend on the historical thread naming. Thread names, however, were never promised to be part of QEMU's public API. The defaults will vary across platforms, so no assumptions should ever be made about naming. An opt-in behaviour is also unfortunately incompatible with RCU, which creates its thread from a constructor function that runs before command line args are parsed. Thus the RCU thread lacks any name. libvirt has unconditionally enabled debug-threads=yes on all VMs it creates for 10 years. Interestingly, this DID expose a bug in libvirt, as it parsed /proc/$PID/stat and could not cope with a space in the thread name. This was a latent pre-existing bug in libvirt though, and not a part of QEMU's API. Having thread names always available will allow thread names to be included in error reports and log messages QEMU prints by default, which will improve the ability to triage QEMU bugs. Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	8 months ago
Warner Losh	bba2f724f1	freebsd: FreeBSD 15 has native inotify Check to make sure that we have inotify in libc, before looking for it in libinotify. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Marc-André Lureau <marcandre.lureau@redhat.com> Cc: Daniel P. Berrange <berrange@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Warner Losh <imp@bsdimp.com>	2 months ago
Akihiko Odaki	649a78aa32	Reapply "rcu: Unify force quiescent state" This reverts commit `ddb4d9d174`. The commit says: > This reverts commit `55d98e3ede`. > > The commit introduced a regression in the replay functional test > on alpha (tests/functional/alpha/test_replay.py), that causes CI > failures regularly. Thus revert this change until someone has > figured out what is going wrong here. Reapply the change as alpha is fixed. Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20260217-alpha-v1-2-0dcc708c9db3@rsg.ci.i.u-tokyo.ac.jp Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	1 month ago
Jens Axboe	961fcc0f22	fdmon-io_uring: check CQ ring directly in gsource_check gsource_check() only looks at the ppoll revents for the io_uring fd, but CQEs can be posted during gsource_prepare()'s io_uring_submit() call via kernel task_work processing on syscall exit. These completions are already sitting in the CQ ring but the ring fd may not be signaled yet, causing gsource_check() to return false. Add a fallback io_uring_cq_ready() check so completions that arrive during submission are dispatched immediately rather than waiting for the next ppoll() cycle. Signed-off-by: Jens Axboe <axboe@kernel.dk> Message-ID: <20260213143225.161043-3-axboe@kernel.dk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2 months ago
Jens Axboe	2ae361ef1d	aio-posix: notify main loop when SQEs are queued When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the block I/O coroutine inline on the vCPU thread because qemu_get_current_aio_context() returns the main AioContext when BQL is held. The coroutine calls luring_co_submit() which queues an SQE via fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens in gsource_prepare() on the main loop thread. Since the coroutine ran inline (not via aio_co_schedule()), no BH is scheduled and aio_notify() is never called. The main loop remains asleep in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until the next timer fires. Fix this by calling aio_notify() after queuing the SQE. This wakes the main loop via the eventfd so it can run gsource_prepare() and submit the pending SQE promptly. This is a generic fix that benefits all devices using aio=io_uring. Without it, AHCI/SATA devices see MUCH worse I/O latency since they use MMIO (not ioeventfd like virtio) and have no other mechanism to wake the main loop after queuing block I/O. This is usually a bit hard to detect, as it also relies on the ppoll loop not waking up for other activity, and micro benchmarks tend not to see it because they don't have any real processing time. With a synthetic test case that has a few usleep() to simulate processing of read data, it's very noticeable. The below example reads 128MB with O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before each batch submit, and a 1ms delay after processing each completion. 
Running it on /dev/sda yields:

```text
time sudo ./iotest /dev/sda
________________________________________________________
Executed in   25.76 secs      fish           external
   usr time    6.19 millis  783.00 micros    5.41 millis
   sys time   12.43 millis  642.00 micros   11.79 millis
```

while on a virtio-blk or NVMe device we get:

```text
time sudo ./iotest /dev/vdb
________________________________________________________
Executed in    1.25 secs      fish           external
   usr time    1.40 millis    0.30 millis    1.10 millis
   sys time   17.61 millis    1.43 millis   16.18 millis

time sudo ./iotest /dev/nvme0n1
________________________________________________________
Executed in    1.26 secs      fish           external
   usr time    6.11 millis    0.52 millis    5.59 millis
   sys time   13.94 millis    1.50 millis   12.43 millis
```

where the latter are consistent. If we run the same test but keep the socket for the ssh connection active by having activity there, then the sda test looks as follows:

```text
time sudo ./iotest /dev/sda
________________________________________________________
Executed in    1.23 secs      fish           external
   usr time    2.70 millis   39.00 micros    2.66 millis
   sys time    4.97 millis  977.00 micros    3.99 millis
```

as now the ppoll loop is woken all the time anyway. After this fix, on an idle system:

```text
time sudo ./iotest /dev/sda
________________________________________________________
Executed in    1.30 secs      fish           external
   usr time    2.14 millis    0.14 millis    2.00 millis
   sys time   16.93 millis    1.16 millis   15.76 millis
```

Signed-off-by: Jens Axboe <axboe@kernel.dk> Message-Id: <07d701b9-3039-4f9b-99a2-abeae51146a5@kernel.dk> Reviewed-by: Kevin Wolf <kwolf@redhat.com> [Generalize the comment since this applies to all vCPU thread activity, not just coroutines, as suggested by Kevin Wolf <kwolf@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	1 month ago
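The wakeup pattern can be sketched with a Linux eventfd. This is a single-threaded illustration under stated assumptions, not QEMU's aio_notify() or fdmon-io_uring code:

- `struct loop`, `loop_queue()` and `loop_iterate()` are hypothetical.
- "Submitting" just drains a counter.
- The 499 ms timeout merely mirrors the figure quoted above.

The point is that queuing work also makes the polled fd readable, so the loop does not sleep until its timeout.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical event loop: a queue counter plus an eventfd the loop polls.
 * Single-threaded for brevity; real code must synchronize `pending`. */
struct loop {
    int notify_fd;
    int pending;   /* queued but not yet submitted requests */
};

static int loop_init(struct loop *l)
{
    l->pending = 0;
    l->notify_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return l->notify_fd < 0 ? -1 : 0;
}

/* aio_notify()-like wakeup: bump the eventfd counter so poll() returns. */
static void loop_notify(struct loop *l)
{
    uint64_t one = 1;
    ssize_t n = write(l->notify_fd, &one, sizeof(one));

    (void)n;  /* EAGAIN only means the counter is saturated: still readable */
}

/* Queue a request and wake the loop; without the notify, the loop could
 * stay asleep in poll() until its timeout expires. */
static void loop_queue(struct loop *l)
{
    l->pending++;
    loop_notify(l);
}

/* One iteration: sleep up to timeout_ms unless woken, then submit
 * everything queued. Returns the number of requests submitted. */
static int loop_iterate(struct loop *l, int timeout_ms)
{
    struct pollfd pfd = { .fd = l->notify_fd, .events = POLLIN };
    int submitted;

    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN)) {
        uint64_t val;
        ssize_t n = read(l->notify_fd, &val, sizeof(val));  /* reset the counter */

        (void)n;
    }
    submitted = l->pending;
    l->pending = 0;
    return submitted;
}
```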
Marc-André Lureau	ba63a9643a	util: add some extra stubs for qemu modules initialization Avoid extra ifdef-ery when optionally supporting modules, as done in audio-test (and vl.c). Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2 months ago
Vladimir Sementsov-Ogievskiy	2eed5472ec	error-report: make real_time_iso8601() public To be reused in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-ID: <20260201173633.413934-3-vsementsov@yandex-team.ru>	2 months ago
Thomas Huth	ddb4d9d174	Revert "rcu: Unify force quiescent state" This reverts commit `55d98e3ede`. The commit introduced a regression in the replay functional test on alpha (tests/functional/alpha/test_replay.py), that causes CI failures regularly. Thus revert this change until someone has figured out what is going wrong here. Buglink: https://gitlab.com/qemu-project/qemu/-/issues/3197 Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-ID: <20260209120336.41454-1-thuth@redhat.com>	2 months ago
Vladimir Sementsov-Ogievskiy	7404d6852d	tests/unit: add unit test for qemu_hexdump() Test that the fix in commit `20aa05edc2` ("util/hexdump: fix QEMU_HEXDUMP_LINE_WIDTH logic") makes sense. To avoid breaking compilation when building without 'block', move hexdump.c out of "if have_block" in meson.build. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20260202112826.38018-1-philmd@linaro.org>	5 months ago
Peter Maydell	dc249aaf57	misc: Clean up includes This commit deals with various .c files that included system headers that are already pulled in by osdep.h, where the .c file includes osdep.h already itself. This commit was created with scripts/clean-includes: ./scripts/clean-includes '--git' 'misc' 'hw/core' 'semihosting' 'target/arm' 'target/i386/kvm/kvm.c' 'target/loongarch' 'target/riscv' 'tools' 'util' All .c files should include qemu/osdep.h first. The script performs three related cleanups: * Ensure .c files include qemu/osdep.h first. * Including it in a .h is redundant, since the .c already includes it. Drop such inclusions. * Likewise, including headers qemu/osdep.h includes is redundant. Drop these, too. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20260116125830.926296-4-peter.maydell@linaro.org	3 months ago
Philippe Mathieu-Daudé	e44dc42f3a	bswap: Use 'qemu/bswap.h' instead of 'qemu/host-utils.h' These files only require "qemu/bswap.h", not "qemu/host-utils.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20260109163730.57087-2-philmd@linaro.org>	3 months ago
Philippe Mathieu-Daudé	3115691855	bswap: Include missing 'qemu/bswap.h' header All these files indirectly include the "qemu/bswap.h" header. Make this inclusion explicit to avoid build errors when refactoring unrelated headers. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20260109164742.58041-4-philmd@linaro.org>	3 months ago
Michael Tokarev	a2b429b114	Revert "gdbstub: Try unlinking the unix socket before binding" This reverts commit `fccb744f41`. This commit introduced dependency of linux-user on qemu-sockets.c. The latter includes handling of various socket types, while gdbstub only needs unix sockets. Including different kinds of sockets makes it more problematic to build linux-user statically. The original issue - the need to unlink unix socket before binding - will be addressed in the next change. Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	3 months ago
Richard Henderson	239b9d0488	include/qemu/atomic: Drop aligned_{u}int64_t As we no longer support i386 as a host architecture, this abstraction is no longer required. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	3 months ago
Richard Henderson	71adccb6f7	include/qemu/atomic: Drop qatomic_{read,set}_[iu]64 Replace all uses with the normal qatomic_{read,set}. Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	3 months ago
Richard Henderson	90e2e8ada7	util: Remove stats64 This API is no longer used. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	3 months ago
Richard Henderson	25512d6865	*: Remove __i386__ tests Remove instances of __i386__, except from tests and imported headers. Drop a block containing a sanity check and an fprintf error message for i386-on-i386 or x86_64-on-x86_64 emulation. If we really want something like this, we would do it via some form of compile-time check. Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	4 months ago
Farhan Ali	0ffc8f3625	util/vfio-helper: Fix endianness in PCI config read/write functions The VFIO pread/pwrite functions use little-endian data format. Currently, the qemu_vfio_pci_read_config() and qemu_vfio_pci_write_config() don't correctly convert from CPU native endian format to little-endian (and vice versa) when using the pread/pwrite functions. Fix this by limiting read/write to 32 bits and handling endian conversion in qemu_vfio_pci_read_config() and qemu_vfio_pci_write_config(). Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260105222029.2423-1-alifm@linux.ibm.com [ clg: Fixed typo in subject ] Signed-off-by: Cédric Le Goater <clg@redhat.com>	3 months ago
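Here is a sketch of the conversion the commit describes. Config space bytes moved by pread()/pwrite() are little-endian, so 32-bit values are assembled and split byte by byte. That gives the right answer on big-endian hosts as well as little-endian ones. `fake_config` and the `sketch_pci_*_config()` functions are hypothetical stand-ins for the VFIO region and for qemu_vfio_pci_read_config()/qemu_vfio_pci_write_config().

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise little-endian helpers: correct on both LE and BE hosts. */
static uint32_t load_le32(const uint8_t *b)
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

static void store_le32(uint8_t *b, uint32_t v)
{
    b[0] = (uint8_t)v;
    b[1] = (uint8_t)(v >> 8);
    b[2] = (uint8_t)(v >> 16);
    b[3] = (uint8_t)(v >> 24);
}

/* Hypothetical stand-in for the bytes pread()/pwrite() move to and from
 * the config region: always little-endian, regardless of host. */
static uint8_t fake_config[256];

/* Read a 32-bit config register and return it in CPU byte order. */
static int sketch_pci_read_config(uint32_t *val, unsigned int ofs)
{
    if (ofs > sizeof(fake_config) - 4) {
        return -1;
    }
    *val = load_le32(&fake_config[ofs]);
    return 0;
}

/* Write a CPU-order 32-bit value as little-endian bytes. */
static int sketch_pci_write_config(uint32_t val, unsigned int ofs)
{
    if (ofs > sizeof(fake_config) - 4) {
        return -1;
    }
    store_le32(&fake_config[ofs], val);
    return 0;
}
```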
Ilya Leoshkevich	f098c32db4	target/s390x: Fix infinite loop during replay Replaying even trivial s390x kernels hangs, because: - cpu_post_load() fires the TOD timer immediately. - s390_tod_load() schedules work for firing the TOD timer. - If rr loop sees work and then timer, we get one timer expiration. - If rr loop sees timer and then work, we get two timer expirations. - Record and replay may diverge due to this race. - In this particular case divergence makes replay loop spin: it sees that TOD timer has expired, but cannot invoke its callback, because there is no recorded CHECKPOINT_CLOCK_VIRTUAL. - The order in which rr loop sees work and timer depends on whether and when rr loop wakes up during load_snapshot(). - rr loop may wake up after the main thread kicks the CPU and drops the BQL, which may happen if it calls, e.g., qemu_cond_wait_bql(). Firing TOD timer twice is duplicate work, but it was introduced intentionally in commit `7c12f710ba` ("s390x/tcg: rearm the CKC timer during migration") in order to avoid dependency on migration order. The key culprits here are timers that are armed already expired. They break the ordering between timers and CPU work, because they are not constrained by instruction execution, thus introducing non-determinism and record-replay divergence. Fix by converting such timer callbacks to CPU work. Also add TOD clock updates to the save path, mirroring the load path, in order to have the same CHECKPOINT_CLOCK_VIRTUAL during recording and replaying. Link: https://lore.kernel.org/qemu-devel/20251128133949.181828-1-thuth@redhat.com/ Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Thomas Huth <thuth@redhat.com> Message-ID: <20251201215514.1751994-1-iii@linux.ibm.com> [thuth: Add SPDX license identifiers to the new stubs files] Signed-off-by: Thomas Huth <thuth@redhat.com>	4 months ago
Markus Armbruster	d84870f2b8	error: Use error_setg_file_open() for simplicity and consistency Replace error_setg_errno(errp, errno, MSG, FNAME); by error_setg_file_open(errp, errno, FNAME); where MSG is "Could not open '%s'" or similar. Also replace equivalent uses of error_setg(). A few messages lose prefixes ("net dump: ", "SEV: ", __func__ ": "). We could put them back with error_prepend(). Not worth the bother. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Message-ID: <20251121121438.1249498-11-armbru@redhat.com> [Conflict with commit `26b4a6ffe7` (monitor/hmp: Merge hmp-cmds-target.c within hmp-cmds.c) resolved]	4 months ago
Nguyen Dinh Phi	0c9f429ec8	util: Move qemu_ftruncate64 from block/file-win32.c to oslib-win32.c qemu_ftruncate64() is a general-purpose utility function that may be used outside of the block layer. Move it to util/oslib-win32.c where other Windows-specific utility functions reside. Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-ID: <20251218085446.462827-3-phind.uet@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	4 months ago
Paolo Bonzini	12e50722e4	block: rename block/aio-wait.h to qemu/aio-wait.h AIO_WAIT_WHILE is used even outside the block layer; move the header file out of block/ just like the implementation is in util/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Paolo Bonzini	ba773aded3	block: rename block/aio.h to qemu/aio.h AioContexts are used as a generic event loop even outside the block layer; move the header file out of block/ just like the implementation is in util/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Paolo Bonzini	238449947d	block: reduce files included by block/aio.h Avoid including all of qdev everywhere (the hw/core/qdev.h header in fact brings in a lot more headers too), instead declare a couple structs for which only a pointer type is needed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Paolo Bonzini	ddab0ef124	block: extract include/qemu/aiocb.h out of include/block/aio.h Create a new header corresponding to functions defined in util/aiocb.c, and include it whenever AIOCBs are used but AioContext is not. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Marc Morcos	e77508292c	thread-pool: Fix thread race Fix a data race that occurred between `worker_thread()` writing and `thread_pool_completion_bh()` reading shared data in `util/thread-pool.c`. Signed-off-by: Marc Morcos <marcmorcos@google.com> Link: https://lore.kernel.org/r/20251213001443.2041258-3-marcmorcos@google.com [Use qatomic_set for writes to ret->ret. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
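The shape of such a fix can be sketched with C11 atomics instead of QEMU's qatomic_* macros. QEMU's qatomic_set() is roughly a relaxed atomic store. The Request type, sketch_worker(), sketch_completion_poll() and run_request() are hypothetical and are not the code in util/thread-pool.c. In this sketch:

- The worker writes the result with an atomic store, then publishes it with a release store.
- The completion side reads both values with atomic loads, so no plain read races with a plain write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request shared between a worker and the completion side. */
typedef struct {
    int (*func)(void *opaque);
    void *opaque;
    atomic_int ret;   /* result written by the worker thread */
    atomic_bool done; /* set after ret; publishes the result */
} Request;

static void *sketch_worker(void *arg)
{
    Request *req = arg;
    int r = req->func(req->opaque);

    /* Atomic stores: nothing here is a plain write racing with the reader */
    atomic_store_explicit(&req->ret, r, memory_order_relaxed);
    atomic_store_explicit(&req->done, true, memory_order_release);
    return NULL;
}

/* Completion side: returns true and fills *out once the worker is done. */
static bool sketch_completion_poll(Request *req, int *out)
{
    if (!atomic_load_explicit(&req->done, memory_order_acquire)) {
        return false;
    }
    *out = atomic_load_explicit(&req->ret, memory_order_relaxed);
    return true;
}

static int add_one(void *opaque)
{
    return *(int *)opaque + 1;
}

/* Runs add_one() on a worker thread while this thread polls for the result. */
static int run_request(int input)
{
    Request req = { .func = add_one, .opaque = &input };
    pthread_t tid;
    int result = 0;

    atomic_init(&req.ret, 0);
    atomic_init(&req.done, false);
    if (pthread_create(&tid, NULL, sketch_worker, &req) != 0) {
        return -1;
    }
    while (!sketch_completion_poll(&req, &result)) {
        /* busy-wait only to keep the example self-contained */
    }
    pthread_join(tid, NULL);
    return result;
}
```

In QEMU the completion side runs from a bottom half rather than a spin loop. The spin here only makes the concurrent read observable in a standalone program.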
Paolo Bonzini	7f548b8f23	include: reorganize memory API headers Move RAMBlock functions out of ram_addr.h and cpu-common.h; move memory API headers out of include/exec and into include/system. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Cédric Le Goater	326e620fc0	Fix const qualifier build errors with recent glibc A recent change in glibc 2.42.9000 [1] changes the return type of strstr() and other string functions to be 'const char *' when the input is a 'const char *'. This breaks the build in various files with errors such as: error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] 208 | char *pidstr = strstr(filename, "%"); | ^~~~~~ Fix this by changing the type of the variables that store the result of these functions to 'const char *'. [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=cd748a63ab1a7ae846175c532a3daab341c62690 Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20251209174328.698774-1-clg@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	4 months ago
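The fix pattern can be sketched on a hypothetical find_percent() helper, modelled on the pidstr line in the error above. Storing the result in a 'const char *' compiles cleanly in both cases:

- strstr() returns 'char *' (older C libraries).
- strstr() returns 'const char *' for const input (the new glibc behaviour).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns the offset of the first '%' in filename,
 * or -1 if there is none. The result variable is 'const char *', matching
 * the const input, so no qualifier is discarded with either strstr() flavour.
 */
static ptrdiff_t find_percent(const char *filename)
{
    assert(filename != NULL);
    const char *pidstr = strstr(filename, "%");
    return pidstr ? pidstr - filename : -1;
}
```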
Stefan Hajnoczi	047dabef97	block/io_uring: use aio_add_sqe() AioContext has its own io_uring instance for file descriptor monitoring. The disk I/O io_uring code was developed separately. Originally I thought the characteristics of file descriptor monitoring and disk I/O were too different, requiring separate io_uring instances. Now it has become clear to me that it's feasible to share a single io_uring instance for file descriptor monitoring and disk I/O. We're not using io_uring's IOPOLL feature or anything else that would require a separate instance. Unify block/io_uring.c and util/fdmon-io_uring.c using the new aio_add_sqe() API that allows user-defined io_uring sqe submission. Now block/io_uring.c just needs to submit readv/writev/fsync and most of the io_uring-specific logic is handled by fdmon-io_uring.c. There are two immediate advantages: 1. Fewer system calls. There is no need to monitor the disk I/O io_uring ring fd from the file descriptor monitoring io_uring instance. Disk I/O completions are now picked up directly. Also, sqes are accumulated in the sq ring until the end of the event loop iteration and there are fewer io_uring_enter(2) syscalls. 2. Less code duplication. Note that error_setg() messages are not supposed to end with punctuation, so I removed a '.' for the non-io_uring build error message. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20251104022933.618123-15-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	1eebdab3c3	aio-posix: add aio_add_sqe() API for user-defined io_uring requests Introduce the aio_add_sqe() API for submitting io_uring requests in the current AioContext. This allows other components in QEMU, like the block layer, to take advantage of io_uring features without creating their own io_uring context. This API supports nested event loops just like file descriptor monitoring and BHs do. This comes at a complexity cost: CQE callbacks must be placed on a list so that nested event loops can invoke pending CQE callbacks from parent event loops. If you're wondering why CqeHandler exists instead of just a callback function pointer, this is why. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20251104022933.618123-14-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
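A hypothetical single-threaded model of the CqeHandler rationale. Completed handlers are queued on a list as objects rather than invoked directly. That lets a nested event loop, started from inside one callback, run the remaining completions. Only the name CqeHandler comes from the commit. Its fields, PendingList, pending_add(), pending_dispatch() and the demo are illustrative assumptions, not QEMU's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler object: a callback plus per-completion state. */
typedef struct CqeHandler CqeHandler;
struct CqeHandler {
    void (*cb)(CqeHandler *h);
    int res;          /* completion result copied from the CQE */
    CqeHandler *next; /* link in the pending list */
};

typedef struct {
    CqeHandler *head;
    CqeHandler **tail;
} PendingList;

static void pending_init(PendingList *l)
{
    l->head = NULL;
    l->tail = &l->head;
}

/* Called when a CQE is reaped: record the result and queue the handler. */
static void pending_add(PendingList *l, CqeHandler *h, int res)
{
    h->res = res;
    h->next = NULL;
    *l->tail = h;
    l->tail = &h->next;
}

/* Runs queued handlers; safe to re-enter from inside a callback. */
static int pending_dispatch(PendingList *l)
{
    int n = 0;

    while (l->head) {
        CqeHandler *h = l->head;

        l->head = h->next;
        if (!l->head) {
            l->tail = &l->head;
        }
        h->cb(h); /* h is already unlinked, so a nested dispatch skips it */
        n++;
    }
    return n;
}

/* Demo: the first callback starts a nested event loop iteration. */
static PendingList demo_list;
static int demo_order[3];
static int demo_count;

static void record_cb(CqeHandler *h)
{
    demo_order[demo_count++] = h->res;
}

static void nesting_cb(CqeHandler *h)
{
    record_cb(h);
    pending_dispatch(&demo_list); /* nested loop runs the remaining CQEs */
}

/* Returns how many handlers the outer dispatch ran itself. */
static int demo_nested_dispatch(void)
{
    static CqeHandler a = { nesting_cb }, b = { record_cb }, c = { record_cb };

    demo_count = 0;
    pending_init(&demo_list);
    pending_add(&demo_list, &a, 1);
    pending_add(&demo_list, &b, 2);
    pending_add(&demo_list, &c, 3);
    return pending_dispatch(&demo_list);
}
```

In the demo, the outer dispatch runs only the first handler. The nested call inside its callback then runs the other two, in order.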
Stefan Hajnoczi	87e7a0f423	aio-posix: add fdmon_ops->dispatch() The ppoll and epoll file descriptor monitoring implementations rely on the event loop's generic file descriptor, timer, and BH dispatch code to invoke user callbacks. The io_uring file descriptor monitoring implementation will need io_uring-specific dispatch logic for CQE handlers for custom SQEs. Introduce a new FDMonOps ->dispatch() callback that allows file descriptor monitoring implementations to invoke user callbacks. The next patch will use this new callback. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20251104022933.618123-13-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	a63e41f2a4	aio-posix: unindent fdmon_io_uring_destroy() Reduce the level of indentation to make further code changes easier to read. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20251104022933.618123-12-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	59202c98c0	aio-posix: gracefully handle io_uring_queue_init() failure io_uring may not be available at runtime due to system policies (e.g. the io_uring_disabled sysctl) or creation could fail due to file descriptor resource limits. Handle failure scenarios as follows: If another AioContext already has io_uring, then fail AioContext creation so that the aio_add_sqe() API is available uniformly from all QEMU threads. Otherwise fall back to epoll(7) if io_uring is unavailable. Notes: - Update the comment about selecting the fastest fdmon implementation. At this point it's not about speed anymore, it's about aio_add_sqe() API availability. - Uppercase the error message when converting from error_report() to error_setg_errno() for consistency (but there are instances of lowercase in the codebase). - It's easier to move the #ifdefs from aio-posix.h to aio-posix.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-11-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
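The failure-handling policy above, reduced to a hypothetical pure function. The names choose_fdmon() and FdmonKind, and the error string, are illustrative. In QEMU the decision lives in the fdmon setup path, and errors are reported through an Error ** rather than a string pointer. The sketch also ignores what happens if epoll(7) itself cannot be set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FDMON_EPOLL, FDMON_IO_URING } FdmonKind;

/*
 * Hypothetical model of the policy: returns the chosen FdmonKind, or -1 and
 * sets *errmsg (if non-NULL) when AioContext creation must fail.
 */
static int choose_fdmon(bool io_uring_ok, bool other_ctx_has_io_uring,
                        const char **errmsg)
{
    if (io_uring_ok) {
        return FDMON_IO_URING;
    }
    if (other_ctx_has_io_uring) {
        /* Failing keeps aio_add_sqe() available uniformly in all threads */
        if (errmsg) {
            *errmsg = "io_uring unavailable for this AioContext";
        }
        return -1;
    }
    return FDMON_EPOLL; /* io_uring not in use anywhere: fall back */
}
```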
Stefan Hajnoczi	421dcc8023	aio: add errp argument to aio_context_setup() When aio_context_new() -> aio_context_setup() fails at startup it doesn't really matter whether errors are returned to the caller or the process terminates immediately. However, it is not acceptable to terminate when hotplugging --object iothread at runtime. Refactor aio_context_setup() so that errors can be propagated. The next commit will set errp when fdmon_io_uring_setup() fails. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-10-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	3769b9abe9	aio: free AioContext when aio_context_new() fails g_source_destroy() only removes the GSource from the GMainContext it's attached to, if any. It does not free it. Use g_source_unref() instead so that the AioContext (which embeds a GSource) is freed. There is no need to call g_source_destroy() in aio_context_new() because the GSource isn't attached to a GMainContext yet. aio_ctx_finalize() expects everything to be set up already, so introduce the new ctx->initialized boolean and do nothing when called with !initialized. This also requires moving aio_context_setup() down after event_notifier_init() since aio_ctx_finalize() won't release any resources that aio_context_setup() acquired. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20251104022933.618123-9-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	d1f42b600a	aio: remove aio_context_use_g_source() There is no need for aio_context_use_g_source() now that epoll(7) and io_uring(7) file descriptor monitoring works with the glib event loop. AioContext doesn't need to be notified that GSource is being used. On hosts with io_uring support this now enables fdmon-io_uring.c by default, replacing fdmon-poll.c and fdmon-epoll.c. In other words, the event loop will use io_uring! Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-8-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	ded29e64c6	aio-posix: integrate fdmon into glib event loop AioContext's glib integration only supports ppoll(2) file descriptor monitoring. epoll(7) and io_uring(7) disable themselves and switch back to ppoll(2) when the glib event loop is used. The main loop thread cannot use epoll(7) or io_uring(7) because it always uses the glib event loop. Future QEMU features may require io_uring(7). One example is uring_cmd support in FUSE exports. Each feature could create its own io_uring(7) context and integrate it into the event loop, but this is inefficient due to extra syscalls. It would be more efficient to reuse the AioContext's existing fdmon-io_uring.c io_uring(7) context because fdmon-io_uring.c will already be active on systems where Linux io_uring is available. In order to keep fdmon-io_uring.c's AioContext operational even when the glib event loop is used, extend FDMonOps with an API similar to GSourceFuncs so that file descriptor monitoring can integrate into the glib event loop. A quick summary of the GSourceFuncs API: - prepare() is called each event loop iteration before waiting for file descriptors and timers. - check() is called to determine whether events are ready to be dispatched after waiting. - dispatch() is called to process events. More details here: https://docs.gtk.org/glib/struct.SourceFuncs.html Move the ppoll(2)-specific code from aio-posix.c into fdmon-poll.c and also implement epoll(7)- and io_uring(7)-specific file descriptor monitoring code for glib event loops. Note that it's still faster to use aio_poll() rather than the glib event loop since glib waits for file descriptor activity with ppoll(2) and does not support adaptive polling. But at least epoll(7) and io_uring(7) now work in glib event loops. Splitting this into multiple commits without temporarily breaking AioContext proved difficult, so this commit makes all the changes. The next commit will remove the aio_context_use_g_source() API because it is no longer needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20251104022933.618123-7-stefanha@redhat.com> [kwolf: Build fixes; fix AioContext.list_lock use after destroy] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
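A simplified model of how a GSourceFuncs-style ops table drives one loop iteration. SourceOps, run_iteration() and the demo callbacks are hypothetical names, not the actual FDMonOps members. Real glib also handles timers, priorities and multiple sources, and its dispatch() returns whether to keep the source; this sketch leaves all of that out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ops table shaped like GSourceFuncs. */
typedef struct {
    bool (*prepare)(void *opaque, int *timeout_ms); /* before waiting */
    bool (*check)(void *opaque);                    /* after waiting */
    void (*dispatch)(void *opaque);                 /* run callbacks */
} SourceOps;

/* One simplified iteration; block_fn stands in for glib's ppoll(2) call. */
static bool run_iteration(const SourceOps *ops, void *opaque,
                          void (*block_fn)(void *opaque, int timeout_ms))
{
    int timeout_ms = -1; /* -1: wait until something is ready */
    bool ready = ops->prepare(opaque, &timeout_ms);

    if (!ready) {
        block_fn(opaque, timeout_ms);
        ready = ops->check(opaque);
    }
    if (ready) {
        ops->dispatch(opaque);
    }
    return ready;
}

/* Demo source: a single flag plays the role of a readable fd. */
typedef struct {
    bool event_pending;
    int dispatched;
} DemoSource;

static bool demo_prepare(void *opaque, int *timeout_ms)
{
    DemoSource *d = opaque;

    if (d->event_pending) {
        *timeout_ms = 0; /* already ready: no need to block */
        return true;
    }
    return false;
}

static bool demo_check(void *opaque)
{
    return ((DemoSource *)opaque)->event_pending;
}

static void demo_dispatch(void *opaque)
{
    DemoSource *d = opaque;

    d->event_pending = false;
    d->dispatched++;
}

static void demo_block(void *opaque, int timeout_ms)
{
    (void)timeout_ms;
    ((DemoSource *)opaque)->event_pending = true; /* pretend an fd fired */
}

/* Runs two iterations: one that waits, one that is ready in prepare(). */
static int demo_iterations(void)
{
    static const SourceOps ops = { demo_prepare, demo_check, demo_dispatch };
    DemoSource d = { false, 0 };

    run_iteration(&ops, &d, demo_block);
    d.event_pending = true;
    run_iteration(&ops, &d, demo_block);
    return d.dispatched;
}
```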
Stefan Hajnoczi	511c62a2c6	aio-posix: keep polling enabled with fdmon-io_uring.c Commit `816a430c51` ("util/aio: Defer disabling poll mode as long as possible") kept polling enabled when the event loop timeout is 0. Since there is no timeout the event loop will continue immediately and the overhead of disabling and re-enabling polling can be avoided. fdmon-io_uring.c is unable to take advantage of this optimization because its ->need_wait() function returns true whenever there are new io_uring SQEs to submit: if (timeout || ctx->fdmon_ops->need_wait(ctx)) { ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Polling will be disabled even when timeout == 0. Extend the optimization to handle the case when need_wait() returns true and timeout == 0. Cc: Chao Gao <chao.gao@intel.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-5-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
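A hypothetical model of the distinction this commit draws. need_wait() forces a trip into the kernel (to submit SQEs), but only a non-zero timeout means the thread may actually sleep, so only that should turn polling off. must_enter_kernel() and should_disable_polling() are illustrative names, not the code in aio-posix.c. They assume the convention that a timeout of -1 means "block indefinitely".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model. timeout: 0 = do not block, -1 = block indefinitely.
 * need_wait models ctx->fdmon_ops->need_wait(ctx), which fdmon-io_uring.c
 * reports as true whenever SQEs are waiting to be submitted.
 */
static bool must_enter_kernel(int64_t timeout, bool need_wait)
{
    return timeout != 0 || need_wait;
}

/* Only a possible sleep justifies leaving poll mode. */
static bool should_disable_polling(int64_t timeout, bool need_wait)
{
    /* Submitting queued SQEs with timeout == 0 returns immediately, so
     * need_wait alone does not turn polling off in this model. */
    return must_enter_kernel(timeout, need_wait) && timeout != 0;
}
```

Per the commit message, the old behaviour corresponds to disabling polling whenever must_enter_kernel() is true.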
Stefan Hajnoczi	5f8741fca5	aio-posix: fix spurious return from ->wait() due to signals io_uring_enter(2) only returns -EINTR in some cases when interrupted by a signal. Therefore the while loop in fdmon_io_uring_wait() is incomplete and can lead to a spurious early return. Handle the case when a signal interrupts io_uring_enter(2) but the syscall returns the number of SQEs submitted (that takes priority over -EINTR). This patch probably makes little difference for QEMU, but the test suite relies on the exact pattern of aio_poll() return values, so it's best to hide this io_uring syscall interface quirk. Here is the strace of test-aio receiving 3 SIGCONT signals after this fix has been applied. Notice how the io_uring_enter(2) return value is 1 the first time because an SQE was submitted, but -EINTR the other times: eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 9 io_uring_enter(7, 1, 0, 0, NULL, 8) = 1 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe38a46240) = 0 io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8) = 1 --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} --- io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8) = -1 EINTR (Interrupted system call) --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} --- io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...> <... io_uring_enter resumed>) = -1 EINTR (Interrupted system call) --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} --- io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...> <... io_uring_enter resumed>) = 0 Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20251104022933.618123-4-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
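A sketch of a wait loop that tolerates both interruption patterns in the strace above. The hypothetical function pointers stand in for real calls:

- enter() models io_uring_enter(2) with IORING_ENTER_GETEVENTS and min_complete = 1, returning the number of SQEs submitted or -errno.
- cq_ready() models a check of the CQ ring.

The fake ring loosely replays the trace. The first call submits an SQE but is interrupted before any completion, the second returns -EINTR, and the last one finds a completion. All names are illustrative, not fdmon-io_uring.c internals.

```c
#include <assert.h>
#include <errno.h>

typedef int (*enter_fn)(void *opaque, unsigned to_submit);
typedef unsigned (*cq_ready_fn)(void *opaque);

/* Submit pending SQEs and block until at least one CQE is available. */
static int wait_for_cqe(enter_fn enter, cq_ready_fn cq_ready, void *opaque,
                        unsigned to_submit)
{
    for (;;) {
        int ret = enter(opaque, to_submit);

        if (ret == -EINTR) {
            continue;               /* interrupted, nothing submitted */
        }
        if (ret < 0) {
            return ret;             /* genuine error */
        }
        to_submit -= (unsigned)ret; /* these SQEs were consumed */
        if (cq_ready(opaque) > 0) {
            return 0;
        }
        /* A signal cut the wait short, but the submission count took
         * priority over -EINTR: wait again instead of returning early. */
    }
}

/* Fake ring loosely modelled on the strace above. */
typedef struct {
    int calls;
    unsigned ready;
} FakeRing;

static int fake_enter(void *opaque, unsigned to_submit)
{
    FakeRing *f = opaque;

    f->calls++;
    if (f->calls == 1) {
        return (int)to_submit; /* submitted, then interrupted */
    }
    if (f->calls == 2) {
        return -EINTR;
    }
    f->ready = 1;              /* a completion has arrived */
    return 0;
}

static unsigned fake_cq_ready(void *opaque)
{
    return ((FakeRing *)opaque)->ready;
}

/* Returns the number of enter() calls needed, or -1 on failure. */
static int demo_wait_for_cqe(void)
{
    FakeRing ring = { 0, 0 };

    if (wait_for_cqe(fake_enter, fake_cq_ready, &ring, 1) != 0) {
        return -1;
    }
    return ring.calls;
}
```

A loop that only retried on -EINTR would return after the first call here, even though no completion had arrived.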


2079 Commits (ca61f91ef9b0d10333881fd0070303ea33cbc72e)