riscv/qemu - qemu

Commit Graph

Author	SHA1	Message	Date
Daniel P. Berrangé	46255cc2be	util: expose qemu_thread_set_name The ability to set the thread name needs to be used in a number of places, so expose the current impls as public methods. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	7 months ago
Daniel P. Berrangé	71d81b320d	util: fix race setting thread name on Win32 The call to set the thread name on Win32 platforms is done by the parent thread, after _beginthreadex() returns. At this point the new child thread is potentially already executing its start method. To ensure the thread name is guaranteed to be set before any "interesting" code starts executing, it must be done in the start method of the child thread itself. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2 months ago
Daniel P. Berrangé	1b65aeed2a	system: unconditionally enable thread naming When thread naming was introduced years ago, it was disabled by default and put behind a command line flag: commit `8f480de0c9` Author: Dr. David Alan Gilbert <dgilbert@redhat.com> Date: Thu Jan 30 10:20:31 2014 +0000 Add 'debug-threads' suboption to --name This was done based on a concern that something might depend on the historical thread naming. Thread names, however, were never promised to be part of QEMU's public API. The defaults will vary across platforms, so no assumptions should ever be made about naming. An opt-in behaviour is also unfortunately incompatible with RCU which creates its thread from a constructor function which is run before command line args are parsed. Thus the RCU thread lacks any name. libvirt has unconditionally enabled debug-threads=yes on all VMs it creates for 10 years. Interestingly this DID expose a bug in libvirt, as it parsed /proc/$PID/stat and could not cope with a space in the thread name. This was a latent pre-existing bug in libvirt though, and not a part of QEMU's API. Having thread names always available will allow thread names to be included in error reports and log messages QEMU prints by default, which will improve ability to triage QEMU bugs. Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	8 months ago
Warner Losh	bba2f724f1	freebsd: FreeBSD 15 has native inotify Check to make sure that we have inotify in libc, before looking for it in libinotify. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Marc-André Lureau <marcandre.lureau@redhat.com> Cc: Daniel P. Berrange <berrange@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Warner Losh <imp@bsdimp.com>	2 months ago
Akihiko Odaki	649a78aa32	Reapply "rcu: Unify force quiescent state" This reverts commit `ddb4d9d174`. The commit says: > This reverts commit `55d98e3ede`. > > The commit introduced a regression in the replay functional test > on alpha (tests/functional/alpha/test_replay.py), that causes CI > failures regularly. Thus revert this change until someone has > figured out what is going wrong here. Reapply the change as alpha is fixed. Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20260217-alpha-v1-2-0dcc708c9db3@rsg.ci.i.u-tokyo.ac.jp Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2 months ago
Jens Axboe	961fcc0f22	fdmon-io_uring: check CQ ring directly in gsource_check gsource_check() only looks at the ppoll revents for the io_uring fd, but CQEs can be posted during gsource_prepare()'s io_uring_submit() call via kernel task_work processing on syscall exit. These completions are already sitting in the CQ ring but the ring fd may not be signaled yet, causing gsource_check() to return false. Add a fallback io_uring_cq_ready() check so completions that arrive during submission are dispatched immediately rather than waiting for the next ppoll() cycle. Signed-off-by: Jens Axboe <axboe@kernel.dk> Message-ID: <20260213143225.161043-3-axboe@kernel.dk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2 months ago
Jens Axboe	2ae361ef1d	aio-posix: notify main loop when SQEs are queued When a vCPU thread handles MMIO (holding BQL), aio_co_enter() runs the block I/O coroutine inline on the vCPU thread because qemu_get_current_aio_context() returns the main AioContext when BQL is held. The coroutine calls luring_co_submit() which queues an SQE via fdmon_io_uring_add_sqe(), but the actual io_uring_submit() only happens in gsource_prepare() on the main loop thread. Since the coroutine ran inline (not via aio_co_schedule()), no BH is scheduled and aio_notify() is never called. The main loop remains asleep in ppoll() with up to a 499ms timeout, leaving the SQE unsubmitted until the next timer fires. Fix this by calling aio_notify() after queuing the SQE. This wakes the main loop via the eventfd so it can run gsource_prepare() and submit the pending SQE promptly. This is a generic fix that benefits all devices using aio=io_uring. Without it, AHCI/SATA devices see MUCH worse I/O latency since they use MMIO (not ioeventfd like virtio) and have no other mechanism to wake the main loop after queuing block I/O. This is usually a bit hard to detect, as it also relies on the ppoll loop not waking up for other activity, and micro benchmarks tend not to see it because they don't have any real processing time. With a synthetic test case that has a few usleep() to simulate processing of read data, it's very noticeable. The below example reads 128MB with O_DIRECT in 128KB chunks in batches of 16, and has a 1ms delay before each batch submit, and a 1ms delay after processing each completion. 
Running it on /dev/sda yields: time sudo ./iotest /dev/sda ________________________________________________________ Executed in 25.76 secs fish external usr time 6.19 millis 783.00 micros 5.41 millis sys time 12.43 millis 642.00 micros 11.79 millis while on a virtio-blk or NVMe device we get: time sudo ./iotest /dev/vdb ________________________________________________________ Executed in 1.25 secs fish external usr time 1.40 millis 0.30 millis 1.10 millis sys time 17.61 millis 1.43 millis 16.18 millis time sudo ./iotest /dev/nvme0n1 ________________________________________________________ Executed in 1.26 secs fish external usr time 6.11 millis 0.52 millis 5.59 millis sys time 13.94 millis 1.50 millis 12.43 millis where the latter are consistent. If we run the same test but keep the socket for the ssh connection active by having activity there, then the sda test looks as follows: time sudo ./iotest /dev/sda ________________________________________________________ Executed in 1.23 secs fish external usr time 2.70 millis 39.00 micros 2.66 millis sys time 4.97 millis 977.00 micros 3.99 millis as now the ppoll loop is woken all the time anyway. After this fix, on an idle system: time sudo ./iotest /dev/sda ________________________________________________________ Executed in 1.30 secs fish external usr time 2.14 millis 0.14 millis 2.00 millis sys time 16.93 millis 1.16 millis 15.76 millis Signed-off-by: Jens Axboe <axboe@kernel.dk> Message-Id: <07d701b9-3039-4f9b-99a2-abeae51146a5@kernel.dk> Reviewed-by: Kevin Wolf <kwolf@redhat.com> [Generalize the comment since this applies to all vCPU thread activity, not just coroutines, as suggested by Kevin Wolf <kwolf@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2 months ago
Marc-André Lureau	ba63a9643a	util: add some extra stubs for qemu modules initialization Avoid extra ifdef-ery when optionally supporting modules, as done in audio-test (and vl.c). Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2 months ago
Vladimir Sementsov-Ogievskiy	2eed5472ec	error-report: make real_time_iso8601() public To be reused in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-ID: <20260201173633.413934-3-vsementsov@yandex-team.ru>	2 months ago
Thomas Huth	ddb4d9d174	Revert "rcu: Unify force quiescent state" This reverts commit `55d98e3ede`. The commit introduced a regression in the replay functional test on alpha (tests/functional/alpha/test_replay.py), that causes CI failures regularly. Thus revert this change until someone has figured out what is going wrong here. Buglink: https://gitlab.com/qemu-project/qemu/-/issues/3197 Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-ID: <20260209120336.41454-1-thuth@redhat.com>	2 months ago
Vladimir Sementsov-Ogievskiy	7404d6852d	tests/unit: add unit test for qemu_hexdump() Test that the fix in commit `20aa05edc2` ("util/hexdump: fix QEMU_HEXDUMP_LINE_WIDTH logic") makes sense. To not break compilation when we build without 'block', move hexdump.c out of "if have_block" in meson.build. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20260202112826.38018-1-philmd@linaro.org>	5 months ago
Peter Maydell	dc249aaf57	misc: Clean up includes This commit deals with various .c files that included system headers that are already pulled in by osdep.h, where the .c file includes osdep.h already itself. This commit was created with scripts/clean-includes: ./scripts/clean-includes '--git' 'misc' 'hw/core' 'semihosting' 'target/arm' 'target/i386/kvm/kvm.c' 'target/loongarch' 'target/riscv' 'tools' 'util' All .c should include qemu/osdep.h first. The script performs three related cleanups: * Ensure .c files include qemu/osdep.h first. * Including it in a .h is redundant, since the .c already includes it. Drop such inclusions. * Likewise, including headers qemu/osdep.h includes is redundant. Drop these, too. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20260116125830.926296-4-peter.maydell@linaro.org	3 months ago
Philippe Mathieu-Daudé	e44dc42f3a	bswap: Use 'qemu/bswap.h' instead of 'qemu/host-utils.h' These files only require "qemu/bswap.h", not "qemu/host-utils.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20260109163730.57087-2-philmd@linaro.org>	3 months ago
Philippe Mathieu-Daudé	3115691855	bswap: Include missing 'qemu/bswap.h' header All these files indirectly include the "qemu/bswap.h" header. Make this inclusion explicit to avoid build errors when refactoring unrelated headers. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20260109164742.58041-4-philmd@linaro.org>	3 months ago
Michael Tokarev	a2b429b114	Revert "gdbstub: Try unlinking the unix socket before binding" This reverts commit `fccb744f41`. This commit introduced dependency of linux-user on qemu-sockets.c. The latter includes handling of various socket types, while gdbstub only needs unix sockets. Including different kinds of sockets makes it more problematic to build linux-user statically. The original issue - the need to unlink unix socket before binding - will be addressed in the next change. Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	3 months ago
Richard Henderson	239b9d0488	include/qemu/atomic: Drop aligned_{u}int64_t As we no longer support i386 as a host architecture, this abstraction is no longer required. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	3 months ago
Richard Henderson	71adccb6f7	include/qemu/atomic: Drop qatomic_{read,set}_[iu]64 Replace all uses with the normal qatomic_{read,set}. Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	3 months ago
Richard Henderson	90e2e8ada7	util: Remove stats64 This API is no longer used. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	3 months ago
Richard Henderson	25512d6865	*: Remove __i386__ tests Remove instances of __i386__, except from tests and imported headers. Drop a block containing sanity check and fprintf error message for i386-on-i386 or x86_64-on-x86_64 emulation. If we really want something like this, we would do it via some form of compile-time check. Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	4 months ago
Farhan Ali	0ffc8f3625	util/vfio-helper: Fix endianness in PCI config read/write functions The VFIO pread/pwrite functions use little-endian data format. Currently, the qemu_vfio_pci_read_config() and qemu_vfio_pci_write_config() don't correctly convert from CPU native endian format to little-endian (and vice versa) when using the pread/pwrite functions. Fix this by limiting read/write to 32 bits and handling endian conversion in qemu_vfio_pci_read_config() and qemu_vfio_pci_write_config(). Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20260105222029.2423-1-alifm@linux.ibm.com [ clg: Fixed typo in subject ] Signed-off-by: Cédric Le Goater <clg@redhat.com>	3 months ago
Ilya Leoshkevich	f098c32db4	target/s390x: Fix infinite loop during replay Replaying even trivial s390x kernels hangs, because: - cpu_post_load() fires the TOD timer immediately. - s390_tod_load() schedules work for firing the TOD timer. - If rr loop sees work and then timer, we get one timer expiration. - If rr loop sees timer and then work, we get two timer expirations. - Record and replay may diverge due to this race. - In this particular case divergence makes replay loop spin: it sees that TOD timer has expired, but cannot invoke its callback, because there is no recorded CHECKPOINT_CLOCK_VIRTUAL. - The order in which rr loop sees work and timer depends on whether and when rr loop wakes up during load_snapshot(). - rr loop may wake up after the main thread kicks the CPU and drops the BQL, which may happen if it calls, e.g., qemu_cond_wait_bql(). Firing TOD timer twice is duplicate work, but it was introduced intentionally in commit `7c12f710ba` ("s390x/tcg: rearm the CKC timer during migration") in order to avoid dependency on migration order. The key culprits here are timers that are armed ready expired. They break the ordering between timers and CPU work, because they are not constrained by instruction execution, thus introducing non-determinism and record-replay divergence. Fix by converting such timer callbacks to CPU work. Also add TOD clock updates to the save path, mirroring the load path, in order to have the same CHECKPOINT_CLOCK_VIRTUAL during recording and replaying. Link: https://lore.kernel.org/qemu-devel/20251128133949.181828-1-thuth@redhat.com/ Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Thomas Huth <thuth@redhat.com> Message-ID: <20251201215514.1751994-1-iii@linux.ibm.com> [thuth: Add SPDX license identifiers to the new stubs files] Signed-off-by: Thomas Huth <thuth@redhat.com>	4 months ago
Markus Armbruster	d84870f2b8	error: Use error_setg_file_open() for simplicity and consistency Replace error_setg_errno(errp, errno, MSG, FNAME); by error_setg_file_open(errp, errno, FNAME); where MSG is "Could not open '%s'" or similar. Also replace equivalent uses of error_setg(). A few messages lose prefixes ("net dump: ", "SEV: ", __func__ ": "). We could put them back with error_prepend(). Not worth the bother. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Message-ID: <20251121121438.1249498-11-armbru@redhat.com> [Conflict with commit `26b4a6ffe7` (monitor/hmp: Merge hmp-cmds-target.c within hmp-cmds.c) resolved]	4 months ago
Nguyen Dinh Phi	0c9f429ec8	util: Move qemu_ftruncate64 from block/file-win32.c to oslib-win32.c qemu_ftruncate64() is a general-purpose utility function that may be used outside of the block layer. Move it to util/oslib-win32.c where other Windows-specific utility functions reside. Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-ID: <20251218085446.462827-3-phind.uet@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	4 months ago
Paolo Bonzini	12e50722e4	block: rename block/aio-wait.h to qemu/aio-wait.h AIO_WAIT_WHILE is used even outside the block layer; move the header file out of block/ just like the implementation is in util/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Paolo Bonzini	ba773aded3	block: rename block/aio.h to qemu/aio.h AioContexts are used as a generic event loop even outside the block layer; move the header file out of block/ just like the implementation is in util/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Paolo Bonzini	238449947d	block: reduce files included by block/aio.h Avoid including all of qdev everywhere (the hw/core/qdev.h header in fact brings in a lot more headers too), instead declare a couple structs for which only a pointer type is needed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Paolo Bonzini	ddab0ef124	block: extract include/qemu/aiocb.h out of include/block/aio.h Create a new header corresponding to functions defined in util/aiocb.c, and include it whenever AIOCBs are used but AioContext is not. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Marc Morcos	e77508292c	thread-pool: Fix thread race Fix a data race that occurred between `worker_thread()` writing and `thread_pool_completion_bh()` reading shared data in `util/thread-pool.c`. Signed-off-by: Marc Morcos <marcmorcos@google.com> Link: https://lore.kernel.org/r/20251213001443.2041258-3-marcmorcos@google.com [Use qatomic_set for writes to ret->ret. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Paolo Bonzini	7f548b8f23	include: reorganize memory API headers Move RAMBlock functions out of ram_addr.h and cpu-common.h; move memory API headers out of include/exec and into include/system. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	4 months ago
Cédric Le Goater	326e620fc0	Fix const qualifier build errors with recent glibc A recent change in glibc 2.42.9000 [1] changes the return type of strstr() and other string functions to be 'const char *' when the input is a 'const char *'. This breaks the build in various files with errors such as : error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] 208 | char *pidstr = strstr(filename, "%"); | ^~~~~~ Fix this by changing the type of the variables that store the result of these functions to 'const char *'. [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=cd748a63ab1a7ae846175c532a3daab341c62690 Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20251209174328.698774-1-clg@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	4 months ago
Stefan Hajnoczi	047dabef97	block/io_uring: use aio_add_sqe() AioContext has its own io_uring instance for file descriptor monitoring. The disk I/O io_uring code was developed separately. Originally I thought the characteristics of file descriptor monitoring and disk I/O were too different, requiring separate io_uring instances. Now it has become clear to me that it's feasible to share a single io_uring instance for file descriptor monitoring and disk I/O. We're not using io_uring's IOPOLL feature or anything else that would require a separate instance. Unify block/io_uring.c and util/fdmon-io_uring.c using the new aio_add_sqe() API that allows user-defined io_uring sqe submission. Now block/io_uring.c just needs to submit readv/writev/fsync and most of the io_uring-specific logic is handled by fdmon-io_uring.c. There are two immediate advantages: 1. Fewer system calls. There is no need to monitor the disk I/O io_uring ring fd from the file descriptor monitoring io_uring instance. Disk I/O completions are now picked up directly. Also, sqes are accumulated in the sq ring until the end of the event loop iteration and there are fewer io_uring_enter(2) syscalls. 2. Less code duplication. Note that error_setg() messages are not supposed to end with punctuation, so I removed a '.' for the non-io_uring build error message. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20251104022933.618123-15-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	1eebdab3c3	aio-posix: add aio_add_sqe() API for user-defined io_uring requests Introduce the aio_add_sqe() API for submitting io_uring requests in the current AioContext. This allows other components in QEMU, like the block layer, to take advantage of io_uring features without creating their own io_uring context. This API supports nested event loops just like file descriptor monitoring and BHs do. This comes at a complexity cost: CQE callbacks must be placed on a list so that nested event loops can invoke pending CQE callbacks from parent event loops. If you're wondering why CqeHandler exists instead of just a callback function pointer, this is why. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20251104022933.618123-14-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	87e7a0f423	aio-posix: add fdmon_ops->dispatch() The ppoll and epoll file descriptor monitoring implementations rely on the event loop's generic file descriptor, timer, and BH dispatch code to invoke user callbacks. The io_uring file descriptor monitoring implementation will need io_uring-specific dispatch logic for CQE handlers for custom SQEs. Introduce a new FDMonOps ->dispatch() callback that allows file descriptor monitoring implementations to invoke user callbacks. The next patch will use this new callback. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20251104022933.618123-13-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	a63e41f2a4	aio-posix: unindent fdmon_io_uring_destroy() Reduce the level of indentation to make further code changes easier to read. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20251104022933.618123-12-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	59202c98c0	aio-posix: gracefully handle io_uring_queue_init() failure io_uring may not be available at runtime due to system policies (e.g. the io_uring_disabled sysctl) or creation could fail due to file descriptor resource limits. Handle failure scenarios as follows: If another AioContext already has io_uring, then fail AioContext creation so that the aio_add_sqe() API is available uniformly from all QEMU threads. Otherwise fall back to epoll(7) if io_uring is unavailable. Notes: - Update the comment about selecting the fastest fdmon implementation. At this point it's not about speed anymore, it's about aio_add_sqe() API availability. - Uppercase the error message when converting from error_report() to error_setg_errno() for consistency (but there are instances of lowercase in the codebase). - It's easier to move the #ifdefs from aio-posix.h to aio-posix.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-11-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	421dcc8023	aio: add errp argument to aio_context_setup() When aio_context_new() -> aio_context_setup() fails at startup it doesn't really matter whether errors are returned to the caller or the process terminates immediately. However, it is not acceptable to terminate when hotplugging --object iothread at runtime. Refactor aio_context_setup() so that errors can be propagated. The next commit will set errp when fdmon_io_uring_setup() fails. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-10-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	3769b9abe9	aio: free AioContext when aio_context_new() fails g_source_destroy() only removes the GSource from the GMainContext it's attached to, if any. It does not free it. Use g_source_unref() instead so that the AioContext (which embeds a GSource) is freed. There is no need to call g_source_destroy() in aio_context_new() because the GSource isn't attached to a GMainContext yet. aio_ctx_finalize() expects everything to be set up already, so introduce the new ctx->initialized boolean and do nothing when called with !initialized. This also requires moving aio_context_setup() down after event_notifier_init() since aio_ctx_finalize() won't release any resources that aio_context_setup() acquired. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20251104022933.618123-9-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	d1f42b600a	aio: remove aio_context_use_g_source() There is no need for aio_context_use_g_source() now that epoll(7) and io_uring(7) file descriptor monitoring works with the glib event loop. AioContext doesn't need to be notified that GSource is being used. On hosts with io_uring support this now enables fdmon-io_uring.c by default, replacing fdmon-poll.c and fdmon-epoll.c. In other words, the event loop will use io_uring! Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-8-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	ded29e64c6	aio-posix: integrate fdmon into glib event loop AioContext's glib integration only supports ppoll(2) file descriptor monitoring. epoll(7) and io_uring(7) disable themselves and switch back to ppoll(2) when the glib event loop is used. The main loop thread cannot use epoll(7) or io_uring(7) because it always uses the glib event loop. Future QEMU features may require io_uring(7). One example is uring_cmd support in FUSE exports. Each feature could create its own io_uring(7) context and integrate it into the event loop, but this is inefficient due to extra syscalls. It would be more efficient to reuse the AioContext's existing fdmon-io_uring.c io_uring(7) context because fdmon-io_uring.c will already be active on systems where Linux io_uring is available. In order to keep fdmon-io_uring.c's AioContext operational even when the glib event loop is used, extend FDMonOps with an API similar to GSourceFuncs so that file descriptor monitoring can integrate into the glib event loop. A quick summary of the GSourceFuncs API: - prepare() is called each event loop iteration before waiting for file descriptors and timers. - check() is called to determine whether events are ready to be dispatched after waiting. - dispatch() is called to process events. More details here: https://docs.gtk.org/glib/struct.SourceFuncs.html Move the ppoll(2)-specific code from aio-posix.c into fdmon-poll.c and also implement epoll(7)- and io_uring(7)-specific file descriptor monitoring code for glib event loops. Note that it's still faster to use aio_poll() rather than the glib event loop since glib waits for file descriptor activity with ppoll(2) and does not support adaptive polling. But at least epoll(7) and io_uring(7) now work in glib event loops. Splitting this into multiple commits without temporarily breaking AioContext proved difficult so this commit makes all the changes. The next commit will remove the aio_context_use_g_source() API because it is no longer needed. 
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20251104022933.618123-7-stefanha@redhat.com> [kwolf: Build fixes; fix AioContext.list_lock use after destroy] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	511c62a2c6	aio-posix: keep polling enabled with fdmon-io_uring.c Commit `816a430c51` ("util/aio: Defer disabling poll mode as long as possible") kept polling enabled when the event loop timeout is 0. Since there is no timeout the event loop will continue immediately and the overhead of disabling and re-enabling polling can be avoided. fdmon-io_uring.c is unable to take advantage of this optimization because its ->need_wait() function returns true whenever there are new io_uring SQEs to submit: if (timeout || ctx->fdmon_ops->need_wait(ctx)) { ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Polling will be disabled even when timeout == 0. Extend the optimization to handle the case when need_wait() returns true and timeout == 0. Cc: Chao Gao <chao.gao@intel.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-5-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	5f8741fca5	aio-posix: fix spurious return from ->wait() due to signals io_uring_enter(2) only returns -EINTR in some cases when interrupted by a signal. Therefore the while loop in fdmon_io_uring_wait() is incomplete and can lead to a spurious early return. Handle the case when a signal interrupts io_uring_enter(2) but the syscall returns the number of SQEs submitted (that takes priority over -EINTR). This patch probably makes little difference for QEMU, but the test suite relies on the exact pattern of aio_poll() return values, so it's best to hide this io_uring syscall interface quirk. Here is the strace of test-aio receiving 3 SIGCONT signals after this fix has been applied. Notice how the io_uring_enter(2) return value is 1 the first time because an SQE was submitted, but -EINTR the other times: eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 9 io_uring_enter(7, 1, 0, 0, NULL, 8) = 1 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe38a46240) = 0 io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8) = 1 --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} --- io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8) = -1 EINTR (Interrupted system call) --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} --- io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...> <... io_uring_enter resumed>) = -1 EINTR (Interrupted system call) --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} --- io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...> <... io_uring_enter resumed>) = 0 Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20251104022933.618123-4-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	c31a445749	aio-posix: fix fdmon-io_uring.c timeout stack variable lifetime io_uring_prep_timeout() stashes a pointer to the timespec struct rather than copying its fields. That means the struct must live until after the SQE has been submitted by io_uring_enter(2). add_timeout_sqe() violates this constraint because the SQE is not submitted within the function. Inline add_timeout_sqe() into fdmon_io_uring_wait() so that the struct lives at least as long as io_uring_enter(2). This fixes random hangs (bogus timeout values) when the kernel loads undefined timespec struct values from userspace after the original struct on the stack has been destroyed. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20251104022933.618123-3-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Stefan Hajnoczi	dbf70f0a03	aio-posix: fix race between io_uring CQE and AioHandler deletion When an AioHandler is enqueued on ctx->submit_list for removal, the fill_sq_ring() function will submit an io_uring POLL_REMOVE operation to cancel the in-flight POLL_ADD operation. There is a race when another thread enqueues an AioHandler for deletion on ctx->submit_list when the POLL_ADD CQE has already appeared. In that case POLL_REMOVE is unnecessary. The code already handled this, but forgot that the AioHandler itself is still on ctx->submit_list when the POLL_ADD CQE is being processed. It's unsafe to delete the AioHandler at that point in time (use-after-free). Solve this problem by keeping the AioHandler alive but setting a flag so that it will be deleted by fill_sq_ring() when it runs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20251104022933.618123-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	5 months ago
Vladimir Sementsov-Ogievskiy	20aa05edc2	util/hexdump: fix QEMU_HEXDUMP_LINE_WIDTH logic The QEMU_HEXDUMP_LINE_WIDTH calculation doesn't correspond to qemu_hexdump_line(). As a result, the last line of the dump (when the length is not a multiple of 16) has a badly aligned ASCII part. Let's calculate the length the same way. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20251031190246.257153-2-vsementsov@yandex-team.ru> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	5 months ago
Alex Bennée	91634cc331	timers: properly prefix init_clocks() Otherwise we run the risk of name clashing, for example with stm32l4x5_usart-test.c should we shuffle the includes. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20251030173302.1379174-1-alex.bennee@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	5 months ago
Peter Xu	c89d1c879a	bql: Fix bql_locked status with condvar APIs QEMU has a per-thread "bql_locked" variable stored in the TLS section, showing whether the current thread is holding the BQL. It's a pretty handy variable. Function-wise, QEMU has code that conditionally takes the BQL, relying on the variable reflecting the locking status (e.g. BQL_LOCK_GUARD), or in a GDB debugging session, we could also look at the variable (in reality, co_tls_bql_locked) to see which thread is currently holding the BQL. When using that as a debugging facility, sometimes we can observe multiple threads holding the BQL at the same time. That is because QEMU's condvar APIs bypassed the bql_*() API, hence they do not update bql_locked even if they have released the mutex while waiting. It can cause confusion if one does "thread apply all p co_tls_bql_locked" and sees multiple threads reporting true. Fix this by moving the BQL status updates into the mutex debug hooks. Now the variable should always reflect reality. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250904223158.1276992-1-peterx@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	7 months ago
Akihiko Odaki	55d98e3ede	rcu: Unify force quiescent state Borrow the concept of force quiescent state from Linux to ensure readers remain fast during normal operation and to avoid stalls. Background ========== The previous implementation had four steps to begin reclamation. 1. call_rcu_thread() would wait for the first callback. 2. call_rcu_thread() would periodically poll until a decent number of callbacks piled up or it timed out. 3. synchronize_rcu() would start a grace period (GP). 4. wait_for_readers() would wait for the GP to end. It would also trigger the force_rcu notifier to break busy loops in a read-side critical section if drain_call_rcu() had been called. Problem ======= The separation of waiting logic across these steps led to suboptimal behavior: The GP was delayed until call_rcu_thread() stops polling. force_rcu was not consistently triggered when call_rcu_thread() detected a high number of pending callbacks or a timeout. This inconsistency sometimes led to stalls, as reported in a virtio-gpu issue where memory unmapping was blocked[1]. wait_for_readers() imposed unnecessary overhead in non-urgent cases by unconditionally executing qatomic_set(&index->waiting, true) and qemu_event_reset(&rcu_gp_event), which are necessary only for expedited synchronization. Solution ======== Move the polling in call_rcu_thread() to wait_for_readers() to prevent the delay of the GP. Additionally, reorganize wait_for_readers() to distinguish between two states: Normal State: it relies exclusively on periodic polling to detect the end of the GP and maintains the read-side fast path. Force Quiescent State: Whenever expediting synchronization, it always triggers force_rcu and executes both qatomic_set(&index->waiting, true) and qemu_event_reset(&rcu_gp_event). This avoids stalls while confining the read-side overhead to this state. 
This unified approach, inspired by the Linux RCU, ensures consistent and efficient RCU grace period handling and confirms resolution of the virtio-gpu issue. [1] https://lore.kernel.org/qemu-devel/20251014111234.3190346-9-alex.bennee@linaro.org/ Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20251016-force-v1-1-919a82112498@rsg.ci.i.u-tokyo.ac.jp Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	6 months ago
Philippe Mathieu-Daudé	2ff8c9a298	buildsys: Remove support for 32-bit PPC hosts Stop detecting 32-bit PPC host as supported. See previous commit for rationale. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> [rth: Retain _ARCH_PPC64 check in udiv_qrnnd] Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20251014173900.87497-4-philmd@linaro.org>	6 months ago
Paolo Bonzini	67913e95bf	timer: constify some functions Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	6 months ago
Paolo Bonzini	5142397c79	async: access bottom half flags with qatomic_read Running test-aio-multithread under TSAN reveals data races on bh->flags. Because bottom halves may be scheduled or canceled asynchronously, without taking a lock, adjust aio_compute_bh_timeout() and aio_ctx_check() to use a relaxed read to access the flags. Use an acquire load to ensure that anything that was written prior to qemu_bh_schedule() is visible. Closes: https://gitlab.com/qemu-project/qemu/-/issues/2749 Closes: https://gitlab.com/qemu-project/qemu/-/issues/851 Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	6 months ago
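
The last entry above (5142397c79) distinguishes a relaxed read of the bottom-half flags from an acquire load that makes earlier writes visible. The following is a minimal sketch of that distinction using plain C11 atomics rather than QEMU's own qatomic_*() macros. The `DemoBH` type, the `DEMO_BH_*` flag bits and the `demo_bh_*()` helpers are hypothetical names invented for illustration; they are not taken from the QEMU tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real QEMU definitions are not shown in this log. */
enum {
    DEMO_BH_SCHEDULED = 1u << 0,
    DEMO_BH_DELETED   = 1u << 1,
};

/* Hypothetical stand-in for a bottom half: a flags word plus some payload. */
typedef struct DemoBH {
    atomic_uint flags;
    int payload;            /* written by the scheduler before the flag is set */
} DemoBH;

/* Writer side: store the payload, then publish it with a release RMW. */
static void demo_bh_schedule(DemoBH *bh, int payload)
{
    bh->payload = payload;
    atomic_fetch_or_explicit(&bh->flags, DEMO_BH_SCHEDULED,
                             memory_order_release);
}

/*
 * Relaxed read: enough to peek at the flags without a data race
 * (for example to decide whether work is pending), but it gives no
 * ordering guarantee for the payload.
 */
static bool demo_bh_peek_scheduled(DemoBH *bh)
{
    unsigned f = atomic_load_explicit(&bh->flags, memory_order_relaxed);
    return (f & DEMO_BH_SCHEDULED) != 0;
}

/*
 * Acquire load: pairs with the release in demo_bh_schedule(), so when
 * the flag is observed the payload written before it is visible too.
 */
static bool demo_bh_take(DemoBH *bh, int *out)
{
    unsigned f = atomic_load_explicit(&bh->flags, memory_order_acquire);
    if (!(f & DEMO_BH_SCHEDULED)) {
        return false;
    }
    *out = bh->payload;
    return true;
}
```

In this sketch a caller would use demo_bh_peek_scheduled() where only the flag value matters and demo_bh_take() where the data published before scheduling is about to be consumed.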


2070 Commits (46255cc2be9cb4a013077ac73b4cca41bf883425)