riscv/qemu - qemu - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Marc-André Lureau	25bc7d16fa	util/coroutine: fix -Werror=maybe-uninitialized false-positive ../util/qemu-coroutine.c:150:8: error: ‘batch’ may be used uninitialized [-Werror=maybe-uninitialized] Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2 years ago
Stefan Hajnoczi	9352f80cd9	coroutine: reserve 5,000 mappings Daniel P. Berrangé <berrange@redhat.com> pointed out that the coroutine pool size heuristic is very conservative. Instead of halving max_map_count, he suggested reserving 5,000 mappings for non-coroutine users based on observations of guests he has access to. Fixes: `86a637e481` ("coroutine: cap per-thread local pool size") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 20240320181232.1464819-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2 years ago
Stefan Hajnoczi	86a637e481	coroutine: cap per-thread local pool size The coroutine pool implementation can hit the Linux vm.max_map_count limit, causing QEMU to abort with "failed to allocate memory for stack" or "failed to set up stack guard page" during coroutine creation. This happens because per-thread pools can grow to tens of thousands of coroutines. Each coroutine causes 2 virtual memory areas to be created. Eventually vm.max_map_count is reached and memory-related syscalls fail. The per-thread pool sizes are non-uniform and depend on past coroutine usage in each thread, so it's possible for one thread to have a large pool while another thread's pool is empty. Switch to a new coroutine pool implementation with a global pool that grows to a maximum number of coroutines and per-thread local pools that are capped at hardcoded small number of coroutines. This approach does not leave large numbers of coroutines pooled in a thread that may not use them again. In order to perform well it amortizes the cost of global pool accesses by working in batches of coroutines instead of individual coroutines. The global pool is a list. Threads donate batches of coroutines to when they have too many and take batches from when they have too few: .-----------------------------------. \| Batch 1 \| Batch 2 \| Batch 3 \| ... \| global_pool `-----------------------------------' Each thread has up to 2 batches of coroutines: .-------------------. \| Batch 1 \| Batch 2 \| per-thread local_pool (maximum 2 batches) `-------------------' The goal of this change is to reduce the excessive number of pooled coroutines that cause QEMU to abort when vm.max_map_count is reached without losing the performance of an adequately sized coroutine pool. Here are virtio-blk disk I/O benchmark results: RW BLKSIZE IODEPTH OLD NEW CHANGE randread 4k 1 113725 117451 +3.3% randread 4k 8 192968 198510 +2.9% randread 4k 16 207138 209429 +1.1% randread 4k 32 212399 215145 +1.3% randread 4k 64 218319 221277 +1.4% randread 128k 1 17587 17535 -0.3% randread 128k 8 17614 17616 +0.0% randread 128k 16 17608 17609 +0.0% randread 128k 32 17552 17553 +0.0% randread 128k 64 17484 17484 +0.0% See files/{fio.sh,test.xml.j2} for the benchmark configuration: https://gitlab.com/stefanha/virt-playbooks/-/tree/coroutine-pool-fix-sizing Buglink: https://issues.redhat.com/browse/RHEL-28947 Reported-by: Sanjay Rao <srao@redhat.com> Reported-by: Boaz Ben Shabat <bbenshab@redhat.com> Reported-by: Joe Mario <jmario@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240318183429.1039340-1-stefanha@redhat.com>	2 years ago
Paolo Bonzini	230f6e06b8	meson: do not use set10 Make all items of config-host.h consistent. To keep the --disable-coroutine-pool code visible to the compiler, mutuate the IS_ENABLED() macro from Linux. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	3 years ago
Paolo Bonzini	512c90c90e	qemu-coroutine: remove qatomic_mb_read() Replace with an explicit barrier and a comment. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	3 years ago
Markus Armbruster	2379247810	coroutine: Clean up superfluous inclusion of qemu/coroutine.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20221221131435.3851212-2-armbru@redhat.com>	3 years ago
Paolo Bonzini	a248b856a8	coroutine: remove incorrect coroutine_fn annotations qemu_coroutine_get_aio_context inspects a coroutine, but it does not have to be called from the coroutine itself (or from any coroutine). Reviewed-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220922084924.201610-6-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	4 years ago
Kevin Wolf	9ec7a59b5a	coroutine: Revert to constant batch size Commit `4c41c69e` changed the way the coroutine pool is sized because for virtio-blk devices with a large queue size and heavy I/O, it was just too small and caused coroutines to be deleted and reallocated soon afterwards. The change made the size dynamic based on the number of queues and the queue size of virtio-blk devices. There are two important numbers here: Slightly simplified, when a coroutine terminates, it is generally stored in the global release pool up to a certain pool size, and if the pool is full, it is freed. Conversely, when allocating a new coroutine, the coroutines in the release pool are reused if the pool already has reached a certain minimum size (the batch size), otherwise we allocate new coroutines. The problem after commit `4c41c69e` is that it not only increases the maximum pool size (which is the intended effect), but also the batch size for reusing coroutines (which is a bug). It means that in cases with many devices and/or a large queue size (which defaults to the number of vcpus for virtio-blk-pci), many thousand coroutines could be sitting in the release pool without being reused. This is not only a waste of memory and allocations, but it actually makes the QEMU process likely to hit the vm.max_map_count limit on Linux because each coroutine requires two mappings (its stack and the guard page for the stack), causing it to abort() in qemu_alloc_stack() because when the limit is hit, mprotect() starts to fail with ENOMEM. In order to fix the problem, change the batch size back to 64 to avoid uselessly accumulating coroutines in the release pool, but keep the dynamic maximum pool size so that coroutines aren't freed too early in heavy I/O scenarios. Note that this fix doesn't strictly make it impossible to hit the limit, but this would only happen if most of the coroutines are actually in use at the same time, not just sitting in a pool. This is the same behaviour as we already had before commit `4c41c69e`. Fully preventing this would require allowing qemu_coroutine_create() to return an error, but it doesn't seem to be a scenario that people hit in practice. Cc: qemu-stable@nongnu.org Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2079938 Fixes: `4c41c69e05` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220510151020.105528-3-kwolf@redhat.com> Tested-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	4 years ago
Kevin Wolf	98e3ab3505	coroutine: Rename qemu_coroutine_inc/dec_pool_size() It's true that these functions currently affect the batch size in which coroutines are reused (i.e. moved from the global release pool to the allocation pool of a specific thread), but this is a bug and will be fixed in a separate patch. In fact, the comment in the header file already just promises that it influences the pool size, so reflect this in the name of the functions. As a nice side effect, the shorter function name makes some line wrapping unnecessary. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220510151020.105528-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	4 years ago
Stefan Hajnoczi	ac387a08a9	coroutine: use QEMU_DEFINE_STATIC_CO_TLS() Thread-Local Storage variables cannot be used directly from coroutine code because the compiler may optimize TLS variable accesses across qemu_coroutine_yield() calls. When the coroutine is re-entered from another thread the TLS variables from the old thread must no longer be used. Use QEMU_DEFINE_STATIC_CO_TLS() for the current and leader variables. The alloc_pool QSLIST needs a typedef so the return value of get_ptr_alloc_pool() can be stored in a local variable. One example of why this code is necessary: a coroutine that yields before calling qemu_coroutine_create() to create another coroutine is affected by the TLS issue. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220307153853.602859-3-stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	4 years ago
Hiroki Narukawa	4c41c69e05	util: adjust coroutine pool size to virtio block queue Coroutine pool size was 64 from long ago, and the basis was organized in the commit message in `4d68e86b`. At that time, virtio-blk queue-size and num-queue were not configuable, and equivalent values were 128 and 1. Coroutine pool size 64 was fine then. Later queue-size and num-queue got configuable, and default values were increased. Coroutine pool with size 64 exhausts frequently with random disk IO in new size, and slows down. This commit adjusts coroutine pool size adaptively with new values. This commit adds 64 by default, but now coroutine is not only for block devices, and is not too much burdon comparing with new default. pool size of 128 * vCPUs. Signed-off-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp> Message-id: 20220214115302.13294-2-hnarukaw@yahoo-corp.jp Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	4 years ago
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	6 years ago
Markus Armbruster	a8d2532645	Include qemu-common.h exactly where needed No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]	7 years ago
Kevin Wolf	aa1361d54a	block: Add missing locking in bdrv_co_drain_bh_cb() bdrv_do_drained_begin/end() assume that they are called with the AioContext lock of bs held. If we call drain functions from a coroutine with the AioContext lock held, we yield and schedule a BH to move out of coroutine context. This means that the lock for the home context of the coroutine is released and must be re-acquired in the bottom half. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	8 years ago
Stefan Hajnoczi	c40a254570	coroutine: avoid co_queue_wakeup recursion qemu_aio_coroutine_enter() is (indirectly) called recursively when processing co_queue_wakeup. This can lead to stack exhaustion. This patch rewrites co_queue_wakeup in an iterative fashion (instead of recursive) with bounded memory usage to prevent stack exhaustion. qemu_co_queue_run_restart() is inlined into qemu_aio_coroutine_enter() and the qemu_coroutine_enter() call is turned into a loop to avoid recursion. There is one change that is worth mentioning: Previously, when coroutine A queued coroutine B, qemu_co_queue_run_restart() entered coroutine B from coroutine A. If A was terminating then it would still stay alive until B yielded. After this patch B is entered by A's parent so that a A can be deleted immediately if it is terminating. It is safe to make this change since B could never interact with A if it was terminating anyway. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20180322152834.12656-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	8 years ago
Jeff Cody	6133b39f3c	coroutine: abort if we try to schedule or enter a pending coroutine The previous patch fixed a race condition, in which there were coroutines being executing doubly, or after coroutine deletion. We can detect common scenarios when this happens, and print an error message and abort before we corrupt memory / data, or segfault. This patch will abort if an attempt to enter a coroutine is made while it is currently pending execution, either in a specific AioContext bh, or pending execution via a timer. It will also abort if a coroutine is scheduled, before a prior scheduled run has occurred. We cannot rely on the existing co->caller check for recursive re-entry to catch this, as the coroutine may run and exit with COROUTINE_TERMINATE before the scheduled coroutine executes. (This is the scenario that was occurring and fixed in the previous patch). This patch also re-orders the Coroutine struct elements in an attempt to optimize caching. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	9 years ago
Roman Pen	528f449f59	coroutine-lock: do not touch coroutine after another one has been entered Submission of requests on linux aio is a bit tricky and can lead to requests completions on submission path: `44713c9e85` ("linux-aio: Handle io_submit() failure gracefully") `0ed93d84ed` ("linux-aio: process completions from ioq_submit()") That means that any coroutine which has been yielded in order to wait for completion can be resumed from submission path and be eventually terminated (freed). The following use-after-free crash was observed when IO throttling was enabled: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f5813dff700 (LWP 56417)] virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252 (gdb) bt #0 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252 ^^^^^^^^^^^^^^ remember the address #1 virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282 #2 virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308 #3 virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61 #4 virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126 #5 blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923 #6 coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78 (gdb) p * elem $8 = {index = 77, out_num = 2, in_num = 1, in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0, in_sg = 0x0, out_sg = 0x7f5804009a50} ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 'in_sg' and 'out_sg' are invalid. e.g. it is impossible that 'in_sg' is zero, instead its value must be equal to: (gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0]) $26 = 0x7f5804009af0 Seems 'elem' was corrupted. Meanwhile another thread raised an abort: Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)): #0 raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113 #3 qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60 #4 qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119 ^^^^^^^^^^^^^^^^^^ WTF?? this is equal to elem from crashed thread #5 qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60 #6 qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119 #7 qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60 #8 qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119 #9 qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60 #10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119 #11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60 #12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119 #13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106 #14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419 Crash can be explained by access of 'co' object from the loop inside qemu_co_queue_run_restart(): while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) { QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next); ^^^^^^^^^^^^^^^^^^^^ on each iteration 'co' is accessed, but 'co' can be already freed qemu_coroutine_enter(next); } When 'next' coroutine is resumed (entered) it can in its turn resume 'co', and eventually free it. That's why we see 'co' (which was freed) has the same address as 'elem' from the first backtrace. The fix is obvious: use temporary queue and do not touch coroutine after first qemu_coroutine_enter() is invoked. The issue is quite rare and happens every ~12 hours on very high IO and CPU load (building linux kernel with -j512 inside guest) when IO throttling is enabled. With the fix applied guest is running ~35 hours and is still alive so far. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170601160847.23720-1-roman.penyaev@profitbricks.com Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Fam Zheng <famz@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	9 years ago
Fam Zheng	ba9e75ceef	coroutine: Extract qemu_aio_coroutine_enter It's a variant of qemu_coroutine_enter with an explicit AioContext parameter. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	9 years ago
Paolo Bonzini	480cff6322	coroutine-lock: add limited spinning to CoMutex Running a very small critical section on pthread_mutex_t and CoMutex shows that pthread_mutex_t is much faster because it doesn't actually go to sleep. What happens is that the critical section is shorter than the latency of entering the kernel and thus FUTEX_WAIT always fails. With CoMutex there is no such latency but you still want to avoid wait and wakeup. So introduce it artificially. This only works with one waiters; because CoMutex is fair, it will always have more waits and wakeups than a pthread_mutex_t. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-3-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	9 years ago
Paolo Bonzini	0c330a734b	aio: introduce aio_co_schedule and aio_co_wake aio_co_wake provides the infrastructure to start a coroutine on a "home" AioContext. It will be used by CoMutex and CoQueue, so that coroutines don't jump from one context to another when they go to sleep on a mutex or waitqueue. However, it can also be used as a more efficient alternative to one-shot bottom halves, and saves the effort of tracking which AioContext a coroutine is running on. aio_co_schedule is the part of aio_co_wake that starts a coroutine on a remove AioContext, but it is also useful to implement e.g. bdrv_set_aio_context callbacks. The implementation of aio_co_schedule is based on a lock-free multiple-producer, single-consumer queue. The multiple producers use cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom half) grabs all items added so far, inverts the list to make it FIFO, and goes through it one item at a time until it's empty. The data structure was inspired by OSv, which uses it in the very code we'll "port" to QEMU for the thread-safe CoMutex. Most of the new code is really tests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-3-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	9 years ago
Kevin Wolf	536fca7f7e	coroutine: Introduce qemu_coroutine_enter_if_inactive() In the context of asynchronous work, if we have a worker coroutine that didn't yield, the parent coroutine cannot be reentered because it hasn't yielded yet. In this case we don't even have to reenter the parent because it will see that the work is already done and won't even yield. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	10 years ago
Stefan Hajnoczi	f643e469f3	coroutine: add qemu_coroutine_entered() function See the doc comments for a description of this new coroutine API. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1474989516-18255-2-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	10 years ago
Kevin Wolf	1b7f01d966	coroutine: Assert that no locks are held on termination A coroutine that takes a lock must also release it again. If the coroutine terminates without having released all its locks, it's buggy and we'll probably run into a deadlock sooner or later. Make sure that we don't get such cases. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	10 years ago
Paolo Bonzini	0b8b8753e4	coroutine: move entry argument to qemu_coroutine_create In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine co = qemu_coroutine_create(entry); + Coroutine co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	10 years ago
Paolo Bonzini	7d9c858137	coroutine: use QSIMPLEQ instead of QTAILQ CoQueue do not need to remove any element but the head of the list; processing is always strictly FIFO. Therefore, the simpler singly-linked QSIMPLEQ can be used instead. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	10 years ago
Peter Maydell	aafd758410	util: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1454089805-5470-6-git-send-email-peter.maydell@linaro.org	10 years ago
Daniel P. Berrange	10817bf09d	coroutine: move into libqemuutil.a library The coroutine files are currently referenced by the block-obj-y variable. The coroutine functionality though is already used by more than just the block code. eg migration code uses coroutine yield. In the future the I/O channel code will also use the coroutine yield functionality. Since the coroutine code is nicely self-contained it can be easily built as part of the libqemuutil.a library, making it widely available. The headers are also moved into include/qemu, instead of the include/block directory, since they are now part of the util codebase, and the impl was never in the block/ directory either. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	11 years ago
Kevin Wolf	cd12bb567c	coroutine: Clean up qemu_coroutine_enter() qemu_coroutine_enter() is now the only user of coroutine_swap(). Both functions are short, so inline it. Also, using COROUTINE_YIELD is now even more confusing because this code is never called during qemu_coroutine_yield() any more. In fact, this value is never read back, so we can just introduce a new COROUTINE_ENTER which documents the purpose of the task switch better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	11 years ago
Kevin Wolf	315a1309de	coroutine: Fix use after free with qemu_coroutine_yield() Instead of using the same function for entering and exiting coroutines, and hoping that it doesn't add any functionality that hurts with the parameters used for exiting, we can just directly call into the real task switch in qemu_coroutine_switch(). This fixes a use-after-free scenario where reentering a coroutine that has yielded still accesses the old parent coroutine (which may have meanwhile terminated) in the part of coroutine_swap() that follows qemu_coroutine_switch(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	11 years ago
Peter Lieven	51a2219bdc	coroutine: try harder not to delete coroutines Placing coroutines on the global pool should be preferrable, because it can help all threads. But if the global pool is full, we can still try to save some allocations by stashing completed coroutines on the local pool. This is quite cheap too, because it does not require atomic operations, and provides a gain of 15% in the best case. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417518350-6167-8-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	12 years ago
Paolo Bonzini	66552b894b	coroutine: drop qemu_coroutine_adjust_pool_size This is not needed anymore. The new TLS-based algorithm is adaptive. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417518350-6167-7-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	12 years ago
Paolo Bonzini	4d68e86bb1	coroutine: rewrite pool to avoid mutex This patch removes the mutex by using fancy lock-free manipulation of the pool. Lock-free stacks and queues are not hard, but they can suffer from the ABA problem so they are better avoided unless you have some deferred reclamation scheme like RCU. Otherwise you have to stick with adding to a list, and emptying it completely. This is what this patch does, by coupling a lock-free global list of available coroutines with per-CPU lists that are actually used on coroutine creation. Whenever the destruction pool is big enough, the next thread that runs out of coroutines will steal the whole destruction pool. This is positive in two ways: 1) the allocation does not have to do any atomic operation in the fast path, it's entirely using thread-local storage. Once every POOL_BATCH_SIZE allocations it will do a single atomic_xchg. Release does an atomic_cmpxchg loop, that hopefully doesn't cause any starvation, and an atomic_inc. A later patch will also remove atomic operations from the release path, and try to avoid the atomic_xchg altogether---succeeding in doing so if all devices either use ioeventfd or are not submitting requests actively. 2) in theory this should be completely adaptive. The number of coroutines around should be a little more than POOL_BATCH_SIZE * number of allocating threads; so this also empties qemu_coroutine_adjust_pool_size. (The previous pool size was POOL_BATCH_SIZE * number of block backends, so it was a bit more generous. But if you actually have many high-iodepth disks, it's better to put them in different iothreads, which will also use separate thread pools and aio=native file descriptors). This speeds up perf/cost (in tests/test-coroutine) by a factor of ~1.33. No matter if we end with some kind of coroutine bypass scheme or not, it cannot hurt to optimize hot code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417518350-6167-6-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	12 years ago
Stefan Hajnoczi	ac2662a913	coroutine: make pool size dynamic Allow coroutine users to adjust the pool size. For example, if the guest has multiple emulated disk drives we should keep around more coroutines. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	12 years ago
Stefan Hajnoczi	70c60c089f	coroutine: add ./configure --disable-coroutine-pool The 'gthread' coroutine backend was written before the freelist (aka pool) existed in qemu-coroutine.c. This means that every thread is expected to exit when its coroutine terminates. It is not possible to reuse threads from a pool. This patch automatically disables the pool when 'gthread' is used. This allows the 'gthread' backend to work again (for example, tests/test-coroutine completes successfully instead of hanging). I considered implementing thread reuse but I don't want quirks like CPU affinity differences due to coroutine threads being recycled. The 'gthread' backend is a reference backend and it's therefore okay to skip the pool optimization. Note this patch also makes it easy to toggle the pool for benchmarking purposes: ./configure --with-coroutine-backend=ucontext \ --disable-coroutine-pool Reported-by: Gabriel Kerneis <gabriel@kerneis.info> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Gabriel Kerneis <gabriel@kerneis.info> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	13 years ago
Stefan Hajnoczi	02ffb50448	coroutine: stop using AioContext in CoQueue qemu_co_queue_next(&queue) arranges that the next queued coroutine is run at a later point in time. This deferred restart is useful because the caller may not want to transfer control yet. This behavior was implemented using QEMUBH in the past, which meant that CoQueue (and hence CoMutex and CoRwlock) had a dependency on the AioContext event loop. This hidden dependency causes trouble when we move to a world with multiple event loops - now qemu_co_queue_next() needs to know which event loop to schedule the QEMUBH in. After pondering how to stash AioContext I realized the best solution is to not use AioContext at all. This patch implements the deferred restart behavior purely in terms of coroutines and no longer uses QEMUBH. Here is how it works: Each Coroutine has a wakeup queue that starts out empty. When qemu_co_queue_next() is called, the next coroutine is added to our wakeup queue. The wakeup queue is processed when we yield or terminate. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	13 years ago
Stefan Hajnoczi	b84c458623	coroutine: protect global pool with a mutex The coroutine freelist is a global pool of unused coroutines. It avoids the setup/teardown overhead associated with the coroutine lifecycle. Since the pool is global, we need to synchronize access so that coroutines can be used outside the BQL. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	13 years ago
Paolo Bonzini	402397843e	coroutine: move pooling to common code The coroutine pool code is duplicated between the ucontext and sigaltstack backends, and absent from the win32 backend. But the code can be shared easily by moving it to qemu-coroutine.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	13 years ago
Paolo Bonzini	737e150e89	block: move include files to include/block/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	14 years ago
Kevin Wolf	00dccaf1f8	coroutine: introduce coroutines Asynchronous code is becoming very complex. At the same time synchronous code is growing because it is convenient to write. Sometimes duplicate code paths are even added, one synchronous and the other asynchronous. This patch introduces coroutines which allow code that looks synchronous but is asynchronous under the covers. A coroutine has its own stack and is therefore able to preserve state across blocking operations, which traditionally require callback functions and manual marshalling of parameters. Creating and starting a coroutine is easy: coroutine = qemu_coroutine_create(my_coroutine); qemu_coroutine_enter(coroutine, my_data); The coroutine then executes until it returns or yields: void coroutine_fn my_coroutine(void opaque) { MyData my_data = opaque; /* do some work / qemu_coroutine_yield(); / do some more work */ } Yielding switches control back to the caller of qemu_coroutine_enter(). This is typically used to switch back to the main thread's event loop after issuing an asynchronous I/O request. The request callback will then invoke qemu_coroutine_enter() once more to switch back to the coroutine. Note that if coroutines are used only from threads which hold the global mutex they will never execute concurrently. This makes programming with coroutines easier than with threads. Race conditions cannot occur since only one coroutine may be active at any time. Other coroutines can only run across yield. This coroutines implementation is based on the gtk-vnc implementation written by Anthony Liguori <anthony@codemonkey.ws> but it has been significantly rewritten by Kevin Wolf <kwolf@redhat.com> to use setjmp()/longjmp() instead of the more expensive swapcontext() and by Paolo Bonzini <pbonzini@redhat.com> for Windows Fibers support. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	15 years ago

27 Commits (86f847a39aef93bcfbea65a702ba76762ae54d61)