In eth_strip_vlan_ex() we take a pointer to the eth_header h_proto
field into a local uint16_t* variable, and then later in the function
we dereference that pointer. This isn't safe, because the eth_header
struct may not be aligned, and if we mark the struct as QEMU_PACKED
then gcc will complain about taking the address of a field in a
packed struct.
Instead, make the local variable be a void* and use the appropriate
functions for accessing 16 bits of possibly unaligned data through
it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Message-ID: <20260212140917.1443253-3-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Even if it is deprecated by the VirtIO standard it does not affect the
layout of the queue, or introduces new operations. So Shadow Virtqueue
can handle it just fine.
Tested with OVS DPDK and VDUSE.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260206144717.730799-1-eperezma@redhat.com>
All three events are shared between precopy and postcopy, rather than
precopy specific.
For example, both precopy and postcopy will go through a SETUP process.
Meanwhile, both FAILED and DONE notifiers will be notified for either
precopy or postcopy on completions / failures.
Rename them to make them match what they do, and shorter.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260126213614.3815900-6-peterx@redhat.com
[fixed-up entry in scsi-disk.c that got merged first]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Currently, filter-redirector does not implement the status_changed
callback, which means the 'status' property cannot be used to
dynamically enable/disable the filter at runtime. When status is
set to 'off' via QMP/HMP, the filter still receives data from the
indev chardev because the chardev handlers remain registered.
This patch adds proper support for the 'status' property:
1. Implement filter_redirector_status_changed() callback:
- When status changes to 'off': remove chardev read handlers
- When status changes to 'on': re-register chardev handlers
(only if chardev is already open)
2. Update filter_redirector_setup() to respect initial status:
- If filter is created with status=off, do not register handlers
- This allows creating disabled filters via command line or QMP
3. Handle chardev OPENED/CLOSED events to re-arm handlers on reconnect:
- Keep the chr_event callback installed on CLOSE so a later OPENED
can re-register the read handlers when nf->on
- Use qemu_chr_fe_set_handlers_full(..., set_open=false, sync_state=false)
instead of qemu_chr_fe_set_handlers() because the latter forces
sync_state=true and may emit CHR_EVENT_OPENED for an already-open
backend. Doing that from inside the chr_event callback would cause
recursive/re-entrant OPENED handling.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Previously, when the 'interval' property was modified at runtime via
QMP, the new value would only take effect after the current timer
period elapsed. This could lead to unexpected behavior when users
expect immediate changes.
Fix this by checking if the timer is already running when setting
the interval property. If so, reschedule the timer with the new
interval value immediately.
Reviewed-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Keep NetdevTapOptions related logic in tap.c, and make tap_set_sndbuf a
simple system call wrapper, more like other functions in tap-linux.c
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Follow common recommendations in include/qapi/error.h of having
a return value together with errp. This allows to avoid error propagation.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
No reason to setup notifier on each queue of multique tap,
when we actually want to run downscript only once.
As well, let's not setup notifier, when downscript is
not enabled (downsciprt="no").
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Simplify handling scripts: parse all these "no" and '\0' once, and
then keep simpler logic for net_tap_open() and net_init_tap_one(): NULL
means no script to run, otherwise run script.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Directly pass NULL in cases where we report an error if script or
downscript are set.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Let's keep all similar argument checking in net_init_tap() function.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch removes abort() call in the tap_fd_set_vnet_hdr_len()
function. If the fd is found to be in a bad state (e.g., EBADFD
or ENODEV), the function will print an error message.
When QEMU creates a tap device automatically and the tap device is
manually removed from the host while the guest is running, the tap
device file descriptor becomes invalid. Later, when the guest executes
shutdown, the tap_fd_set_vnet_hdr_len() function may be called and
abort QEMU with a core dump when attempting to use the invalid fd.
The expected behavior for this negative test case is that QEMU should
report an error but continue running rather than aborting.
Testing:
- Start QEMU with automatically created tap device
- Manually remove the tap device on the host
- Execute shutdown in the guest
- Verify QEMU reports an error but does not abort
Fixes: 0caed25cd1 ("virtio: Call set_features during reset")
Signed-off-by: Houqi (Nick) Zuo <hzuo@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
All these files indirectly include the "qemu/bswap.h" header.
Make this inclusion explicit to avoid build errors when
refactoring unrelated headers.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20260109164742.58041-4-philmd@linaro.org>
Use error_setg_errno() instead of passing the value of strerror() or
g_strerror() to error_setg().
The separator between the error message proper and the value of
strerror() changes from " : ", "", " - ", "- " to ": " in places.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20251121121438.1249498-14-armbru@redhat.com>
Acked-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
A few error messages show numeric errno codes. Use error_setg_errno()
to show human-readable text instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20251121121438.1249498-13-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[Trivial fixup to riscv_kvm_cpu_finalize_features()]
This error reports failure to create a temporary file, and
error_setg_file_open() would probably be too terse, so merely switch
to error_setg_errno() to add errno information.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20251121121438.1249498-12-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Replace
error_setg_errno(errp, errno, MSG, FNAME);
by
error_setg_file_open(errp, errno, FNAME);
where MSG is "Could not open '%s'" or similar.
Also replace equivalent uses of error_setg().
A few messages lose prefixes ("net dump: ", "SEV: ", __func__ ": ").
We could put them back with error_prepend(). Not worth the bother.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
Message-ID: <20251121121438.1249498-11-armbru@redhat.com>
[Conflict with commit 26b4a6ffe7 (monitor/hmp: Merge
hmp-cmds-target.c within hmp-cmds.c) resolved]
The error message changes from
tap: open vhost char device failed
to
Could not open '/dev/vhost-net': REASON
I think the exact file name is more useful to know than the file's
purpose.
We could put back the "tap: " prefix with error_prepend(). Not
worth the bother.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20251121121438.1249498-9-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Error messages change from
Can't open /dev/ip (actually /dev/udp)
Can't open /dev/tap
Can't open /dev/tap (2)
to
Could not open '/dev/udp': REASON
Could not open '/dev/tap': REASON
where REASON is the value of strerror(errno).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org>
Message-ID: <20251121121438.1249498-5-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
AIO_WAIT_WHILE is used even outside the block layer; move the header file
out of block/ just like the implementation is in util/.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid including all of qdev everywhere (the hw/core/qdev.h header in fact
brings in a lot more headers too), instead declare a couple structs for
which only a pointer type is needed.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In commits like 969e50b61a ("net: Pad short frames to minimum size
before sending from SLiRP/TAP") we switched away from requiring
network devices to handle short frames to instead having the net core
code do the padding of short frames out to the ETH_ZLEN minimum size.
We then dropped the code for handling short frames from the network
devices in a series of commits like 140eae9c8f ("hw/net: e1000:
Remove the logic of padding short frames in the receive path").
This missed one route where the device's receive code can still see a
short frame: if the device is in loopback mode and it transmits a
short frame via the qemu_receive_packet() function, this will be fed
back into its own receive code without being padded.
Add the padding logic to qemu_receive_packet().
This fixes a buffer overrun which can be triggered in the
e1000_receive_iov() logic via the loopback code path.
Other devices that use qemu_receive_packet() to implement loopback
are cadence_gem, dp8393x, lan9118, msf2-emac, pcnet, rtl8139
and sungem.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3043
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Makes the net_hub_port_cleanup function idempotent to avoid double
removals by guarding its QLIST_REMOVE with a flag.
When using a Xen networking device with hubport backends, e.g.:
-accel kvm,xen-version=0x40011
-netdev hubport,...
-device xen-net-device,...
the shutdown order starts with net_cleanup, which walks the list and
deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
called, which eventually leads to a second net_hub_port_cleanup call,
resulting in a segfault.
Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The actual backend is "Chardev", CharBackend is the frontend side of
it (whatever talks to the backend), let's rename it for readability.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Link: https://lore.kernel.org/r/20251022074612.1258413-1-marcandre.lureau@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was deprecated in 9.2, time to remove.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch adds the ability to map a host unix socket to a guest tcp socket when
using the slirp backend. This feature was added in libslirp version 4.7.0.
A new syntax for unix socket: -hostfwd=unix:hostpath-[guestaddr]:guestport
Signed-off-by: Viktor Kurilko <murlockkinght@gmail.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-ID: <20250808143904.363907-1-murlockkinght@gmail.com>
When any host or guest GSO over UDP tunnel offload is enabled the
virtio net header includes the additional tunnel-related fields,
update the size accordingly.
Push the GSO over UDP tunnel offloads all the way down to the tap
device extending the newly introduced NetFeatures struct, and
eventually enable the associated features.
As per virtio specification, to convert features bit to offload bit,
map the extended features into the reserved range.
Finally, make the vhost backend aware of the exact header layout, to
copy it correctly. The tunnel-related field are present if either
the guest or the host negotiated any UDP tunnel related feature:
add them to the kernel supported features list, to allow qemu
transfer to the backend the needed information.
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <093b4bc68368046bffbcab2202227632d6e4e83b.1758549625.git.pabeni@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tap devices support GSO over UDP tunnel offload. Probe for such
feature in a similar manner to other offloads.
GSO over UDP tunnel needs to be enabled in addition to a "plain"
offload (TSO or USO).
No need to check separately for the outer header checksum offload:
the kernel is going to support both of them or none.
The new features are disabled by default to avoid compat issues,
and could be enabled, after that hw_compat_10_1 will be added,
together with the related compat entries.
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <a987a8a7613cbf33bb2209c7c7f5889b512638a7.1758549625.git.pabeni@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The set_offload() argument list is already pretty long and
we are going to introduce soon a bunch of additional offloads.
Replace the offload arguments with a single struct and update
all the relevant call-sites.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <a9d4dd043b8c71b791e9ff05e17ef06072d9714e.1758549625.git.pabeni@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
net_slirp_register_poll_sock() and net_slirp_unregister_poll_sock()
report WSAEventSelect() failure with error_setg(&error_warn, ...).
error_setg_win32(&error_warn, ...) is undesirable just like
error_setg(&error_fatal, ...) and error_setg(&error_abort, ...) are.
Replace by warn_report().
The failures should probably be errors, but these functions implement
callbacks that cannot fail, exit(1) would be too harsh, and silent
failure we don't want. Thus, warnings.
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20250923091000.3180122-7-armbru@redhat.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Instead of open-coded g_unix_set_fd_nonblocking() calls, use
QEMU wrapper qemu_set_blocking().
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[DB: fix missing closing ) in tap-bsd.c, remove now unused GError var]
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Now we can use qemu_set_blocking() in these cases.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Use common qemu_set_blocking() instead.
Note that pre-patch the behavior of Win32 and Linux realizations
are inconsistent: we ignore failure for Win32, and assert success
for Linux.
How do we convert the callers?
1. Most of callers call qemu_socket_set_nonblock() on a
freshly created socket fd, in conditions when we may simply
report an error. Seems correct switching to error handling
both for Windows (pre-patch error is ignored) and Linux
(pre-patch we assert success). Anyway, we normally don't
expect errors in these cases.
Still in tests let's use &error_abort for simplicity.
What are exclusions?
2. hw/virtio/vhost-user.c - we are inside #ifdef CONFIG_LINUX,
so no damage in switching to error handling from assertion.
3. io/channel-socket.c: here we convert both old calls to
qemu_socket_set_nonblock() and qemu_socket_set_block() to
one new call. Pre-patch we assert success for Linux in
qemu_socket_set_nonblock(), and ignore all other errors here.
So, for Windows switch is a bit dangerous: we may get
new errors or crashes(when error_abort is passed) in
cases where we have silently ignored the error before
(was it correct in all such cases, if they were?) Still,
there is no other way to stricter API than take
this risk.
4. util/vhost-user-server - compiled only for Linux (see
util/meson.build), so we are safe, switching from assertion to
&error_abort.
Note: In qga/channel-posix.c we use g_warning(), where g_printerr()
would actually be a better choice. Still let's for now follow
common style of qga, where g_warning() is commonly used to print
such messages, and no call to g_printerr(). Converting everything
to use g_printerr() should better be another series.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Coverity reported a file descriptor leak (CID 1490785) that happens if
`vhost_vdpa_get_max_queue_pairs()` returns 0, since in that case
net_host_vdpa_init(), which should take ownership of the fd, is never
called.
vhost_vdpa_get_max_queue_pairs() returns 1 if VIRTIO_NET_F_MQ is not
negotiated, or a negative error if the ioctl() fails, or the maximum
number of queue pairs exposed by the device in the config space in the
`max_virtqueue_pairs` field. In the VIRTIO spec we have:
The device MUST set max_virtqueue_pairs to between 1 and 0x8000
inclusive, if it offers VIRTIO_NET_F_MQ.
So, if `vhost_vdpa_get_max_queue_pairs()` returns 0, it's really an
error since the device is violating the VIRTIO spec.
Treat also `queue_pairs == 0` as an error, and jump to the `err` label,
to return a negative value to the caller in any case.
Coverity: CID 1490785
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20250714101156.30024-1-sgarzare@redhat.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
The "err" variable was declared but never used within the
chr_closed_bh() function. This resulted in a dead code
warning (CID 1612365) from Coverity.
Remove the unused variable and the associated error block
to resolve the issue.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This was flagged by Coverity as a memory illegal access.
Initialize the pointer to NULL at declaration.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
If g_remove() fails, use warn_report() to log an error.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In passt_vhost_user_start(), if vhost_net_init() fails, the "net"
variable is NULL and execution jumps to the "err:" label.
The cleanup code within this label is conditioned on "if (net)",
which can never be true in this error case. This makes the cleanup
block dead code, as reported by Coverity (CID 1612371).
Refactor the error handling to occur inline, removing the goto and
the unreachable cleanup block.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The "err" variable was declared but never used within the
net_vhost_user_event() function. This resulted in a dead code
warning (CID 1612372) from Coverity.
Remove the unused variable and the associated error block
to resolve the issue.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The "err" variable was declared but never used within the
passt_vhost_user_event() function. This resulted in a dead code
warning (CID 1612375) from Coverity.
Remove the unused variable and the associated error block
to resolve the issue.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
net_init_tap intends to return 0 for success and -1 on error. However,
when net_init_tap() succeeds for a multi-queue device, it returns 1,
because of this code where ret becomes 1 when g_unix_set_fd_nonblocking
succeeds:
ret = g_unix_set_fd_nonblocking(fd, true, NULL);
if (!ret) {
... error ...
free_fail:
...
return ret;
Luckily, the only current call site checks for negative, rather than non-zero:
net_client_init1()
if (net_client_init_fun[](...) < 0)
Also, in the unlikely case that g_unix_set_fd_nonblocking fails and returns
false, ret=0 is returned, and net_client_init1 will use a broken interface.
Fix it to be future proof.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Theoretically tap_read_packet() may return size less than
s->host_vnet_hdr_len, and next, we'll work with negative size
(in case of !s->using_vnet_hdr). Let's avoid it.
Don't proceed with size == s->host_vnet_hdr_len as well in case
of !s->using_vnet_hdr, it doesn't make sense.
Tested-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Extend 'inhibit=on' setting with the option to specify a pinned XSK map
path along with a starting index (default 0) to push the created XSK
sockets into. Example usage:
# ./build/qemu-system-x86_64 [...] \
-netdev af-xdp,ifname=enp2s0f0np0,id=net0,mode=native,queues=2,start-queue=14,inhibit=on,map-path=/sys/fs/bpf/xsks_map,map-start-index=14 \
-device virtio-net-pci,netdev=net0 [...]
This is useful for the case where an existing XDP program with XSK map
is present on the AF_XDP supported phys device and the XSK map is not
yet populated. For example, the former could have been pre-loaded onto
the netdevice by a control plane, which later launches QEMU to populate
it with XSK sockets.
Normally, the main idea behind 'inhibit=on' is that the QEMU instance
doesn't need to have a lot of privileges to use the pre-loaded program
and the pre-created sockets, but this mentioned use-case here is different
where QEMU still needs privileges to create the sockets.
The 'map-start-index' parameter is optional and defaults to 0. It allows
flexible placement of the XSK sockets, and is up to the user to specify
when the XDP program with XSK map was already preloaded. In the simplest
case the queue-to-map-slot mapping is just 1:1 based on ctx->rx_queue_index
but the user might as well have a different scheme (or smaller map size,
e.g. ctx->rx_queue_index % max_size) to push the inbound traffic to one
of the XSK sockets.
Note that the bpf_xdp_query_id() is now only tested for 'inhibit=off'
since only in the latter case the libxdp takes care of installing the
XDP program which was installed based on the s->xdp_flags pointing to
either driver or skb mode. For 'inhibit=on' we don't make any assumptions
and neither go down the path of probing all possible options in which
way the user installed the XDP program.
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
While testing, it turned out that upon error in the queue creation loop,
we never trigger the af_xdp_cleanup() handler. This is because we pass
errp instead of a local err pointer into the various AF_XDP setup functions
instead of a scheme like:
bool fn(..., Error **errp)
{
Error *err = NULL;
foo(arg, &err);
if (err) {
handle the error...
error_propagate(errp, err);
return false;
}
...
}
The same is true for the attachment probing with bpf_xdp_query_id(). With a
conversion into the above format, the af_xdp_cleanup() handler is called as
expected. Note the error_propagate() handles a NULL err internally.
Fixes: cb039ef3d9 ("net: add initial support for AF_XDP network backend")
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
There are two issues with the XDP program removal in af_xdp_cleanup():
1) Starting from libxdp 1.3.0 [0] the XDP program gets automatically
detached when we call xsk_socket__delete() for the last successfully
configured queue. libxdp internally keeps track of that. For QEMU
we require libxdp >= 1.4.0. Given QEMU is not loading the program,
lets also not attempt to remove it and delegate this instead.
2) The removal logic is incorrect anyway because we are setting n_queues
into the last queue that never has xdp_flags on failure, so the logic
is always skipped since the non-zero test for s->xdp_flags in
af_xdp_cleanup() fails.
Fixes: cb039ef3d9 ("net: add initial support for AF_XDP network backend")
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Anton Protopopov <aspsk@isovalent.com>
Link: 38c2914988 [0]
Signed-off-by: Jason Wang <jasowang@redhat.com>
It is no longer used.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250530-vdpa-v1-5-5af4109b1c19@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Retrieve peer hashing capability instead of hardcoding.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250530-vdpa-v1-4-5af4109b1c19@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Report hashing capability so that virtio-net can deliver the correct
capability information to the guest.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250530-vdpa-v1-2-5af4109b1c19@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>