This will be used to access non-secure and secure memory. Secure support
and Granule Protection Check (for RME) for SMMU need to access secure
memory.
As well, it allows to remove usage of global address_space_memory,
allowing different SMMU instances to have a specific view of memory.
User creatable SMMU are handled as well for virt machine,
by setting the memory properties when device is plugged in.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20260108210453.2280733-1-pierrick.bouvier@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now there are no callsites for the omap_badwidth_* family
of functions we can remove their implementations.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Complete the conversion started in the previous commit by
changing all the omap_badwidth_write* calls to open-coded
log-and-ignore behaviour.
We can delete a FIXME comment about an infinite loop, because that
only looped infinitely back before 2011 when the device was still
using absolute physical addresses. Now that we are simply logging
the error we can clearly see that there's no loop.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The omap_badwidth_read* and omap_badwidth_write* functions are
used by various OMAP devices when the guest makes an access
to registers with an invalid width; they do two things:
- log a GUEST_ERROR for the access
- call cpu_physical_memory_read() or cpu_physical_memory_write()
with the offset they are passed in
The first of these produces an unhelpful log message because the
function name that is printed is that of the omap-badwidth_*
function, not that of the read or write function of the device that
called it; this means you can't tell what device is involved.
The second is wrong because the offset is an offset into the device
but we use it as an absolute physical address, so we will access
whatever is at low memory. That happens to be the boot ROM, so we
will ignore a write and return random garbage on a read. This bug
has been present since 2011, when we did the conversions to the
MemoryRegion APIs, which involved changing all devices from working
with absolute physical addresses to working with offsets within their
MemoryRegions. We must have missed updating these functions.
Replace the uses of the omap_badwidth_read* functions in omap1.c with
an open-coded call to qemu_log_mask() and RAZ/WI behaviour.
We do just the reads here because there are a lot of callsites in
omap1.c; the writes will be done in the next commit.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The omap_badwidth_read* and omap_badwidth_write* functions are
used by various OMAP devices when the guest makes an access
to registers with an invalid width; they do two things:
- log a GUEST_ERROR for the access
- call cpu_physical_memory_read() or cpu_physical_memory_write()
with the offset they are passed in
The first of these produces an unhelpful log message because the
function name that is printed is that of the omap-badwidth_*
function, not that of the read or write function of the device that
called it; this means you can't tell what device is involved.
The second is wrong because the offset is an offset into the device
but we use it as an absolute physical address, so we will access
whatever is at low memory. That happens to be the boot ROM, so we
will ignore a write and return random garbage on a read. This bug
has been present since 2011, when we did the conversions to the
MemoryRegion APIs, which involved changing all devices from working
with absolute physical addresses to working with offsets within their
MemoryRegions. We must have missed updating these functions.
Replace the uses of these functions in omap_dma.c with an
open-coded call to qemu_log_mask() and RAZ/WI behaviour.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The omap_badwidth_read* and omap_badwidth_write* functions are
used by various OMAP devices when the guest makes an access
to registers with an invalid width; they do two things:
- log a GUEST_ERROR for the access
- call cpu_physical_memory_read() or cpu_physical_memory_write()
with the offset they are passed in
The first of these produces an unhelpful log message because the
function name that is printed is that of the omap-badwidth_*
function, not that of the read or write function of the device that
called it; this means you can't tell what device is involved.
The second is wrong because the offset is an offset into the device
but we use it as an absolute physical address, so we will access
whatever is at low memory. That happens to be the boot ROM, so we
will ignore a write and return random garbage on a read. This bug
has been present since 2011, when we did the conversions to the
MemoryRegion APIs, which involved changing all devices from working
with absolute physical addresses to working with offsets within their
MemoryRegions. We must have missed updating these functions.
Replace the uses of these functions in omap_gpio.c with an
open-coded call to qemu_log_mask() and RAZ/WI behaviour.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The omap_badwidth_read* and omap_badwidth_write* functions are
used by various OMAP devices when the guest makes an access
to registers with an invalid width; they do two things:
- log a GUEST_ERROR for the access
- call cpu_physical_memory_read() or cpu_physical_memory_write()
with the offset they are passed in
The first of these produces an unhelpful log message because the
function name that is printed is that of the omap-badwidth_*
function, not that of the read or write function of the device that
called it; this means you can't tell what device is involved.
The second is wrong because the offset is an offset into the device
but we use it as an absolute physical address, so we will access
whatever is at low memory. That happens to be the boot ROM, so we
will ignore a write and return random garbage on a read. This bug
has been present since 2011, when we did the conversions to the
MemoryRegion APIs, which involved changing all devices from working
with absolute physical addresses to working with offsets within their
MemoryRegions. We must have missed updating these functions.
Replace the uses of these functions in omap_i2c.c with an
open-coded call to qemu_log_mask() and RAZ/WI behaviour.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The omap_badwidth_read* and omap_badwidth_write* functions are
used by various OMAP devices when the guest makes an access
to registers with an invalid width; they do two things:
- log a GUEST_ERROR for the access
- call cpu_physical_memory_read() or cpu_physical_memory_write()
with the offset they are passed in
The first of these produces an unhelpful log message because the
function name that is printed is that of the omap-badwidth_*
function, not that of the read or write function of the device that
called it; this means you can't tell what device is involved.
The second is wrong because the offset is an offset into the device
but we use it as an absolute physical address, so we will access
whatever is at low memory. That happens to be the boot ROM, so we
will ignore a write and return random garbage on a read. This bug
has been present since 2011, when we did the conversions to the
MemoryRegion APIs, which involved changing all devices from working
with absolute physical addresses to working with offsets within their
MemoryRegions. We must have missed updating these functions.
Replace the uses of these functions in omap_mmc.c with an
open-coded call to qemu_log_mask() and RAZ/WI behaviour.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
By removing cpu details and use a config struct, we can use the
same granule_protection_check with other devices, like SMMU.
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20251216000122.763264-3-pierrick.bouvier@linaro.org
[PMM: avoid local vars in middle of block]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The ARMSecuritySpace enum and its related helpers were defined in the
target-specific header target/arm/cpu.h. This prevented common,
target-agnostic code like the SMMU model from using these definitions
without triggering "cpu.h included from common code" errors.
To resolve this, this commit introduces a new, lightweight header,
include/hw/arm/arm-security.h, which is safe for inclusion by common
code.
The following change was made:
- The ARMSecuritySpace enum and the arm_space_is_secure() and
arm_secure_to_space() helpers have been moved from target/arm/cpu.h
to the new hw/arm/arm-security.h header.
This refactoring decouples the security state definitions from the core
CPU implementation, allowing common hardware models to correctly handle
security states without pulling in heavyweight, target-specific headers.
Signed-off-by: Tao Tang <tangtao1634@phytium.com.cn>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251216000122.763264-2-pierrick.bouvier@linaro.org
Link: https://lists.nongnu.org/archive/html/qemu-arm/2025-09/msg01288.html
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Unfortunately while rebasing the series registering the
ARM/Aarch64 machine interfaces and getting it merged as
commit 38c5ab4003 ("hw/arm: Filter machine types for
qemu-system-arm/aarch64 binaries") we missed the recent
addition of the MAX78000FTHR machine in commit 51eb283dd0.
Correct that.
The effect is that the machine was accidentally disabled.
Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Tested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251218214306.63667-1-philmd@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3248
Fixes: 38c5ab4003 ("hw/arm: Filter machine types for single binary")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
I don't want to admit how many hours I spent trying to figure out why
nothing was being printed (as the enable-ing code hadn't yet run,
even thought it existed).
Signed-off-by: julia <midnight@trainwit.ch>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251217-cmsdk-uart-disabled-warning2-v1-1-847de48840bc@trainwit.ch
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Test for presence of ASID2; if it is, check FNG1, FNG0, and A2 are
writable, and read value shows the update. If not present, check these
read as RES0.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Jim MacArthur <jim.macarthur@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This just allows read/write of three feature bits. ASID is still
ignored. Any writes to TTBR0_EL0 and TTBR1_EL0, including changing
the ASID, will still cause a complete flush of the TLB.
Signed-off-by: Jim MacArthur <jim.macarthur@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Signed-off-by: Jim MacArthur <jim.macarthur@linaro.org>
[PMM: add entry to v8_user_idregs[] list also]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
hw/arm/boot.h is included twice
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* Resolves build errors with gcc 16
* Adjusts the Linux headers for s390x and mshv
* Fixes endianness issue in the VFIO helper functions
* Adds support for live migration with vIOMMU when using IOMMU
dirty tracking
* Implements a migration blocker to prevent failures when VM
memory is too large
* Corrects an unmap_bitmap failure in the legacy VFIO backend
* Addresses a workaround for an Intel IOMMU errata.
* Implements Intel IOMMU first stage translation for passthrough
device. Also a prerequisite work for vSVA.
* Updates documentation
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmlmEgQACgkQUaNDx8/7
7KFm2w/+JwlyiY5jWjzvCBCEEgBdBrb8XzMoSFr2xWNQrNHvE23veeQJcT+5LwQI
DV74Y3wmWYeAVGGKHVoALVEIJYtjVDOPU5TIyhr4nTMO8/A2j1ylBhsP6ZnWYYkO
uFe92O3wTHViFY5h9dgm1JsA3Bok52mteAHAE5gsxCNYk6h+ps1a5UZM8wxjtNA2
yVIvAZvaubnA/0yN02pz5bCOhPpaGpkV69l7nJSHwk2RPuspUR6dWo11P2yjxVDQ
7pv7DbLl9qm+xdmOp0ANVPKp9fqBJnBa/ta1Dn1VrQ2iJXnwezy+IdNC1In/HKKy
ZHe+V/p2JA09xjjmB2fu53DQQIjh/qeCWi0b2vkDZZVvl0hJ+0y9P1GRxhwBhtgK
/vwvgKGwC3OwXcdrxXNvD4Yy4NJLUtCoN8vmyI41ohLeMfr7/XrmTrf0J4ciPc4T
1bAHY2SWkFL59ylN+gt1khlV8zqPYP9S1i08A2wJjvLOwqRJ/LN2tNEh9pWvGmFg
p5WGTNeZLsfD+ZT10bm083EMAc1va7RTQNjAzb55pxq0ASPl7ZIVAKqazaG9QsaK
apPxGGYevuWzJVaNYWAqj7y37WDP/w6rKmyRmIBMV+x9+Dv+DGPbGb8oAOjZ0Av5
489mHYIONxp//2SvaUSfGpQHACCgEKTHlstdlyw79C84xPzHujE=
=o9aW
-----END PGP SIGNATURE-----
Merge tag 'pull-vfio-20260113' of https://github.com/legoater/qemu into staging
vfio queue:
* Resolves build errors with gcc 16
* Adjusts the Linux headers for s390x and mshv
* Fixes endianness issue in the VFIO helper functions
* Adds support for live migration with vIOMMU when using IOMMU
dirty tracking
* Implements a migration blocker to prevent failures when VM
memory is too large
* Corrects an unmap_bitmap failure in the legacy VFIO backend
* Addresses a workaround for an Intel IOMMU errata.
* Implements Intel IOMMU first stage translation for passthrough
device. Also a prerequisite work for vSVA.
* Updates documentation
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmlmEgQACgkQUaNDx8/7
# 7KFm2w/+JwlyiY5jWjzvCBCEEgBdBrb8XzMoSFr2xWNQrNHvE23veeQJcT+5LwQI
# DV74Y3wmWYeAVGGKHVoALVEIJYtjVDOPU5TIyhr4nTMO8/A2j1ylBhsP6ZnWYYkO
# uFe92O3wTHViFY5h9dgm1JsA3Bok52mteAHAE5gsxCNYk6h+ps1a5UZM8wxjtNA2
# yVIvAZvaubnA/0yN02pz5bCOhPpaGpkV69l7nJSHwk2RPuspUR6dWo11P2yjxVDQ
# 7pv7DbLl9qm+xdmOp0ANVPKp9fqBJnBa/ta1Dn1VrQ2iJXnwezy+IdNC1In/HKKy
# ZHe+V/p2JA09xjjmB2fu53DQQIjh/qeCWi0b2vkDZZVvl0hJ+0y9P1GRxhwBhtgK
# /vwvgKGwC3OwXcdrxXNvD4Yy4NJLUtCoN8vmyI41ohLeMfr7/XrmTrf0J4ciPc4T
# 1bAHY2SWkFL59ylN+gt1khlV8zqPYP9S1i08A2wJjvLOwqRJ/LN2tNEh9pWvGmFg
# p5WGTNeZLsfD+ZT10bm083EMAc1va7RTQNjAzb55pxq0ASPl7ZIVAKqazaG9QsaK
# apPxGGYevuWzJVaNYWAqj7y37WDP/w6rKmyRmIBMV+x9+Dv+DGPbGb8oAOjZ0Av5
# 489mHYIONxp//2SvaUSfGpQHACCgEKTHlstdlyw79C84xPzHujE=
# =o9aW
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 13 Jan 2026 08:36:04 PM AEDT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg: aka "Cédric Le Goater <clg@kaod.org>" [full]
* tag 'pull-vfio-20260113' of https://github.com/legoater/qemu: (41 commits)
tests/rcutorture: Fix build error
tests/qtest: Fix build error
target/riscv: Fix build errors
ppc/vof: Fix build error
update-linux-headers: Remove "asm-s390/unistd_32.h"
include/hw/hyperv: Remove unused 'struct mshv_vp_registers' definition
util/vfio-helper: Fix endianness in PCI config read/write functions
Workaround for ERRATA_772415_SPR17
vfio/listener: Bypass readonly region for dirty tracking
intel_iommu_accel: Implement get_host_iommu_quirks() callback
hw/pci: Introduce pci_device_get_host_iommu_quirks()
vfio/migration: Allow live migration with vIOMMU without VFs using device dirty tracking
vfio/migration: Add migration blocker if VM memory is too large to cause unmap_bitmap failure
vfio/listener: Add missing dirty tracking in region_del
intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend
vfio/iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag support
vfio: Add a backend_flag parameter to vfio_container_query_dirty_bitmap()
vfio/container-legacy: rename vfio_dma_unmap_bitmap() to vfio_legacy_dma_unmap_get_dirty_bitmap()
vfio/iommufd: Query dirty bitmap before DMA unmap
vfio/iommufd: Add framework code to support getting dirty bitmap before unmap
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
VEX is only forbidden in real and vm86 mode; 16-bit protected mode supports
it for some unfathomable reason.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VSIB can have either 32-bit or 64-bit addresses, pass a constant mask to
the helper and apply it before the load.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the vex_special field was not initialized, it was considered to be
X86_VEX_SSEUnaligned (whose value was zero). Add a new value to
fix that.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove dead code; it arose when I noticed that, because 0x3? opcodes do
have a pop, case 0x32 works just fine as fcomp (even though 0x?2 is fcom):
there is no need to hack the op to 0x03.
Reported by Coverity as CID 1643922.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The value that is pushed by PUSHF is the full EFLAGS, while CC_OP_EFLAGS
only wants arithmetic flags in CC_SRC. To avoid this, follow what other
helpers do and set CC_SRC/CC_OP directly in helper_read_eflags. This
is basically free and fixes an issue booting Windows 3.11.
Reported-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Fixes: e661e2d7a3 ("target/i386/tcg: update cc_op after PUSHF", 2025-12-27)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Newer gcc compiler (version 16.0.0 20260103 (Red Hat 16.0.0-0) (GCC))
detects an unused variable error:
../tests/unit/rcutorture.c: In function ‘rcu_read_stress_test’:
../tests/unit/rcutorture.c:251:18: error: variable ‘garbage’ set but not used [-Werror=unused-but-set-variable=]
251 | volatile int garbage = 0;
| ^~~~~~~
Since the 'garbage' variable is used to generate memory reads from the
CPU while holding the RCU lock, it can not be removed. Tag it as
((unused)) instead to silence the compiler warnings/errors.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20260112163350.1251114-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Newer gcc compiler (version 16.0.0 20260103 (Red Hat 16.0.0-0) (GCC))
detects an unused variable error:
../tests/qtest/libqtest.c: In function ‘qtest_qom_has_concrete_type’:
../tests/qtest/libqtest.c:1044:9: error: variable ‘idx’ set but not used [-Werror=unused-but-set-variable=]
Remove idx.
Cc: Fabiano Rosas <farosas@suse.de>
Cc: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20260112123146.1010621-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Newer gcc compiler (version 16.0.0 20260103 (Red Hat 16.0.0-0) (GCC))
detects a truncation error:
../target/riscv/cpu.c: In function ‘riscv_isa_write_fdt’:
../target/riscv/cpu.c:2916:35: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 5 [-Werror=format-truncation=]
2916 | snprintf(isa_base, maxlen, "rv%di", xlen);
| ^~
../target/riscv/cpu.c:2916:32: note: directive argument in the range [-2147483648, 2147483632]
2916 | snprintf(isa_base, maxlen, "rv%di", xlen);
| ^~~~~~~
Since the xlen variable represents the register width (32, 64, 128) in
the RISC-V base ISA name, mask its value with a 8-bit bitmask to
satisfy the size constraints on the snprintf output.
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: Weiwei Li <liwei1518@gmail.com>
Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Cc: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
Link: https://lore.kernel.org/qemu-devel/20260112161626.1232639-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Newer gcc compiler (version 16.0.0 20260103 (Red Hat 16.0.0-0) (GCC))
detects an unused variable error:
../hw/ppc/vof.c: In function ‘vof_dt_memory_available’:
../hw/ppc/vof.c:642:12: error: variable ‘n’ set but not used [-Werror=unused-but-set-variable=]
Remove 'n'.
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20260112124722.1029212-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The "asm/unistd_32.h" file was generated for the 31-bit compatibility
mode on the s390 architecture and support was removed in v6.19-rc1,
commit 4ac286c4a8d9 ("s390/syscalls: Switch to generic system call
table generation")
unistd_32.h is no longer generated when running make header_install.
Remove it.
Reported-by: Shameer Kolothum <skolothumtho@nvidia.com>
Cc: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260112155341.1209988-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The 'struct mshv_vp_registers' definition in hvgdk_mini.h is unused in
QEMU and conflicts with the canonical definition in
linux-headers/linux/mshv.h.
Remove the duplicate definition to avoid build conflicts when the Linux
headers are updated.
Cc: Magnus Kulke <magnuskulke@linux.microsoft.com>
Reviewed-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/qemu-devel/20260108185012.2568277-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The VFIO pread/pwrite functions use little-endian data format. Currently, the
qemu_vfio_pci_read_config() and qemu_vfio_pci_write_config() don't correctly
convert from CPU native endian format to little-endian (and vice versa) when
using the pread/pwrite functions. Fix this by limiting read/write to 32 bits
and handling endian conversion in qemu_vfio_pci_read_config() and
qemu_vfio_pci_write_config().
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260105222029.2423-1-alifm@linux.ibm.com
[ clg: Fixed typo in subject ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
On a system influenced by ERRATA_772415, IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17
is repored by IOMMU_DEVICE_GET_HW_INFO. Due to this errata, even the readonly
range mapped on second stage page table could still be written.
Reference from 4th Gen Intel Xeon Processor Scalable Family Specification
Update, Errata Details, SPR17.
Link https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/eagle-stream/sapphire-rapids-specification-update/
Backup https://cdrdv2.intel.com/v1/dl/getContent/772415
Also copied the SPR17 details from above link:
"Problem: When remapping hardware is configured by system software in
scalable mode as Nested (PGTT=011b) and with PWSNP field Set in the
PASID-table-entry, it may Set Accessed bit and Dirty bit (and Extended
Access bit if enabled) in first-stage page-table entries even when
second-stage mappings indicate that corresponding first-stage page-table
is Read-Only.
Implication: Due to this erratum, pages mapped as Read-only in second-stage
page-tables may be modified by remapping hardware Access/Dirty bit updates.
Workaround: None identified. System software enabling nested translations
for a VM should ensure that there are no read-only pages in the
corresponding second-stage mappings."
Introduce a helper vfio_device_get_host_iommu_quirk_bypass_ro to check if
readonly mappings should be bypassed.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20260106062808.316574-5-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When doing dirty tracking or calculating dirty tracking range, readonly
regions can be bypassed, because corresponding DMA mappings are readonly
and never become dirty.
This can optimize dirty tracking a bit for passthrough device.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20260106062808.316574-4-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Implement get_host_iommu_quirks() callback to retrieve the vendor specific
hardware information data and convert it into bitmaps defined with enum
host_iommu_quirks. It will be used by VFIO in subsequent patch.
Suggested-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20260106062808.316574-3-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
In VFIO core, we call iommufd_backend_get_device_info() to return vendor
specific hardware information data, but it's not good to retrieve this raw
data in VFIO core.
Introduce a new PCIIOMMUOps optional callback, get_host_iommu_quirk() which
allows to retrieve the vendor specific hardware information data and convert
it into bitmaps defined with enum host_iommu_quirks.
pci_device_get_host_iommu_quirks() is a wrapper that can be called on a PCI
device potentially protected by a vIOMMU.
Suggested-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20260106062808.316574-2-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Commit e46883204c ("vfio/migration: Block migration with vIOMMU")
introduces a migration blocker when vIOMMU is enabled, because we need
to calculate the IOVA ranges for device dirty tracking. But this is
unnecessary for iommu dirty tracking.
Limit the vfio_viommu_preset() check to those devices which use device
dirty tracking. This allows live migration with VFIO devices which use
iommu dirty tracking.
Suggested-by: Jason Zeng <jason.zeng@intel.com>
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-10-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
With default config, kernel VFIO IOMMU type1 driver limits dirty bitmap to
256MB for unmap_bitmap ioctl so the maximum guest memory region is no more
than 8TB size for the ioctl to succeed.
Be conservative here to limit total guest memory to max value supported
by unmap_bitmap ioctl or else add a migration blocker. IOMMUFD backend
doesn't have such limit, one can use it if there is a need to migrate such
large VM.
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-9-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
If a VFIO device in guest switches from passthrough(PT) domain to block
domain, the whole memory address space is unmapped, but we passed a NULL
iotlb entry to unmap_bitmap, then bitmap query didn't happen and we lost
dirty pages.
By constructing an iotlb entry with iova = gpa for unmap_bitmap, it can
set dirty bits correctly.
For IOMMU address space, we still send NULL iotlb because VFIO don't know
the actual mappings in guest. It's vIOMMU's responsibility to send actual
unmapping notifications, e.g., vtd_address_space_unmap_in_dirty_tracking().
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-8-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
If a VFIO device in guest switches from IOMMU domain to block domain,
vtd_address_space_unmap() is called to unmap whole address space.
If that happens during migration, migration fails with legacy VFIO
backend as below:
Status: failed (vfio_container_dma_unmap(0x561bbbd92d90, 0x100000000000, 0x100000000000) = -7 (Argument list too long))
Because legacy VFIO limits maximum bitmap size to 256MB which maps to 8TB on
4K page system, when 16TB sized UNMAP notification is sent, unmap_bitmap
ioctl fails. Normally such large UNMAP notification come from IOVA range
rather than system memory.
Apart from that, vtd_address_space_unmap() sends UNMAP notification with
translated_addr = 0, because there is no valid translated_addr for unmapping
a whole iommu memory region. This breaks dirty tracking no matter which VFIO
backend is used.
Fix them all by iterating over DMAMap list to unmap each range with active
mapping when global_dirty_tracking is active. global_dirty_tracking is
protected by BQL, so it's safe to reference it directly. If it's not active,
unmapping the whole address space in one go is optimal.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-7-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Pass IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR when doing the last dirty
bitmap query right before unmap, no PTEs flushes. This accelerates the
query without issue because unmap will tear down the mapping anyway.
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-6-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This new parameter will be used in following patch, currently 0 is passed.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-5-zhenzhong.duan@intel.com
[ clg: Fixed subject typo ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This is to follow naming style in container-legacy.c to have low level functions
with vfio_legacy_ prefix.
No functional changes.
Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-4-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When an existing mapping is unmapped, there could already be dirty bits
which need to be recorded before unmap.
If query dirty bitmap fails, we still need to do unmapping or else there
is stale mapping and it's risky to guest.
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-3-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Currently we support device and iommu dirty tracking, device dirty tracking
is preferred.
Add the framework code in iommufd_cdev_unmap() to choose either device or
iommu dirty tracking, just like vfio_legacy_dma_unmap_one().
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-2-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add documentation about using IOMMUFD backed VFIO device with intel_iommu with
x-flts=on.
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260106061304.314546-20-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Now that all infrastructures of supporting passthrough device running
with first stage translation are there, enable it now.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260106061304.314546-19-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When x-flts=on, we set up bindings to nested HWPT in host, after
migration, VFIO device binds to nesting parent HWPT by default.
We need to re-establish the bindings to nested HWPT, or else device
DMA will break.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20260106061304.314546-18-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>