Through the new 'confidential-guest-reset' property, control plane should be
able to detect if the hypervisor supports x86 confidential guest resets. Older
hypervisors that do not support resets will not have this property populated.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-35-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A new functional test is added that exercises the code changes related to
closing of the old KVM VM file descriptor and opening a new one upon VM reset.
This normally happens when confidential guests are reset but for
non-confidential guests, we use a special machine specific debug/test parameter
'x-change-vmfd-on-reset' to enable this behavior.
Only specific code changes related to re-initialisation of SEV-ES, SEV-SNP and
TDX platforms are not exercised in this test as they require hardware that
supports running confidential guests.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-34-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A new machine specific option 'x-change-vmfd-on-reset' is introduced for
debugging and testing only (hence the 'x-' prefix). This option when enabled
will force KVM VM file descriptor to be changed upon guest reset like
in the case of confidential guests. This can be used to exercise the code
changes that are specific for confidential guests on non-confidential
guests as well (except changes that require hardware support for
confidential guests).
A new functional test has been added in the next patch that uses this new
parameter to test the VM file descriptor changes.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-33-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Confidential guests change the KVM VM file descriptor upon reset and also create
new VCPU file descriptors against the new KVM VM file descriptor. We need to
save the clock state from kvm before KVM VM file descriptor changes and restore
it after. Also after VCPU file descriptors changed, we must call
KVM_KVMCLOCK_CTRL on the VCPU file descriptor to inform KVM that the VCPU is
in paused state.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-32-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When new vcpu file descriptors are created and bound to the new kvm file
descriptor as a part of the confidential guest reset mechanism, various
subsystems needs to know about it. This change adds notifiers so that various
subsystems can take appropriate actions when vcpu fds change by registering
their handlers to this notifier.
Subsequent changes will register specific handlers to this notifier.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-31-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For confidential guests during the reset process, the old KVM VM file
descriptor is closed and a new one is created. When a new file descriptor is
created, a new openpic device needs to be created against this new KVM VM file
descriptor as well. Additionally, existing memory region needs to be reattached
to this new openpic device and proper CPU attributes set associating new file
descriptor. This change makes this happen with the help of a callback handler
that gets called when the KVM VM file descriptor changes as a part of the
confidential guest reset process.
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-30-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On confidential guests KVM virtual machine file descriptor changes as a
part of the guest reset process. Xen capabilities needs to be re-initialized in
KVM against the new file descriptor.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-29-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On confidential guests when the KVM virtual machine file descriptor changes as
a part of the reset process, event file descriptors needs to be reassociated
with the new KVM VM file descriptor. This is achieved with the help of a
callback handler that gets called when KVM VM file descriptor changes during
the confidential guest reset process.
This patch is tested on non-confidential platform only.
Acked-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-28-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We need to make sure that synic CPU feature is not already enabled. If it is,
trying to enable it again will result in the following assertion:
Unexpected error in object_property_try_add() at ../qom/object.c:1268:
qemu-system-x86_64: attempt to add duplicate property 'synic' to object (type 'host-x86_64-cpu')
So enable synic only if its not enabled already.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-27-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A confidential guest reset involves closing the old virtual machine KVM file
descriptor and opening a new one. Since its a new KVM fd, PIT needs to be
re-initialized again. This is done with the help of a notifier which is invoked
upon KVM vm file descriptor change during the confidential guest reset process.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-26-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The initialization code will be used again by VM file descriptor change
notifier callback in a subsequent change. So refactor common code into a new
helper function.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-25-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Normally the vfio pseudo device file descriptor lives for the life of the VM.
However, when the kvm VM file descriptor changes, a new file descriptor
for the pseudo device needs to be generated against the new kvm VM descriptor.
Other existing vfio descriptors needs to be reattached to the new pseudo device
descriptor. This change performs the above steps.
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260227072445.406907-1-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the KVM VM file descriptor changes as a part of the confidential guest
reset mechanism, it necessary to create a new confidential guest context and
re-encrypt the VM memory. This happens for SEV-ES and SEV-SNP virtual machines
as a part of SEV_LAUNCH_FINISH, SEV_SNP_LAUNCH_FINISH operations.
A new resettable interface for SEV module has been added. A new reset callback
for the reset 'exit' state has been implemented to perform the above operations
when the VM file descriptor has changed during VM reset.
Tracepoints has been added also for tracing purpose.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-23-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If there is existing launch update data and kernel hashes data, they need to be
freed when initialization code is executed. This is important for resettable
confidential guests where the initialization happens once every reset.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-22-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The various notifiers that are used needs to be installed only once not on
every initialization. This includes the vm state change notifier and others.
This change uses 'cgs->ready' flag to install the notifiers only one time,
the first time.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-21-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
sev_launch_finish() and sev_snp_launch_finish() could be called multiple times
when the confidential guest is being reset/rebooted. The migration
blockers should not be added multiple times, once per invocation. This change
makes sure that the migration blockers are added only one time by adding the
migration blockers to the vm state change handler when the vm transitions to
the running state. Subsequent reboots do not change the state of the vm.
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-20-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During reset, when the VM file descriptor is changed, the TDX state needs to be
re-initialized. A notifier callback is implemented to reset the old
state and free memory before the new state is initialized post VM file
descriptor change.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-19-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the confidential virtual machine KVM file descriptor changes due to the
guest reset, some TDX specific setup steps needs to be done again. This
includes finalizing the initial guest launch state again. This change
re-executes some parts of the TDX setup during the device reset phaze using a
resettable interface. This finalizes the guest launch state again and locks
it in. Machine done notifier which was previously used is no longer needed as
the same code is now executed as a part of VM reset.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-18-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A new helper function is introduced that refactors all firmware memory
initialization code into a separate function. No functional change.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-17-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Confidential guests needs to generate a new KVM file descriptor upon virtual
machine reset. Existing VCPUs needs to be reattached to this new
KVM VM file descriptor. As a part of this, new VCPU file descriptors against
this new KVM VM file descriptor needs to be created and re-initialized.
Resources allocated against the old VCPU fds needs to be released. This change
makes this happen.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-16-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When IGVM is not being used by the confidential guest, the guest firmware has
to be reloaded explicitly again into memory. This is because, the memory into
which the firmware was loaded before reset was encrypted and is thus lost
upon reset. When IGVM is used, it is expected that the IGVM will contain the
guest firmware and the execution of the IGVM directives will set up the guest
firmware memory.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-15-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Confidential guest smust reload their bios rom upon reset. This is because
bios memory is encrypted and upon reset, the contents of the old bios memory
is lost and cannot be re-used. To this end, export a new x86 function
x86_bios_rom_reload() to reload the bios again. This function will be used in
the subsequent patches.
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-14-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For confidential guests, bios image must be reinitialized upon reset. This
is because bios memory is encrypted and hence once the old confidential
kvm context is destroyed, it cannot be decrypted. It needs to be reinitilized.
Towards that, this change refactors x86_bios_rom_init() code so that
parts of it can be called during confidential guest reset.
No functional chnage.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-13-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the kvm file descriptor changes as a part of confidential guest reset,
some architecture specific setups including SEV/SEV-SNP/TDX specific setups
needs to be redone. These changes are implemented as a part of the
kvm_arch_on_vmfd_change() callback which was introduced previously.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-11-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We will re-register smram listeners after the VM file descriptors has changed.
We need to unregister them first to make sure addresses and reference counters
work properly.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-10-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Various subsystems might need to take some steps before the KVM file descriptor
for a virtual machine is changed. So a new boolean attribute is added to the
vmfd_notifier structure which is passed to the notifier callbacks.
vmfd_notifer.pre is true for pre-notification of vmfd change and false for
post notification. Notifier callback implementations can simply check
the boolean value for (vmfd_notifer*)->pre and can take actions for pre or
post vmfd change based on the value.
Subsequent patches will add callback implementations for specific components
that need this pre-notification.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-9-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A notifier callback can be used by various subsystems to perform actions when
KVM file descriptor for a virtual machine changes as a part of confidential
guest reset process. This change adds this notifier mechanism. Subsequent
patches will add specific implementations for various notifier callbacks
corresponding to various subsystems that need to take action when KVM VM file
descriptor changed.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-8-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the KVM VM file descriptor has changed and a new one created, the guest
state is no longer in protected state. Mark it as such.
The guest state becomes protected again when TDX and SEV-ES and SEV-SNP mark
it as such.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-7-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This change adds common kvm specific support to handle KVM VM file descriptor
change. KVM VM file descriptor can change as a part of confidential guest reset
mechanism. A new function api kvm_arch_on_vmfd_change() per
architecture platform is added in order to implement architecture specific
changes required to support it. A subsequent patch will add x86 specific
implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
confidential guest reset.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-6-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After the guest KVM file descriptor has changed as a part of the process of
confidential guest reset mechanism, existing memory needs to be reattached to
the new file descriptor. This change adds a helper function ram_block_rebind()
for this purpose. The next patch will make use of this function.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-5-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a confidential virtual machine is reset, a new guest context in the
accelerator must be generated post reset. Therefore, the old accelerator guest
file handle must be closed and a new one created. To this end, a per-accelerator
callback, "rebuild_guest" is introduced that would get called when a confidential
guest is reset. Subsequent patches will introduce specific implementation of
this callback for KVM accelerator.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-4-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As a part of the confidential guest reset process, the existing encrypted guest
state must be made mutable since it would be discarded after reset. A new
encrypted and locked guest state must be established after the reset. To this
end, a new boolean member per confidential guest support class
(eg, tdx or sev-snp) is added that will indicate whether its possible to
rebuild guest state:
bool can_rebuild_guest_state;
This is true if rebuilding guest state is possible, false otherwise.
A KVM based confidential guest reset is only possible when
the existing state is locked but its possible to rebuild guest state.
Otherwise, the guest is not resettable.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-3-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_filter_msr() does not check if an msr entry is already present in the
msr_handlers table and installs a new handler unconditionally. If the function
is called again with the same MSR, it will result in duplicate entries in the
table and multiple such calls will fill up the table needlessly. Fix that.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20260225035000.385950-2-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all pieces are in place to spawn Nitro Enclaves using
a special purpose accelerator and machine model, document how
to use it.
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-12-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nitro Enclaves can only boot EIF files which are a combination of
kernel, initramfs and cmdline in a single file. When the kernel image is
not an EIF, treat it like a kernel image and assemble an EIF image on
the fly. This way, users can call QEMU with a direct
kernel/initrd/cmdline combination and everything "just works".
Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Link: https://lore.kernel.org/r/20260225220807.33092-11-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In follow-up patches we need some EIF file definitions that are
currently in the eif.c file, but want to access them from a separate
device. Move them into the header instead.
Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Link: https://lore.kernel.org/r/20260225220807.33092-10-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a machine model to spawn a Nitro Enclave. Unlike the existing -M
nitro-enclave, this machine model works exclusively with the -accel
nitro accelerator to drive real Nitro Enclave creation. It supports
memory allocation, number of CPU selection, both x86_64 as well as
aarch64, implements the Enclave heartbeat logic and debug serial
console.
To use it, create an EIF file and run
$ qemu-system-x86_64 -accel nitro,debug-mode=on -M nitro -nographic \
-kernel test.eif
or
$ qemu-system-aarch64 -accel nitro,debug-mode=on -M nitro -nographic \
-kernel test.eif
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-9-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The VBSA test is a subset of the wider Arm architecture compliance
suites (ACS) which validate machines meet particular minimum set of
requirements. The VBSA is for virtual machines so it makes sense we
should check the -M virt machine is compliant.
Fortunately there are prebuilt binaries published via github so all we
need to do is build an EFI partition and place things in the right
place.
There are some additional Linux based tests which are left for later.
Reviewed-by: Mohamed Mediouni <mohamed@unpredictable.fr>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20260226185303.1920021-8-alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
For now only use the minimal decadency set until all the OpenBSD
mappings can be divined.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20260226185303.1920021-7-alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
For reasons still not clear to me passing the single dashed
-interactive would confuse the argument parsing enough we tried to
pass "nterative" as a string to the launch command causing failure and
head scratching.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20260226185303.1920021-6-alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
The bugs have evidently been fixed in the latest release so we can
migrate the laggards into how all-test-cross container and remove the
legacy hacks. They are also packaged for the main architectures so we
don't need to jump through the amd64 hoops.
Suggested-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20260226185303.1920021-3-alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Debian 11 was EOL in 2024, and Debian 12 will be EOL this June. This
patch moves all but one of our tests, debian-legacy-test-cross, onto
Debian 13.
This patch does the bare minimum to upgrade these tests and doesn't make
any attempt at optimization or cleanup that may or may not be possible
with this upgrade.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[AJB: tweak summary line]
Message-ID: <20260226185303.1920021-2-alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
The nitro accel does not actually make use of CPU emulation or details:
It always uses the host CPU regardless of configuration. Machines for
the nitro accel select the host CPU type as default to have a clear
statement of the above and to have a unified cpu type across all
supported architectures.
The arm64 logic on Linux currently only allows -cpu host for KVM based
virtual machines. Add a special case for nitro so that when the nitro
accel is active, it allows use of the host cpu type.
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-8-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nitro Enclaves expect the parent instance to host a vsock heartbeat listener
at port 9000. To host a Nitro Enclave with the nitro accel in QEMU, add
such a heartbeat listener as device model, so that the machine can
easily instantiate it.
Signed-off-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20260225220807.33092-7-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>