security.texi is included from qemu-doc.texi but is not used
in the qemu.1 manpage. So we can do a straightforward conversion
of the contents, which go into the system manual.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200228153619.9906-17-peter.maydell@linaro.org
Message-id: 20200226113034.6741-16-pbonzini@redhat.com

2 changed files with 174 additions and 0 deletions
@@ -0,0 +1,173 @@
Security
========

Overview
--------

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.

Security Requirements
---------------------

QEMU supports many different use cases, some of which have stricter security
requirements than others. The community has agreed on the overall security
requirements that users may depend on. These requirements define what is
considered supported from a security perspective.

Virtualization Use Case
'''''''''''''''''''''''

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization. These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

- Guest
- User-facing interfaces (e.g. VNC, SPICE, WebSocket)
- Network protocols (e.g. NBD, live migration)
- User-supplied files (e.g. disk images, kernels, device trees)
- Passthrough devices (e.g. PCI, USB)

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases and treated as security bugs if this is the case.

Non-virtualization Use Case
'''''''''''''''''''''''''''

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG). In principle the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case. However, for historical reasons much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time. Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.

Architecture
------------

This section describes the design principles that ensure the security
requirements are met.

Guest Isolation
'''''''''''''''

Guest isolation is the confinement of guest code to the virtual machine. When
guest code gains control of execution on the host this is called escaping the
virtual machine. Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network. Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU. Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU. At this point the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them. A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

Principle of Least Privilege
''''''''''''''''''''''''''''

The principle of least privilege states that each component only has access to
the privileges necessary for its function. In the case of QEMU this means that
each process only has access to resources belonging to the guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest. This way the guest does not gain anything by escaping into the
QEMU process since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements. For example, guest A only has access to its own disk image file
``a.img`` and not guest B's disk image file ``b.img``.

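The ``a.img`` / ``b.img`` example can be checked mechanically before a guest is
launched. The following is a minimal sketch, not part of QEMU, assuming each
guest runs under its own host user ID and that images are regular files; the
helper name ``image_is_private`` is hypothetical.

```python
import os
import stat


def image_is_private(path: str, expected_uid: int) -> bool:
    """Return True if `path` is a regular file owned by `expected_uid`
    and carries no permission bits for group or others.

    Assumption: one host UID per guest, so "owned by that UID and
    mode 0600 or stricter" means no other guest's QEMU can open it.
    """
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # permission bits only
    return (
        stat.S_ISREG(st.st_mode)
        and st.st_uid == expected_uid
        and (mode & 0o077) == 0  # nothing for group or others
    )
```

A launcher could call ``image_is_private("a.img", uid_of_guest_a)`` and refuse
to start the guest when the check fails. Mandatory access control labels, as
used by SELinux or AppArmor, are outside the scope of this sketch.
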
In reality certain resources are inaccessible to the guest but must be
available to QEMU to perform its function. For example, host system calls are
necessary for QEMU but are not exposed to guests. A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

Isolation mechanisms
''''''''''''''''''''

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege. With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt. They are also platform-specific so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users. Sometimes it seems more convenient to launch QEMU as
root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
huge security risk. File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes.
Some Linux distros already ship with UNIX groups for these devices by default.

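File descriptor passing relies on the ``SCM_RIGHTS`` ancillary message of UNIX
domain sockets. The sketch below illustrates only the generic mechanism,
assuming Python 3.9 or later on a UNIX-like host. The helper names are
hypothetical, and how a management tool hands the descriptor to QEMU itself is
not shown.

```python
import socket


def send_device_fd(channel: socket.socket, fd: int) -> None:
    """Privileged side: pass an already-open descriptor to the peer.

    `channel` must be a connected AF_UNIX socket. At least one byte of
    ordinary data has to accompany the ancillary SCM_RIGHTS message.
    """
    socket.send_fds(channel, [b"F"], [fd])


def receive_device_fd(channel: socket.socket) -> int:
    """Unprivileged side: receive exactly one descriptor from the peer."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    if len(fds) != 1:
        raise RuntimeError("expected exactly one file descriptor")
    return fds[0]
```

The receiving process never needs permission to open the device node: the
privileged helper opens it, and the kernel duplicates the open descriptor into
the unprivileged process.
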
- SELinux and AppArmor make it possible to confine processes beyond the
  traditional UNIX process and file permissions model. They restrict the QEMU
  process from accessing processes and files on the host system that are not
  needed by QEMU.

- Resource limits and cgroup controllers provide throughput and utilization
  limits on key resources such as CPU time, memory, and I/O bandwidth.

- Linux namespaces can be used to make process, file system, and other system
  resources unavailable to QEMU. A namespaced QEMU process is restricted to only
  those resources that were granted to it.

- Linux seccomp is available via the QEMU ``--sandbox`` option. It disables
  system calls that are not needed by QEMU, thereby reducing the host kernel
  attack surface.

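As an illustration of the resource-limit item above, a launcher can lower
per-process limits in the child before it executes the target program. This is
a minimal sketch using POSIX rlimits only; cgroup controllers, namespaces, and
MAC policies are normally configured by the management tool and are not shown.
The function name and the default values are assumptions chosen for the
example.

```python
import resource
import subprocess


def _lower(res: int, value: int) -> None:
    """Set soft and hard limit of `res` to `value`, never raising the hard limit."""
    _soft, hard = resource.getrlimit(res)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(res, (value, value))


def run_with_limits(argv, max_open_files=64, max_core_bytes=0):
    """Run `argv` as a child process with tightened rlimits.

    The limits are applied in the child between fork() and exec(), so the
    launcher itself keeps its own limits.
    """

    def apply_limits():
        _lower(resource.RLIMIT_CORE, max_core_bytes)    # no core dumps by default
        _lower(resource.RLIMIT_NOFILE, max_open_files)  # cap open descriptors

    # Note: preexec_fn is not safe in multi-threaded launchers.
    return subprocess.run(
        argv, preexec_fn=apply_limits, capture_output=True, text=True, check=False
    )
```

Because both the soft and the hard limit are lowered, the child cannot raise
them again unless it holds the corresponding privilege.
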
Sensitive configurations
------------------------

There are aspects of QEMU that can have security implications which users and
management applications must be aware of.

Monitor console (QMP and HMP)
'''''''''''''''''''''''''''''

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation. Many of the
commands exposed will instruct QEMU to access content on the host file system
and/or trigger spawning of external processes.

For example, the ``migrate`` command allows for the spawning of arbitrary
processes for the purpose of tunnelling the migration data stream. The
``blockdev-add`` command instructs QEMU to open arbitrary files, exposing
their content to the guest as a virtual disk.

Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

It is further important to consider the security of the character device backend
over which the monitor console is exposed. It needs to have protection against
malicious third parties which might try to make unauthorized connections, or
perform man-in-the-middle attacks. Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only. Use of the TCP based
character device backend is inappropriate unless configured to use both TLS
encryption and authorization control policy on client connections.

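The recommended setup can be exercised with a small client. The sketch below
assumes QEMU was started with a QMP monitor listening on a UNIX domain socket
(for example ``-qmp unix:/path/to/qmp.sock,server=on,wait=off``; the path is
hypothetical and the exact option spelling varies between QEMU versions). It
performs the QMP greeting and ``qmp_capabilities`` handshake and then issues
commands; the function names are illustrative and are not a QEMU API.

```python
import json
import socket


def qmp_execute(stream, command, arguments=None):
    """Send one QMP command and return the content of its "return" member."""
    request = {"execute": command}
    if arguments:
        request["arguments"] = arguments
    stream.write(json.dumps(request) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" in reply:
            continue  # asynchronous event, not the answer to our command
        if "error" in reply:
            raise RuntimeError(reply["error"].get("desc", "QMP error"))
        return reply["return"]


def qmp_session(sock):
    """Run the QMP handshake on a connected socket and return a text stream."""
    stream = sock.makefile("rw", encoding="utf-8", newline="\n")
    greeting = json.loads(stream.readline())
    if "QMP" not in greeting:
        raise ConnectionError("peer did not send a QMP greeting")
    qmp_execute(stream, "qmp_capabilities")  # leave capabilities negotiation mode
    return stream


def connect_qmp(path):
    """Connect to a QMP monitor on the local UNIX domain socket at `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return qmp_session(sock)
```

A caller might then run ``qmp_execute(connect_qmp(path), "query-status")``.
Access control rests entirely on the file system permissions of the socket and
its parent directory, so these should admit only the trusted management user.
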
In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.