Re: [libvirt] RFC: extending sVirt to confine host apps which talk to libvirtd
- From: Daniel J Walsh <dwalsh redhat com>
- To: "Daniel P. Berrange" <berrange redhat com>
- Cc: libvir-list redhat com, James Morris <jmorris namei org>, Eric Paris <eparis redhat com>
- Subject: Re: [libvirt] RFC: extending sVirt to confine host apps which talk to libvirtd
- Date: Mon, 06 Jun 2011 14:51:15 -0400
On 06/06/2011 10:41 AM, Daniel P. Berrange wrote:
> What follows is a document outlining some thoughts I've been having
> on extending sVirt to allow confinement of applications which talk
> to libvirtd on the host, primarily focusing on use of SELinux, but
> also allowing a simple non-SELinux RBAC mechanism.
> Securing KVM virtualization hosts with MAC
> This document looks at the task of securing KVM virtualization
> hosts using mandatory access control technologies, with focus
> on SELinux. At the time of writing there have been two phases
> of development, and this document makes proposals for a third
> Phase 1: circa 2006
> Goal: Protect the host from a compromised virtual machine.
> The first phase of development had the modest goal of
> protecting the host from attack by a compromised virtual
> machine. To achieve this, the KVM processes are configured
> such that they will run under a confined security context
> ('virt_t' in the SELinux reference policy), which blocks
> access to any host resources not labelled ('virt_image_t')
> for use by virtual machines.
> The primary limitation of this initial implementation
> is that while the virtual host is secured, there is no
> protection between virtual machines. This can be considered
> a regression in isolation as compared to that offered by
> non-virtualized hosts. The second limitation is that the
> virtualization admin has to take care to ensure the host
> resources intended for use by the virtual machines are
> correctly labelled. This is a manual setup task unless
> the images are kept in a preset location (/var/lib/libvirt/images
> in the SELinux reference policy).
> Phase 2: March 2009
> Goal: Protect virtual machines from each other
> The second phase of development has the goal of providing
> isolation between virtual machines that is comparable to
> that achieved between physical machines. This piece of
> work is commonly referred to as "svirt". To achieve this,
> the KVM processes are each configured to run under a
> dedicated security context, which blocks access to any
> resources not explicitly assigned to that virtual machine.
> In the SELinux implementation, the base context "svirt_t"
> has a unique MCS category ("c240,c955") appended to form
> a unique security context "system_u:system_r:svirt_t:s0:c240,c955".
> For each host resource to be assigned to the virtual machine,
> the base context "svirt_image_t" is combined with the same
> MCS category to form a unique resource security context
> The assignment of virtual machine security contexts and
> labelling of resources can be done statically by the
> administrator / management application, or dynamically
> by the libvirtd daemon. The latter removes much of the
> administrator burden.
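
As a rough illustration of the dynamic labelling described above, here is a minimal Python sketch of picking an unused MCS category pair and deriving the process and image contexts from it. The helper names are hypothetical, and the "system_u:object_r" prefix on the image context is an assumption, since the text only names the "svirt_image_t" type.

```python
import random

# Categories c0..c1023, matching the "c0.c1023" ranges quoted in this thread.
MCS_CATEGORY_COUNT = 1024


def pick_mcs_pair(in_use, rng=random):
    """Return an unused pair of distinct categories, lowest first."""
    while True:
        low, high = sorted(rng.sample(range(MCS_CATEGORY_COUNT), 2))
        if (low, high) not in in_use:
            in_use.add((low, high))
            return low, high


def make_contexts(pair):
    """Build the process and image contexts that share one MCS pair."""
    mcs = "s0:c%d,c%d" % pair
    process = "system_u:system_r:svirt_t:" + mcs
    # The user and role on the image context are an assumption of this sketch.
    image = "system_u:object_r:svirt_image_t:" + mcs
    return process, image
```

The set passed as in_use stands in for whatever record of assigned labels the daemon keeps.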
> The second phase has addressed the major guest security
> limitation of the first phase, and eased the burden placed
> on host administrators. Attention can now focus on the security
> of the host management software stack. Client applications
> communicate with the libvirtd daemon using a simple sockets
> based RPC protocol. Thus operations initiated by client
> applications which run under one security context are in
> fact invoked under the libvirtd daemon's security context.
> Since the libvirtd daemon is a highly privileged, almost
> unconfined process, this provides a means for applications
> to elevate their privileges.
> A second problem with the current model is seen when looking
> at guest migration between hosts. During migration, there
> are two QEMU processes running for the same virtual machine,
> one process on each host. The dynamic assignment of MCS
> values to form unique security contexts is done on a per host
> basis, so there is no guarantee that the VM on host A will be
> using (or be able to use) the same security context on the
> target host of migration. This is not necessarily a problem
> if the guest is using block devices, since block device inode
> labels are only visible to a single host. With a shared
> filesystem that supports SELinux labelling, like GFS2, both
> QEMU processes must run in the same security context to allow
> them both to access the associated files.
> Phase 3: June 2011
> Goal: Protect virtual machines from host applications
> The third phase of development has the primary goal of
> honouring the confinement of client applications talking
> to libvirtd, when performing operations on virtual machines
> and other managed objects (storage pools, host devices,
> virtual networks, secrets, etc). Every application connecting
> to libvirt has an associated security context. Every object
> managed by libvirtd will have an associated security context.
> When an operation is invoked via a libvirt API the client
> application security context will be checked against the
> target object context, before proceeding. Thus applications
> will not be able to make use of a libvirtd connection to
> perform operations that are otherwise blocked.
> The secondary goal is to add further flexibility and safety
> to the way MCS categories are assigned, and files are relabelled.
> Instead of maintaining a local database of assigned labels, there
> must be some shared storage where label usage can be recorded.
> At its simplest this can be an NFS share, with one file per MCS
> category and locking with fcntl(). An alternative would be to
> acquire leases using a lock manager such as sanlock. In addition,
> the guest configuration will be enhanced such that a guest can
> be assigned a statically chosen security context, but still make
> use of dynamic relabelling of resources. Finally the existing
> boolean mode of 'static' vs 'dynamic' label generation will be
> turned into a tri-state, introducing a 'hybrid' mode where the
> client supplies a custom base context, and the MCS part is still
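
The "one file per MCS category, locked with fcntl()" idea could look roughly like the following Python sketch. The directory layout and function names are hypothetical. Note that POSIX locks are held per process, so two claims made from the same process do not conflict; a daemon would still need to track its own claims, and the share must have working lock support.

```python
import fcntl
import os


def try_claim_category(share_dir, category):
    """Try to lock the file for one MCS category; return an open fd or None."""
    path = os.path.join(share_dir, "c%d" % category)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking exclusive POSIX record lock on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None  # another process, possibly on another host, holds it
    return fd


def release_category(fd):
    """Drop the claim; closing the descriptor also releases the lock."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

The claim lasts only as long as the descriptor stays open, so a crashed holder releases its categories automatically.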
> Usage scenarios
> To aid in development a couple of relevant core use cases
> or usage scenarios have been identified:
> 1. A virtual machine monitoring application
> For this example, consider the simple monitoring application
> 'virt-top'. This application displays a list of all virtual
> machines on the host and their associated resource utilization
> (CPU, disk, network). This application has no need to be able
> to stop/start/define virtual machines, nor do any operation
> related to host devices, storage, or networking. Traditionally
> this application is written to use a read only libvirt connection.
> With enhanced access control from libvirtd, the policy would define
> a new security context 'virt_top_t' for the 'virt-top' application.
> This policy would allow 'list', 'read', 'readstats' on the 'domain'
> object type.
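
To make the virt-top example concrete, here is a small Python sketch of a default-deny permission table. It is not SELinux policy syntax; the table layout and function name are hypothetical, and only the type, class, and operation names come from the text above.

```python
# Hypothetical policy table: client type -> object class -> allowed operations.
POLICY = {
    "virt_top_t": {"domain": {"list", "read", "readstats"}},
}


def is_allowed(client_type, object_class, operation):
    """Return True only if the policy explicitly grants the operation."""
    return operation in POLICY.get(client_type, {}).get(object_class, set())
```

With this table, is_allowed("virt_top_t", "domain", "start") is False, because anything not listed is denied.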
> 2. A multi-guest, multi-user MLS enabled host
> For this example, consider a virtualization host with MLS policy
> that is running multiple virtual machines, for a variety of
> different users. A user with the security level "restricted"
> must not be allowed to control virtual machines with a security
> level of "confidential". Conversely a user with security level
> "secret" must not be allowed to create virtual machines with a
> security level of "unclassified".
> With enhanced access control from libvirtd, getpeercon() would
> provide the security context of the client application (user).
> The client context would be used to perform an AVC when any API
> operation is invoked, thus ensuring that the client's MLS
> label is honoured in access control checks. The effect would be
> that when a 'restricted' user asked for a list of virtual machines
> only virtual machines at level 'restricted' or below would be
> returned. Or when a "secret" user asked to start a guest with
> a security level of 'unclassified', the operation would be denied.
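
The MLS behaviour described in this scenario can be sketched in Python as below. The full ordering of the four levels is an assumption (the text only implies that "confidential" is above "restricted" and "secret" is above "unclassified"), and requiring equal levels for control operations is a simplification that satisfies both examples rather than a statement of what real MLS policy would do.

```python
# Assumed ordering, lowest first.
LEVELS = ["unclassified", "restricted", "confidential", "secret"]


def rank(level):
    """Numeric position of a level name in the assumed ordering."""
    return LEVELS.index(level)


def visible_guests(client_level, guests):
    """Guests a client may list: those at or below the client's level."""
    return [name for name, level in guests if rank(level) <= rank(client_level)]


def may_control(client_level, guest_level):
    """Allow start/stop/create only at the client's own level."""
    return rank(client_level) == rank(guest_level)
```

Here guests is a list of (name, level) pairs; a real implementation would compare full security contexts through the AVC instead of level names.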
> 3. Identity transitions from trusted agents
> For this example, consider a trusted agent such as libvirt-qpid,
> or libvirt-snmp, which translates the libvirt API from its native
> model, into an alternate access model. In such an example, the
> agent talking to libvirtd will have authenticated itself. The
> peer identity that libvirtd sees, however, is that of the agent,
> not the ultimate (end-user) client. In such a case it will be desirable
> to allow a trusted agent to transition to a different identity when
> performing operations.
> An end user running under context "unconfined_u:unconfined_r:virt_top_t:s0-s0:c0.c1023"
> may talk to the libvirt-qpid agent which runs under the context
> "system_u:system_r:virt_qpid_t:s0-s0:c0.c1023". The libvirt-qpid
> connects to libvirtd which sees 'virt_qpid_t' as the client type.
> The policy is written to allow transitions from 'virt_qpid_t' to
> the 'virt_top_t' type, so when the virt-top client connects to
> libvirt-qpid, it changes its identity to 'virt_top_t'. From that
> point onwards, all AVC checks honour the privileges of the ultimate
> end user application, rather than the libvirt-qpid intermediary.
> The same mechanism also ensures that the client application MLS
> level is transferred via the libvirt-qpid agent to libvirtd.
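
The identity transition could be modelled along these lines. This is a Python sketch with hypothetical class and method names; only the two type names come from the example above.

```python
# Hypothetical transition rules: connection peer type -> types it may assume.
ALLOWED_TRANSITIONS = {"virt_qpid_t": {"virt_top_t"}}


class Connection:
    """One client connection as seen by the daemon."""

    def __init__(self, peer_type):
        self.peer_type = peer_type       # what getpeercon() would report
        self.effective_type = peer_type  # what access checks would use

    def assume_identity(self, new_type):
        """Switch the effective type if the peer is trusted to do so."""
        if new_type not in ALLOWED_TRANSITIONS.get(self.peer_type, set()):
            raise PermissionError(
                "%s may not act as %s" % (self.peer_type, new_type))
        self.effective_type = new_type
```

The peer type is kept separately so that the transition rule is always evaluated against the authenticated agent, not against an identity it has already assumed.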
> Anticipated Development tasks
> 1. Extend the domain XML to add a third attribute to the <seclabel>
> element relabel="yes|no", to control whether libvirtd will
> automatically label resources assigned to a guest. If the
> existing 'mode' attribute is "dynamic", then relabelling will
> default to enabled, while if it is 'static', then relabelling
> will default to disabled. Also change 'mode' to allow a new
> 'hybrid' value.
> 2. Determine how to maintain/identify security labels for other
> managed objects, including virStoragePoolPtr, virStorageVolPtr,
> virSecretPtr, virNetworkPtr, virInterfacePtr, virNodeDevicePtr,
> and host level APIs without any explicit managed object.
> 3. Extend XML for non-domain objects to implant security labels
> as identified in step 2.
> 4. Create an internal virIdentity struct to store the identity
> of the client. This will include at least the x509 distinguished
> name, the SASL username, the SELinux context (getpeercon())
> and UNIX username/group (SCM_CREDENTIALS).
> 5. Create a new public API to allow a client application to
> supply a new identity, allowing them to pass a new x509
> distinguished name, SASL username, SELinux context and
> UNIX username/group.
> 6. Extend the libvirtd daemon such that the current identity
> is stored in a thread local whenever invoking a public
> API operation.
> 7. Extend the QEMU driver such that a suitable identity is
> set when performing autonomous background operations
> such as domain auto-start and core dump, in a non-API
> 8. Create a set of internal access control helper APIs in
> $libvirt/src/accesscontrol/. There will be one API for each
> managed object, taking an object pointer, and an operation
> identifier (from an enum).
> 9. Create a simple impl of the access control APIs which defines
> roles for groups of user identities, and grants privileges to
> each role based on the operation names. This allows for simple
> testing of internal infrastructure, and an RBAC mechanism for
> users who lack SELinux in their OS.
> 10. Implant access control checks into the main codepaths of every
> driver method implementation in the QEMU driver.
> 11. Change the SELinux reference policy to define the new security
> types and access vectors for the libvirt objects & associated
> API calls.
> 12. Create a SELinux impl of the access control APIs which invokes
> avc_has_perm() using the client's SELinux context. This is
> intended to be the primary RBAC mechanism for Fedora/RHEL
> virtualization hosts.
> 13. Write policy to confine targeted applications like virt-top,
> 14. Extend libvirt-snmp, libvirt-cim, libvirt-qpid to pass through
> the client identity to libvirtd.
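
Tasks 4, 6, 8 and 9 fit together roughly as in the Python sketch below: an identity record, a thread-local holding the current identity, and a helper that checks an operation against simple role data. All names and the role data are hypothetical; the real work would be C code inside libvirtd.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """Rough analogue of the proposed virIdentity struct (task 4)."""
    x509_dname: Optional[str] = None
    sasl_username: Optional[str] = None
    selinux_context: Optional[str] = None
    unix_username: Optional[str] = None
    unix_group: Optional[str] = None


# Task 6: the current identity lives in thread-local storage.
_current = threading.local()

# Task 9: made-up role data, keyed on the UNIX username for simplicity.
ROLE_MEMBERS = {"monitor": {"alice"}, "operator": {"bob"}}
ROLE_PRIVILEGES = {
    "monitor": {("domain", "list"), ("domain", "read")},
    "operator": {("domain", "list"), ("domain", "read"), ("domain", "start")},
}


def set_current_identity(identity):
    """Record the identity for the API call running on this thread."""
    _current.identity = identity


def check_access(object_class, operation):
    """Task 8 style helper: may the current identity do this operation?"""
    identity = getattr(_current, "identity", None)
    if identity is None:
        return False  # no identity set on this thread: deny
    for role, members in ROLE_MEMBERS.items():
        if identity.unix_username in members:
            if (object_class, operation) in ROLE_PRIVILEGES.get(role, set()):
                return True
    return False
```

An SELinux backend (task 12) would keep the same check_access shape and replace the role lookup with an AVC query on identity.selinux_context.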
> Technical Notes / Issues
> 1. Adding new SELinux security classes / access vectors
> The selinux security classes are defined in /usr/include/selinux/flask.h
> and access vectors in /usr/include/selinux/av_permissions.h. Both of these
> files are automatically generated by a script in the selinux reference policy code
> '$serefpolicy/policy/flask/flask.py'. The master data files are in the
> same directory, 'access_vectors' and 'security_classes'. Once generated,
> the headers need to be manually copied into the libselinux package
You do not need to do this anymore. libselinux does not care about the
access vectors, they are named in your application.
> APIs are added to libvirt on a very frequent basis. What is the process
> for applying access control to them if the SELinux policy does not yet
> have a suitable access vector / security class defined ? Do we need a
> generic 'admin' access vector we can use as catch all, until more
> specific vectors can be defined for the new APIs. Desirable to avoid
> having to lock-step upgrade libvirt with selinux policy for all additions
> to the libvirt public API.
Well one benefit would be unconfined_t, although I am not sure it would
> 2. Security contexts for libvirt managed objects
> virDomainPtr: Already embedded in XML, unless using dynamic labelling
> in which case context is assigned at startup.
> virNetworkPtr: No existing security context, nor any object on disk
> that could be used. Follow example of domains and embed
> <seclabel> in the XML. Assign unique MCS category per
> network and ensure that daemons launched per network
> (dnsmasq, radvd) inherit the MCS category.
> virSecretPtr: No existing security context. Secrets may be associated
> with disk paths for VMs. Could copy the security context
> of the guests and apply it to the secret, or have a
> dedicated type svirt_secret_t and just copy the MCS
> category. Hard to make it work for guests with dynamic
> MCS assignment.
> virStoragePoolPtr: No existing security context. Some pool types have
> objects existing on the host filesystem eg SCSI
> HBAs have a directory in sysfs, filesystem dirs
> have a directory somewhere, LVM has directory
> for the volume group in /dev. Other pool types have
> no object on disk anywhere convenient. eg Sheepdog.
> Other pool types only have an object on disk when
> the pool is active (eg iSCSI, NFS). So there is
> nothing to use for API checks when the pool is
> Likely have to ignore whatever associated resource
> is on disk and just store a security context in the
> XML config as with virDomainPtr/virNetworkPtr.
> virStorageVolPtr: Currently reports the SELinux security label associated
> with the file on disk. Not all pool types necessarily
> have volumes with a corresponding file on disks (eg
> virNodeDevicePtr: No existing security context. Most data comes from udev
> or HAL databases, though ultimately much is available
> in sysfs.
> When detaching PCI devices from host drivers, files
> in sysfs are used. When creating/deleting NPIV adapters
> sysfs is used. Thus could use sysfs file labels for AVC
> checks ?
> virConnectPtr: All host level APIs for which there is no other object
> aside from the nebulous concept of the 'host'. APIs are
> all readonly, eg query host capabilities, query free
> memory, CPU stats, etc. What if we gain APIs to make
> write calls.
> virInterfacePtr: No existing security context. Currently using netcf to
> get data from /etc/sysconfig/network-scripts/ifcfg-XXX
> files, but can't assume those file names since that is
> Fedora/RHEL specific. Might not even use netcf if it
> talks directly to network manager. Does netcf need to
> expose a security label based on the ifcfg-XXX file ?
> 3. Security labelling config modes
> When creating a guest the following XML snippets can be used.
> a. Default type, dynamic MCS, automatic relabelling
> <seclabel type='selinux' mode='dynamic' relabel='yes'/>
> b. Custom type, dynamic MCS, automatic relabelling
> <seclabel type='selinux' mode='hybrid' relabel='yes'>
Yes this would be cool, although I am not sure you need an image label,
since the MCS separation would still work on svirt_image_t. Would make
policy writing easier and selection easier if you did not change the
type of the image file.
I would at least allow for the admin to not specify an image label.
> c. Default type, dynamic MCS, no relabelling
> <seclabel type='selinux' mode='dynamic' relabel='no'/>
> Does this mode make any sense, since admin doesn't know
> MCS category upfront ? Possibly only useful if the guest
> only has readonly disks.
But you don't relabel on readonly correct, since this is a shared
resource. I would say this would not be used.
> d. Custom type, dynamic MCS, no relabelling
> <seclabel type='selinux' mode='hybrid' relabel='no'>
> Same question about whether it makes sense
I don't think this makes sense.
> e. Custom type, static MCS, auto relabelling
> <seclabel type='selinux' mode='static' relabel='yes'>
This is fine, not sure it is legal in MLS world. Although I guess we
could change the label to SystemHigh when not in use.
> f. Custom type, static MCS, no relabelling
> <seclabel type='selinux' mode='static' relabel='no'>
We have this now, this is static labeling.
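
To show how the mode and relabel attributes from the snippets above might be interpreted, here is a Python sketch that parses a <seclabel> element and applies the defaults from development task 1 (dynamic defaults to relabelling, static defaults to none). The function name is hypothetical. The proposal gives no default for 'hybrid', so this sketch insists on an explicit value there, and treating a missing mode as 'dynamic' is also an assumption.

```python
import xml.etree.ElementTree as ET


def parse_seclabel(xml_text):
    """Return (mode, relabel) for a <seclabel> element, applying defaults."""
    elem = ET.fromstring(xml_text)
    mode = elem.get("mode", "dynamic")  # default mode is an assumption
    if mode not in ("dynamic", "static", "hybrid"):
        raise ValueError("unknown mode: %r" % mode)
    relabel = elem.get("relabel")
    if relabel is None:
        if mode == "hybrid":
            # No default is stated for hybrid; require an explicit choice.
            raise ValueError("relabel must be given explicitly for hybrid mode")
        relabel = "yes" if mode == "dynamic" else "no"
    if relabel not in ("yes", "no"):
        raise ValueError("relabel must be 'yes' or 'no'")
    return mode, relabel == "yes"
```

For instance, parse_seclabel("<seclabel type='selinux' mode='static'/>") yields ("static", False), which corresponds to case f above.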
> 4. Time at which to apply checks / source context
> It would be desirable to restrict the ability to use automatic file
> relabelling within the policy. If a client application defines a
> guest with the 'relabel=yes' attribute set, at what time should this
> usage be validated ?
> Validate at the time the guest is defined ? This ensures the app
> defining the guest is suitably privileged, but the file labels
> might be changed by the time the guest starts.
> Validate at the time the guest is started ? This minimises the
> window between access check being performed, and libvirtd actually
> performing the relabel operation. The app starting the guest might
> be different from the one defining the guest though ?
> Check at both define + start time ?
Probably most sane.
> What source security context should we use when performing autostart
> of virtual machines ? Normally when starting a VM, the check would be
> performed using the context of the client invoking the start API, but
> there is no such client when autostart occurs.
> Should we instead perform a 'start' operation check whenever the
> 'autostart' flag is turned on by a client ? Or check the autostart
> operation against some generic source context ?
I think we leave this in the default_context file.
One last thing to think about is since libvirt can now be run under the
users context, in certain situations, libvirt should examine the range
of MLS/MCS labels associated with it and make sure that it can only
assign MCS labels within this range.
For example if I am a user running as
libvirt should only pick random labels between 0-500.
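
A rough Python sketch of that restriction follows. The context string used in the usage note is purely hypothetical, chosen to match the 0-500 range, and the parser only handles an upper level that ends in a single "cA.cB" category range.

```python
import random


def category_bounds(context):
    """Extract (low, high) from a context whose upper level ends in 'cA.cB'."""
    # Split at most three times so the MLS range, which contains ':', stays whole.
    mls_range = context.split(":", 3)[3]
    upper = mls_range.split("-")[-1]      # e.g. "s0:c0.c500"
    categories = upper.split(":", 1)[1]   # e.g. "c0.c500"
    low, high = categories.split(".")
    return int(low[1:]), int(high[1:])


def pick_pair_within(context, rng=random):
    """Pick two distinct categories inside the caller's own range."""
    low, high = category_bounds(context)
    a, b = sorted(rng.sample(range(low, high + 1), 2))
    return "s0:c%d,c%d" % (a, b)
```

For a made-up caller context such as "user_u:user_r:user_t:s0-s0:c0.c500", category_bounds returns (0, 500), and pick_pair_within only ever produces pairs inside that range.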