
[libvirt] RFC: disconnecting guest/domain interface config from host config (aka migration with macvtap)



Abstraction of guest <--> host network connection in libvirt
=====================================

The <interface> element of a guest's domain config in libvirt has a <source> element that describes what resources on a host will be used to connect the guest's network interface to the rest of the world. This is very flexible, allowing several different types of connection (virtual network, host bridge, direct macvtap connection to physical interface, qemu usermode, user-defined via an external script), but currently has the problem that unnecessary details of the host config are embedded into the guest's config; if the guest is migrated to a different host, and that host has a different hardware or network config (or possibly the same hardware, but that hardware is currently in use by a different guest), the migration will fail.

I am proposing a change to libvirt's network XML that will allow us to (optionally - old configs will remain valid) remove the host details from the guest's domain XML (which can move around from host to host) and place them in the network XML (which remains with the host); the domain XML will then use existing config elements to associate each guest interface with a "network".

The motivating use case for this change is the "direct" connection type (which uses macvtap for vepa and vnlink connections directly between a guest and a physical interface, rather than through a bridge), but it is applicable for all types of connection. (Another hoped-for side effect of this change will be to make libvirt's network connection model easier to realize on non-Linux hypervisors (eg, VMware ESX), so Mathias - please chime in!)

Background
--------------------

libvirt currently has 3 major types of guest interface connection (there are also "type='user'" and "type='ethernet'", but they probably wouldn't be used in a multi-host environment, so I'm not considering them here):

1) type='network'

The guest's network interface is connected to a libvirt-created "virtual network", which is in reality (in the case of KVM or Xen) a Linux bridge device that isn't connected to any physical host interface - any connection to the outside goes through the host's IP routing stack.

The network to use is indicated in the <source> element of the guest's interface xml: <source network='mynetwork'/>. Because the name 'mynetwork' is controlled by libvirt, it's perfectly reasonable to assume that the same network name could be available on another host that is accepting a migrated guest.

2) type='bridge'

The guest's network interface is connected to a bridge device (eg "br0") that has already been configured in the host's network config files (eg, in /etc/sysconfig/network-scripts). This bridge is itself connected to the outside via a physical host interface, eg "eth0", *NOT* through the host's IP routing stack.

The bridge to use is indicated in the source with <source bridge='br0'/>. Although the naming of the bridge is outside the scope of libvirt, it is at least possible to set up all hosts to have the same bridge name (so that a guest could be migrated from one host to another).

3) type='direct'

The guest's network interface is connected directly to a physical interface (eg "eth0") with macvtap, or sometimes to a virtual function ("VF") of a physical interface (which is also really just another interface, from the software point of view).

The interface to use is indicated with <source dev='eth0' mode='something'/>. In this case, the interface name is determined by the host OS and cannot be arbitrarily changed. Also, a host will have multiple interfaces / VFs available to guests, and in some modes may allow only a single guest to connect to a given interface (implying that the interface used by a guest when on one host will probably not be available when migrating to another). So in order to have flexible migration from one host to another, an abstraction to allow the guest XML to use the same name on all hosts must be introduced.
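
For reference, here is roughly how the three connection types look in a guest's domain XML today (a trimmed sketch: mac, model and other child elements are left out, the names are just the examples used above, and note that the direct type spells its source attribute "dev"):

<interface type='network'>
<source network='mynetwork'/>
</interface>

<interface type='bridge'>
<source bridge='br0'/>
</interface>

<interface type='direct'>
<source dev='eth0' mode='vepa'/>
</interface>

Only the first of these refers to a name that libvirt itself controls; the other two name devices that belong to one particular host.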

Three possible methods for providing this abstraction come to mind:

Option 1
-----------

(Be forewarned that Options 1 & 2 are shown here mainly to illustrate my thought process while arriving at my preferred option, Option 3 :-)

In a manner similar to the way the vnet%d tap devices are created, name the interface with an embedded variable (eg "eth%d") (plus attributes for min and max %d) and let the underlying code in libvirt search for/reserve an appropriate device.

This is the simplest to code/configure, but does not allow a) more complex names (eg, interface names as determined by biosdevname can be of the form "pci%dp%d_%d"), b) multiple ranges, c) oversubscribing of interfaces (it is possible, although sub-optimal, to connect multiple guest interfaces to a single host interface with macvtap).
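
To make that concrete, a guest interface under Option 1 might look something like the following. This is purely illustrative: the "min" and "max" attribute names are made up for this sketch and are not existing libvirt syntax.

<interface type='direct'>
<!-- hypothetical: try eth2 through eth9 and reserve the first free one -->
<source dev='eth%d' min='2' max='9' mode='vepa'/>
         ...
</interface>

A single pattern with one min/max pair has no obvious way to say "eth2-eth5 and also eth8-eth9", and it still leaves host device names sitting in the guest's config.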

VERDICT: looks ugly, not flexible enough.

Option 2
-----------

Create a new class of libvirt XML config to describe a pool of network interfaces, and reference this pool in the guest interface element:

<interface type='interfacePool'>
<source pool='red-network'/>
         ...
</interface>

The problem with this is that it requires a new API for managing "interface pools" (defining, undefining, etc). Also, it wouldn't allow (for example) one host to use a pool of macvtap interfaces to connect guests, and another host to use a host bridge for the same connection (obviously, such a non-uniform setup wouldn't be desirable in a large host farm, but may be encountered in some smaller setup).

VERDICT: creates more API clutter (ie extra work *and* confusion for users). Is "flexible enough" for current motivation, but unnecessarily limiting, eg doesn't help the model to be more easily adapted to VMware etc.

Option 3
-----------

Up to now we've only discussed the need for separating the host-specific config (<source> element) in the case of type='direct' interfaces (well, in reality I've gone back and edited this document so many times that that is no longer true, but play along with me! :-). But it really is a problem for all interface types - all of the information currently in the guest's interface <source> element really is tied to the host, and shouldn't be defined in detail in the guest XML; it should instead be defined once for each host, and only referenced by some name in the guest XML; that way as a guest moves from host to host, it will automatically adjust its connection to match the new environment.

As a more general solution, instead of having the special new "interfacePool" object in the config, what if the XML for "network" were expanded to mean "any type of guest network connection" (with a new "type='xxx'" attribute at the toplevel to indicate which type), not just "a private bridge optionally connected to the real world via routing/NAT"?

If this was the case, the guest interface XML could always be, eg:

<interface type='network'>
<source network='red-network'/>
          ...
</interface>

and depending on the network config of the host the guest was migrated to, this could be either a direct (macvtap) connection via an interface allocated from a pool (the pool being defined in the definition of 'red-network'), a bridge (again, pointed to by the definition of 'red-network'), or a virtual network (using the current network definition syntax). This way the same guest could be migrated not only between macvtap-enabled hosts, but from there to a host using a bridge, or maybe a host in a remote location that used a virtual network with a secure tunnel to connect back to the rest of the red-network. (Part of the migration process would of course check that the destination host had a network of the proper name, and fail if it didn't; management software at a level above libvirt would probably filter a list of candidate migration destinations based on available networks, and only attempt migration to one that had the matching network available).


Examples of 'red-network' for different types of connections (all of these would work with the interface XML given above):

<!-- Existing usage - a libvirt virtual network -->
<network> <!-- (you could put "type='virtual'" here for symmetry) -->
<name>red-network</name>
<bridge name='virbr0'/>
<forward mode='route'/>
      ...
</network>

<!-- The simplest - an existing host bridge -->
<network type='bridge'>
<name>red-network</name>
<bridge name='br0'/>
</network>

<network type='direct'>
<name>red-network</name>
<source mode='vepa'>
<!-- define the pool of available interfaces here. Interfaces may have -->
<!-- parameters associated with them, eg max number of simultaneous guests -->
</source>
<!-- add any other elements from the guest interface XML that are tied to -->
<!-- the host here (virtualport ?) (of course if they're host specific, they -->
<!-- should have been in <source> in the first place!!) -->
</network>

I know there may be some resistance to this expansion of the usage of <network>, but I think it does fit in with the current usage properly, and is preferable to adding an entirely new class of API just to define a pool of interfaces.

Open questions:

1) What should the <pool> element inside network/source look like? Making each interface in the pool a separate element, with possible attributes, would be the simplest to code, but would get tedious on a system with, for example, an ethernet card with 64 VFs. On the other hand, just parameterizing a string (eth%d) is inadequate, eg, when there are multiple non-contiguous ranges.

2) Do we need a "max connections" for each interface in a pool of macvtap interfaces? Or should we just overload them in a round-robin fashion unless mode='passthru' (a new mode which requires only one guest per interface)?

3) What about the parameters in the <virtualport> element that are currently used by vepa/vnlink? Do those belong with the host, or with the guest?

4) Are there other <network> types that we want? Perhaps the recent proposal for IPSec / secure tunnels could be incorporated as a new network type (or maybe it could just be the standard "virtual" type, with a tunnel as the forward device).
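
As a starting point for discussing questions 1 and 2, here are two possible shapes for the pool. None of these element or attribute names (<pool>, <range>, "pattern", "start", "end", "maxConnect") exist in libvirt; they are invented here only to show the tradeoff.

<!-- (a) hypothetical: one element per interface. Trivial to parse, -->
<!-- but tedious for a card with 64 VFs -->
<network type='direct'>
<name>red-network</name>
<source mode='vepa'>
<pool>
<interface name='eth10' maxConnect='1'/>
<interface name='eth11' maxConnect='1'/>
         ...
</pool>
</source>
</network>

<!-- (b) hypothetical: parameterized ranges. Compact, and several -->
<!-- <range> elements can cover non-contiguous groups of interfaces -->
<network type='direct'>
<name>red-network</name>
<source mode='vepa'>
<pool>
<range pattern='eth%d' start='10' end='41'/>
<range pattern='eth%d' start='50' end='81'/>
</pool>
</source>
</network>

Form (a) has an obvious place to hang a per-interface connection limit; form (b) would need the limit on the range or on the pool as a whole.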

