[libvirt] migration of vnlink VMs

Tue May 24 14:21:05 UTC 2011

On Fri, Apr 29, 2011 at 04:12:55PM -0400, Laine Stump wrote:
> Okay, here's a brief description of what I *think* will work. I'll
> build up the RNG based on this pseudo-xml:
> 
> 
> For the <interface> definition in the guest XML, the main change
> will be that <source .. mode='something'> will be valid (but
> optional) when interface type='network' - in this case, it will just
> be used to match against the source mode of the network on the host.
> <virtualport> will also become valid for type='network', and will
> serve two purposes:
> 
> 1) if there is a mismatch with the virtualport on the host network,
> the migrate/start will fail.
> 2) It will be ORed with <virtualport> on the host network to arrive
> at the virtualport settings actually used.
> 
> For example:
> 
> <interface type='network'>
> <source network='red-network' mode='vepa'/>

IMHO having a 'mode' here throws away the main reason for
using type=network in the first place, namely independence
from this host-specific config element.

> <virtualport type='802.1Qbg'>
> <parameters instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
> </virtualport>
> <mac address='xx:xx:.....'/>
> </interface>
> 
> (NB: if "mode" isn't specified, and the host network is actually a
> bridge or virtual network, the contents of virtualport will be
> ignored.)
> 
> 
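Laine proposes two <virtualport> rules: fail on a mismatch, otherwise OR the guest and host settings together. These can be expressed as a merge of attribute sets. Below is a minimal Python sketch. It assumes that a "mismatch" means two different type= values, or the same <parameters> attribute set to different values on each side. merge_virtualport is a hypothetical helper, not a libvirt API.

```python
import xml.etree.ElementTree as ET


def merge_virtualport(guest_xml, network_xml):
    """Hypothetical helper: combine guest and host-network <virtualport>.

    Assumption: a "mismatch" is two different type= values, or the same
    <parameters> attribute with two different values; everything else
    is ORed (unioned) together.
    """
    guest = ET.fromstring(guest_xml)
    net = ET.fromstring(network_xml)

    # Rule 1: if both sides give a type, they must agree.
    gtype, ntype = guest.get("type"), net.get("type")
    if gtype and ntype and gtype != ntype:
        raise ValueError("virtualport type mismatch: %r vs %r" % (gtype, ntype))

    net_params = net.find("parameters")
    guest_params = guest.find("parameters")
    merged = dict(net_params.attrib) if net_params is not None else {}
    extra = guest_params.attrib if guest_params is not None else {}
    for name, value in extra.items():
        # Rule 1 again: one parameter with two different values fails.
        if name in merged and merged[name] != value:
            raise ValueError("virtualport parameter %r mismatch" % name)
        # Rule 2: otherwise the settings are ORed together.
        merged[name] = value
    return {"type": gtype or ntype, "parameters": merged}


guest = """<virtualport type='802.1Qbg'>
  <parameters instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
</virtualport>"""
host = """<virtualport type='802.1Qbg'>
  <parameters managerid='11' typeid='1193047' typeidversion='2'/>
</virtualport>"""
print(merge_virtualport(guest, host))
```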
> <network> will be expanded by giving it an optional "type" attribute
> (which will default to 'virtual'), <source> subelement, and
> <virtualport> subelement. When type='bridge', you can specify source
> exactly as you would in a domain <interface> definition:
> 
> <network type='bridge'>
> <name>red-network</name>
> <source bridge='br0'/>
> </network>
> 
> When type='direct', again you can specify source and virtualport
> pretty much as you would in an interface definition:
> 
> <network type='direct'>
> <name>red-network</name>
> <source dev='eth0' mode='vepa'/>
> <virtualport type='802.1Qbg'>
> <parameters managerid="11" typeid="1193047" typeidversion="2"
>        instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
> </virtualport>
> </network>

None of this really feels right to me. With this proposed
schema, there is basically nothing in common between the
existing functionality for <network> and this new functionality
except for the <name> and <uuid> elements.

Apps which know how to deal with the existing <network> schema
will have no ability to interpret this new data at all.
Quite probably they will misinterpret it as providing an
isolated virtual network with no IP address set, since this
design doesn't change any attribute value that they
currently look for.

Either we need to make this align with the existing schema,
or we need to put this under a completely separate set of
APIs. I think we can likely do better with the schema design
and achieve the former.

> However, dev would be optional - if not specified, we would expect a
> pool of interfaces to be defined within source, eg:
> 
> <network type='direct'>
> <name>red-network</name>
> <source mode='vepa'>
> <pool>
> <interface name='eth10' maxConnect='1'/>
> <interface name='eth11' maxConnect='1'/>
> <interface name='eth12' maxConnect='1'/>
> <interface name='eth13' maxConnect='1'/>
> <interface name='eth14' maxConnect='1'/>
> <interface name='eth25' maxConnect='5'/>
> </pool>
> </source>
> <virtualport ...... />
> </network>

I don't really like the fact that this design has special-cased
the num(interfaces) == 1 case with a completely different XML
schema. eg we have this:

  <source dev='eth0' mode='vepa'/>

And this

  <source mode='vepa'>
    <pool>
      <interface name='eth0' maxConnect='1'/>
    </pool>
  </source>

both meaning the same thing. There should only be one
representation in the schema for this kind of thing.
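To make the "one representation" point concrete: any consumer of the proposed schema has to fold both forms into the same internal data before it can act on them. Below is a minimal Python sketch of that normalization. It assumes the single-dev form implies maxConnect='1'. source_interfaces is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def source_interfaces(source_xml):
    """Hypothetical helper: reduce either proposed <source> form to
    (mode, [(interface name, maxConnect), ...])."""
    src = ET.fromstring(source_xml)
    mode = src.get("mode")
    dev = src.get("dev")
    if dev is not None:
        # Single-device form: maxConnect can't be expressed, so assume 1.
        return mode, [(dev, 1)]
    pool = src.find("pool")
    if pool is None:
        return mode, []
    return mode, [(i.get("name"), int(i.get("maxConnect", "1")))
                  for i in pool.findall("interface")]


single = "<source dev='eth0' mode='vepa'/>"
pooled = """<source mode='vepa'>
  <pool>
    <interface name='eth0' maxConnect='1'/>
  </pool>
</source>"""

# Two different XML documents, one meaning.
print(source_interfaces(single) == source_interfaces(pooled))  # True
```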

> BTW, for all the people asking about sectunnel, openvswitch, and vde
> - can you see how those would fit in with this? In particular, do
> you see any conflicts? (It's easy to add more stuff on later if
> something is just missing, but much more problematic if I put
> something in that is just plain wrong).

As mentioned above, I think this design is wrong, because it is not
taking any account of the current schema for <network> which defines
the various routed modes.

Currently <network> supports 3 connectivity modes:

 - Non-routed network, separate subnet        (no <forward> element present)
 - Routed network, separate subnet with NAT   (<forward mode='nat'/>)
 - Routed network, separate subnet            (<forward mode='route'/>)
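The three existing modes are distinguished purely by the presence, and the mode= value, of <forward>. That is exactly what existing apps key on. Below is a minimal Python sketch of that classification. connectivity_mode is a hypothetical helper. The sketch assumes that a <forward> with no mode= means 'nat'.

```python
import xml.etree.ElementTree as ET

# Existing <network> connectivity modes, keyed on <forward mode=...>.
MODES = {
    None: "non-routed network, separate subnet",
    "nat": "routed network, separate subnet with NAT",
    "route": "routed network, separate subnet",
}


def connectivity_mode(network_xml):
    net = ET.fromstring(network_xml)
    fwd = net.find("forward")
    if fwd is None:
        return MODES[None]            # isolated: no <forward> element at all
    mode = fwd.get("mode", "nat")     # assumption: missing mode= means nat
    if mode not in MODES:
        raise ValueError("unknown forward mode %r" % mode)
    return MODES[mode]


print(connectivity_mode("<network><name>default</name><forward mode='nat'/></network>"))
# Laine's type='direct' proposal has no <forward>, so an old app reads it as isolated:
direct = ("<network type='direct'><name>red-network</name>"
          "<source dev='eth0' mode='vepa'/></network>")
print(connectivity_mode(direct))
```

The second call shows the misinterpretation described earlier. Nothing in the new XML changes what this kind of check looks at.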

Following on from this, I can see another couple of routed modes:

 - Routed network, IP subnetting
 - Routed network, separate subnet with VPN

And the core goal here is to replace type=bridge and type=direct in the
domain XML, which means we're adding several bridging modes:

 - Bridged network, eth + bridge + tap        (akin to type=bridge)
 - Bridged network, eth + macvtap             (akin to type=direct)
 - Bridged network, sriov eth + bridge + tap  (akin to type=bridge)
 - Bridged network, sriov eth + macvtap       (akin to type=direct)

macvtap can operate in 4 modes, so it is probably better to
consider them separately:

 - Bridged network, eth + bridge + tap
 - Bridged network, eth + macvtap + vepa
 - Bridged network, eth + macvtap + private
 - Bridged network, eth + macvtap + passthrough
 - Bridged network, eth + macvtap + bridge
 - Bridged network, sriov eth + bridge + tap
 - Bridged network, sriov eth + macvtap + vepa
 - Bridged network, sriov eth + macvtap + private
 - Bridged network, sriov eth + macvtap + passthrough
 - Bridged network, sriov eth + macvtap + bridge
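The ten bridged variants above are the cross product of two uplink kinds (plain eth, SRIOV eth) with five attachment methods. A short Python illustration of that structure:

```python
from itertools import product

uplinks = ["eth", "sriov eth"]
attachments = ["bridge + tap",
               "macvtap + vepa",
               "macvtap + private",
               "macvtap + passthrough",
               "macvtap + bridge"]

# One line per (uplink, attachment) pair, in the same order as the list above.
variants = ["Bridged network, %s + %s" % (u, a)
            for u, a in product(uplinks, attachments)]

for v in variants:
    print(v)
print(len(variants))  # 10
```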

I can also perhaps imagine another VPN mode:

 - Bridged network, with VPN

The current routed modes can route anywhere, or be restricted to
a particular network interface, eg with <forward dev='eth0'/>. This
only allows a single interface, though even for routed modes it
could be desirable to list multiple devs.

The other big distinction is that the <network> modes which do routing
include interface configuration data (ie the IP addrs & bridge name),
which is configured on the fly. It looks like with the bridged modes,
you're assuming the app has already statically configured the interfaces
via the virInterface APIs, and this just points to an existing
configured interface. This isn't necessarily a bad thing, just an
observation of a significant difference.

So if we ignore the <ip> and <domain> elements from the current <network>
schema, then there are a handful of others which we need to have a plan
for:

  <forward mode='nat|route'/>   (omitted completely for isolated networks)
  <bridge name="virbr0" />      (auto-generated/filled if omitted)
  <mac address='....'/>         (auto-generated/filled if omitted)

The <forward> element can have an optional dev= attribute.

I think the key attribute is the <forward> mode= attribute. I think we
should be adding further values to that attribute for the new network
modes we want to support. We should also make use of the dev= attribute
on <forward> where practical, and/or extend it.

We could expand the list of <forward> mode values in a flat list:

  - route
  - nat
  - bridge (brctl)
  - vepa
  - private
  - passthru
  - bridge (macvtap)

NB: we really need to avoid using 'bridge' in the terminology, since all
5 of the last options are really 'bridge'.

Or we could introduce an extra attribute, and have a 2-level list:

  - <forward layer='link'/>   (for all ethernet layer bridging)
  - <forward layer='network'/> (for all IP layer bridging aka routing)

So the current modes would be:

   <forward layer='network' mode='route|nat'/>

And new bridging modes would be:

   <forward layer='link' mode='bridge-brctl|vepa|private|passthru|bridge-macvtap'/>

For the brctl/macvtap modes, the dev= attribute on <forward> could point to
the NIC being used, while with brctl modes, <bridge> would also be present.
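Below is a minimal Python sketch of how an app might validate the two-level <forward layer= mode=> idea. It assumes the mode names exactly as listed above, and treats a <forward> without layer= as layer='network' so that existing XML stays valid. The only extra rule it enforces is the one stated here: bridge-brctl also needs <bridge>. FORWARD_MODES and check_forward are hypothetical names; this is not existing libvirt schema.

```python
import xml.etree.ElementTree as ET

# Proposed (not existing) layer -> allowed mode values.
FORWARD_MODES = {
    "network": {"route", "nat"},
    "link": {"bridge-brctl", "vepa", "private", "passthru", "bridge-macvtap"},
}


def check_forward(network_xml):
    """Return (layer, mode, dev) for a <network>, or raise ValueError."""
    net = ET.fromstring(network_xml)
    fwd = net.find("forward")
    if fwd is None:
        return (None, None, None)            # isolated network, as today
    layer = fwd.get("layer", "network")      # assumption: old XML = IP layer
    mode = fwd.get("mode", "nat" if layer == "network" else None)
    if mode not in FORWARD_MODES.get(layer, set()):
        raise ValueError("mode %r not valid for layer %r" % (mode, layer))
    if mode == "bridge-brctl" and net.find("bridge") is None:
        raise ValueError("bridge-brctl mode needs a <bridge> element")
    return (layer, mode, fwd.get("dev"))


print(check_forward("<network><forward mode='nat'/></network>"))
print(check_forward("<network><forward layer='link' mode='vepa' dev='eth0'/></network>"))
```

Defaulting layer= this way means an old app that never emits layer= keeps producing valid XML.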

In the SRIOV case, we potentially need a list of interfaces. For this we
probably want to use:

   <forward dev='eth0'>
     <interface dev='eth0'/>
     <interface dev='eth1'/>
     <interface dev='eth2'/>
     ...
   </forward>

NB, the first interface is always to be listed both as a dev= attribute
(for compat with existing apps) *and* as a child <interface> element (for
apps knowing the new schema).
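The compatibility rule in this NB is easy to check mechanically. An old app reads only dev=, a new app reads the child list, and both must agree on the first uplink. Below is a minimal Python sketch, assuming the <interface dev=.../> child syntax shown above. forward_devs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def forward_devs(forward_xml):
    """Return the ordered uplink list, enforcing the dev=/first-child rule."""
    fwd = ET.fromstring(forward_xml)
    children = [i.get("dev") for i in fwd.findall("interface")]
    dev = fwd.get("dev")
    if not children:
        # Old-style XML: the single dev= attribute is the whole list.
        return [dev] if dev is not None else []
    if dev != children[0]:
        raise ValueError("dev=%r must repeat the first <interface> (%r)"
                         % (dev, children[0]))
    return children


new_style = """<forward dev='eth0'>
  <interface dev='eth0'/>
  <interface dev='eth1'/>
  <interface dev='eth2'/>
</forward>"""
print(forward_devs(new_style))                # ['eth0', 'eth1', 'eth2']
print(forward_devs("<forward dev='eth0'/>"))  # ['eth0']
```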

The maxConnect= attribute from your examples above is an interesting
thing. I'm not sure whether that is necessarily a good idea. It feels
similar to VMware's "port group" idea, but I don't think a simple
'maxConnect=' attribute is sufficient to let us represent the VMware
port group idea. I think we might need a more explicit element,
eg:

   <portgroup count='5'>
      <interface dev='eth2'/>
   </portgroup>

ie this associates a port group, which allows 5 clients (VM NICs),
with the uplink provided by eth2 (which is assumed to be listed
under <forward>).

So a complete SRIOV example might be:

  <network>
    <name>Foo</name>
    <forward dev='eth0' layer='link' mode='vepa'>
      <interface dev='eth0'/>
      <interface dev='eth1'/>
      <interface dev='eth2'/>
      ...
    </forward>
    <portgroup count='10'>
      <interface dev='eth0'/>
    </portgroup>
    <portgroup count='5'>
      <interface dev='eth1'/>
    </portgroup>
    <portgroup count='5'>
      <interface dev='eth2'/>
    </portgroup>
  </network>
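Below is a minimal Python sketch of how the count= attribute could be consumed. It reads the portgroups from the SRIOV example (with the "..." placeholder dropped so the XML parses) and hands out uplinks to VM NICs until each group is full. The first-fit policy and the helper names load_portgroups and attach_nic are assumptions for illustration; the proposal above doesn't say how a portgroup would be chosen.

```python
import xml.etree.ElementTree as ET

NETWORK = """<network>
  <name>Foo</name>
  <forward dev='eth0' layer='link' mode='vepa'>
    <interface dev='eth0'/>
    <interface dev='eth1'/>
    <interface dev='eth2'/>
  </forward>
  <portgroup count='10'><interface dev='eth0'/></portgroup>
  <portgroup count='5'><interface dev='eth1'/></portgroup>
  <portgroup count='5'><interface dev='eth2'/></portgroup>
</network>"""


def load_portgroups(xml_text):
    """Map each portgroup uplink to its remaining client (VM NIC) capacity."""
    net = ET.fromstring(xml_text)
    uplinks = {i.get("dev") for i in net.find("forward").findall("interface")}
    free = {}
    for pg in net.findall("portgroup"):
        dev = pg.find("interface").get("dev")
        if dev not in uplinks:
            raise ValueError("portgroup uplink %r not listed under <forward>" % dev)
        free[dev] = int(pg.get("count"))
    return free


def attach_nic(free):
    """Assumed first-fit policy: take a slot on the first non-full uplink."""
    for dev, slots in free.items():
        if slots > 0:
            free[dev] = slots - 1
            return dev
    raise RuntimeError("all portgroups are full")


free = load_portgroups(NETWORK)
print(sum(free.values()))                          # 20
print([attach_nic(free) for _ in range(12)][-2:])  # ['eth1', 'eth1']
```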

The <virtualport> parameters for VEPA/VNLink could either be stored at
the top level under <network>, or inside <portgroup> or both.
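If <virtualport> were allowed in both places, some precedence rule would be needed. Below is a minimal Python sketch. It assumes (nothing above decides this) that a portgroup-level <virtualport> overrides the network-level one attribute by attribute. The placement of <virtualport> inside <portgroup>, the typeid values, and the helper name effective_virtualport are all hypothetical.

```python
import xml.etree.ElementTree as ET

NETWORK = """<network>
  <name>Foo</name>
  <virtualport type='802.1Qbg'>
    <parameters managerid='11' typeid='1193047' typeidversion='2'/>
  </virtualport>
  <portgroup count='5'>
    <interface dev='eth1'/>
    <virtualport type='802.1Qbg'>
      <parameters typeid='1193048'/>
    </virtualport>
  </portgroup>
</network>"""


def effective_virtualport(xml_text, dev):
    """Network-level parameters, overridden by those of dev's portgroup."""
    net = ET.fromstring(xml_text)
    params = {}
    top = net.find("virtualport/parameters")   # direct child of <network> only
    if top is not None:
        params.update(top.attrib)
    for pg in net.findall("portgroup"):
        if pg.find("interface").get("dev") == dev:
            pg_params = pg.find("virtualport/parameters")
            if pg_params is not None:
                params.update(pg_params.attrib)  # assumed: portgroup wins
    return params


print(effective_virtualport(NETWORK, "eth1")["typeid"])  # 1193048
print(effective_virtualport(NETWORK, "eth0")["typeid"])  # 1193047
```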

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|