
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps

On Mon, Aug 08, 2011 at 08:29:51AM -0500, Anthony Liguori wrote:
> On 08/08/2011 03:42 AM, Shribman, Aidan wrote:
> >Subject: [PATCH v4] XBZRLE delta for live migration of large memory apps
> >From: Aidan Shribman<aidan shribman sap com>
> >
> >By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we can reduce VM downtime
> >and total live-migration time of VMs running memory write intensive workloads
> >typical of large enterprise applications such as SAP ERP Systems, and generally
> >speaking for any application with a sparse memory update pattern.
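
For readers who have not seen the patch, the idea named above (XOR the
new page against the previously sent copy, then run-length encode the
zero bytes) can be sketched roughly as follows. This is only an
illustration, not the encoding from the patch: the function names
xbzrle_encode/xbzrle_decode and the fixed 4-byte length fields are
assumptions made for this example.

```python
def xbzrle_encode(old: bytes, new: bytes) -> bytes:
    """XOR `new` against `old`, then run-length encode the zero runs.

    Output is a sequence of records (assumed layout, not the patch's):
      4-byte zero-run length, 4-byte literal length, literal XOR bytes.
    """
    if len(old) != len(new):
        raise ValueError("pages must be the same size")
    delta = bytes(a ^ b for a, b in zip(old, new))
    out = bytearray()
    i, n = 0, len(delta)
    while i < n:
        start = i
        while i < n and delta[i] == 0:   # unchanged bytes
            i += 1
        zero_run = i - start
        start = i
        while i < n and delta[i] != 0:   # changed bytes
            i += 1
        out += zero_run.to_bytes(4, "big")
        out += (i - start).to_bytes(4, "big")
        out += delta[start:i]
    return bytes(out)


def xbzrle_decode(old: bytes, encoded: bytes) -> bytes:
    """Rebuild the new page from the old page and the encoded delta."""
    page = bytearray(old)
    pos = 0  # position in the page
    i = 0    # position in the encoded stream
    while i < len(encoded):
        zero_run = int.from_bytes(encoded[i:i + 4], "big")
        lit_len = int.from_bytes(encoded[i + 4:i + 8], "big")
        i += 8
        pos += zero_run                  # skipped bytes are unchanged
        for k in range(lit_len):
            page[pos + k] ^= encoded[i + k]
        pos += lit_len
        i += lit_len
    return bytes(page)
```

With a sparse update pattern most of the delta is zero, so a page with
a handful of changed bytes shrinks to a few records, which is why the
scheme suits the workloads described above.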


> One thing that strikes me about this algorithm is that it's very
> good for a particular type of workload--shockingly good really.
> I think workload aware migration compression is possible for a lot
> of different types of workloads.  That makes me a bit wary of QEMU
> growing quite a lot of compression mechanisms.
> It makes me think that this logic may really belong at a higher
> level where more information is known about the workload.  For
> instance, I can imagine XBZRLE living in something like libvirt.
> Today, parsing migration traffic is pretty horrible but I think
> we're pretty strongly committed to fixing that in 1.0.  That makes
> me wonder if it would be nicer architecturally for a higher level
> tool to own something like this.
> Originally, when I added migration, I had the view that we would
> have transport plugins based on the exec: protocol.  That hasn't
> really happened since libvirt really owns migration but I think
> having XBZRLE as a transport plugin for libvirt is something worth
> considering.

NB I've not been much of a fan of the exec: migration code, since it
has proved rather buggy in practice when we used it for 'save/restore
to/from file' support. It has been hard to diagnose when things go
wrong, and difficult for QEMU to report any useful error messages.
Even with the tcp: protocol, QEMU is seemingly unable to provide any
useful error reporting even of things as simple as "unable to connect
to remote host". So with one exception, current libvirt now uses the
'fd:' protocol for everything, and the last exception will be removed
soon too.

> I'm curious what people think about this type of approach.  CC'ing
> libvirt to get their input.

In "normal" migration though, even when using fd:, we don't make
any attempt to touch the data stream. We just pass a pre-connected
TCP socket into QEMU and let it write directly to it. This avoids
extra data copying via libvirt.
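
As a rough illustration of that hand-off (not libvirt's actual code), the
sketch below passes one end of an already connected socket to a child
process, which then writes straight to it. The socketpair stands in for
the pre-connected TCP socket, the child Python process stands in for
QEMU, and run_direct_migration is a made-up name. It relies on POSIX fd
inheritance via subprocess's pass_fds.

```python
import socket
import subprocess
import sys

# Stand-in for QEMU: writes its "migration stream" directly to the fd
# number it was given on the command line.
CHILD_SRC = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "os.write(fd, b'guest-ram-pages')\n"
    "os.close(fd)\n"
)


def run_direct_migration() -> bytes:
    # socketpair() stands in for a TCP socket the manager has already
    # connected to the destination host.
    mgmt_side, dest_side = socket.socketpair()
    with mgmt_side, dest_side:
        subprocess.run(
            [sys.executable, "-c", CHILD_SRC, str(mgmt_side.fileno())],
            pass_fds=(mgmt_side.fileno(),),  # child inherits this fd
            check=True,
        )
        # The manager never reads or writes the stream itself; the
        # "destination" end sees exactly what the child wrote.
        return dest_side.recv(4096)
```

The point of the design is visible in what is missing: the managing
process contains no read/write loop over the migration data at all.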

In our alternative "tunnelled" migration mode, libvirt does touch
the data stream, passing a pipe FD into QEMU, and copying the data
from the pipe into packets to be sent over libvirtd's existing
secure RPC stream, and then copying it back to QEMU on the destination.
The downside here is that we've added several extra data copies.
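
To make the extra copies concrete, here is a minimal sketch of such a
relay, assuming a simple length-prefixed packet format. The framing, the
chunk size and the names tunnel_send, tunnel_receive and relay_demo are
all invented for this example and are not libvirtd's RPC protocol.

```python
import os
import struct

CHUNK = 64 * 1024  # assumed packet payload size


def tunnel_send(pipe_fd: int):
    """Source side: read from the pipe and wrap each chunk in a packet."""
    while True:
        data = os.read(pipe_fd, CHUNK)          # copy: pipe -> buffer
        if not data:
            yield struct.pack("!I", 0)          # end-of-stream marker
            return
        yield struct.pack("!I", len(data)) + data  # copy: buffer -> packet


def tunnel_receive(packets, out_fd: int) -> None:
    """Destination side: unwrap packets and write payloads to the pipe."""
    for pkt in packets:
        (length,) = struct.unpack("!I", pkt[:4])
        if length == 0:
            break
        view = memoryview(pkt)[4:4 + length]
        while view:                              # handle short writes
            written = os.write(out_fd, view)     # copy: packet -> pipe
            view = view[written:]


def relay_demo(payload: bytes) -> bytes:
    """Push a small payload through both halves using local pipes."""
    src_r, src_w = os.pipe()
    dst_r, dst_w = os.pipe()
    os.write(src_w, payload)   # keep payload smaller than the pipe buffer
    os.close(src_w)
    tunnel_receive(tunnel_send(src_r), dst_w)
    os.close(src_r)
    os.close(dst_w)
    out = b""
    while chunk := os.read(dst_r, CHUNK):
        out += chunk
    os.close(dst_r)
    return out
```

Every byte is read out of a pipe, copied into a packet, and written
back into a pipe on the far side, in addition to whatever the real
transport does underneath.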

In our "save/restore to file" code, we use 'fd:' and always have
to send the data via a filter program. For example, we have the
ability to compress/decompress data via gzip, bzip2, xz, and lzop,
for which we instead pass QEMU a pipe FD connected to the external
compression helper program. We also have another new option where we
send data via another I/O helper program that uses O_DIRECT, so
save/restore does not pollute the page cache.
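
A small sketch of the filter arrangement, assuming the gzip binary is on
PATH: the helper's stdin is a pipe and its stdout is the save file. In
the real setup the write end of that pipe is what would be handed to
QEMU via 'fd:'; here the script writes to it itself, and the name
save_via_filter is made up for the example.

```python
import subprocess


def save_via_filter(stream: bytes, path: str, helper=("gzip", "-c")) -> None:
    """Write `stream` to `path` through an external compression helper."""
    with open(path, "wb") as out:
        proc = subprocess.Popen(
            list(helper),
            stdin=subprocess.PIPE,  # write end would go to QEMU via 'fd:'
            stdout=out,             # helper writes compressed data to file
        )
        proc.stdin.write(stream)    # stand-in for QEMU writing its state
        proc.stdin.close()          # EOF tells the helper to finish
        if proc.wait() != 0:
            raise RuntimeError("compression helper failed")
```

Swapping the helper tuple for another compressor that reads stdin and
writes stdout is the only change needed to try a different format.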

With this kind of existing precedent, I won't strongly argue against
libvirt adding a filter to support this XBZRLE encoding scheme for
migration, or indeed save/restore too, if it proves better than
lzop, which is currently our best speed/compression trade-off.

My main concern with all these scenarios where libvirt touches the
actual data stream, though, is that we're introducing extra data copies
into the migration path, which potentially waste CPU cycles.
If QEMU can directly XBZRLE encode data into the FD passed via 'fd:'
then we minimize data copies. Whether this is a big enough benefit
to offset the burden of having to maintain various compression code
options in QEMU, I can't say.

|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
