
Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

On 09/07/2010 10:05 AM, Stefan Hajnoczi wrote:
On Tue, Sep 7, 2010 at 3:57 PM, Anthony Liguori
<aliguori linux vnet ibm com>  wrote:
On 09/07/2010 09:49 AM, Stefan Hajnoczi wrote:
On Tue, Sep 7, 2010 at 3:34 PM, Kevin Wolf<kwolf redhat com>    wrote:

Am 07.09.2010 15:41, schrieb Anthony Liguori:


We've got copy-on-read and image streaming working in QED and before
going much further, I wanted to bounce some interfaces off of the
libvirt folks to make sure our final interface makes sense.

Here's the basic idea:

Today, you can create images based on base images that are copy on
write.  With QED, we also support copy on read which forces a copy from
the backing image on read requests and write requests.
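
To make the copy-on-read idea concrete, here is a minimal sketch in C. It is not the QED code: ToyImage, toy_read_cluster and the tiny cluster size are hypothetical names and values chosen for illustration, and only the read path is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CLUSTER_SIZE 4   /* unrealistically small, for illustration only */
#define TOY_NB_CLUSTERS  4

/* Hypothetical leaf image: its own clusters plus a read-only backing image. */
typedef struct {
    uint8_t data[TOY_NB_CLUSTERS][TOY_CLUSTER_SIZE];
    bool allocated[TOY_NB_CLUSTERS];
    const uint8_t (*backing)[TOY_CLUSTER_SIZE];
} ToyImage;

/*
 * Read one cluster. If the leaf does not have it, fall through to the
 * backing image; with copy_on_read set, also store the data in the leaf
 * so that later reads no longer touch the backing image.
 */
static void toy_read_cluster(ToyImage *img, int idx, uint8_t *out,
                             bool copy_on_read)
{
    if (img->allocated[idx]) {
        memcpy(out, img->data[idx], TOY_CLUSTER_SIZE);
        return;
    }
    memcpy(out, img->backing[idx], TOY_CLUSTER_SIZE);
    if (copy_on_read) {
        memcpy(img->data[idx], out, TOY_CLUSTER_SIZE);
        img->allocated[idx] = true;
    }
}
```

The data returned to the caller is the same with or without the flag; only the allocation state of the leaf changes.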

In addition to copy on read, we introduce a notion of streaming a
block device which means that we search for an unallocated region of the
leaf image and force a copy-on-read operation.

The combination of copy-on-read and streaming means that you can start a
guest based on slow storage (like over the network) and bring in blocks
on demand while also having a deterministic mechanism to complete the

The interface for copy-on-read is just an option within qemu-img

Shouldn't it be a runtime option? You can use the very same image with
copy-on-read or copy-on-write and it will behave the same (except for
performance), so it's not an inherent feature of the image file.

Doing it this way has the additional advantage that you need no image
format support for this, so we could implement copy-on-read for other
formats, too.

I agree that streaming should be generic, like block migration.  The
trivial generic implementation is:

void bdrv_stream(BlockDriverState *bs)
    for (sector = 0; sector < bdrv_getlength(bs); sector += n) {
        if (!bdrv_is_allocated(bs, sector, &n)) {

Three problems here.  First problem is that bdrv_is_allocated is
synchronous.  The second problem is that streaming makes the most sense when
it's the smallest useful piece of work whereas bdrv_is_allocated() may
return a very large range.

You could cap it here but you then need to make sure that cap is at least
cluster_size to avoid a lot of unnecessary I/O.
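
A rough sketch of such a capped streaming loop, in C. Everything here is a toy model, not block-layer code: the toy_* functions, the 64 KiB cluster size and the in-memory allocation map are assumptions made for illustration, and the query is synchronous, so it does not address the first problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CLUSTER_SECTORS 128                  /* 64 KiB clusters, an assumption */
#define NB_CLUSTERS     8
#define TOTAL_SECTORS   (NB_CLUSTERS * CLUSTER_SECTORS)

/* Toy allocation map standing in for the leaf image's metadata. */
static const bool initial_map[NB_CLUSTERS] = {
    true, false, false, false, true, false, true, true
};
static bool cluster_allocated[NB_CLUSTERS];

static void toy_reset(void)
{
    memcpy(cluster_allocated, initial_map, sizeof(initial_map));
}

/*
 * Hypothetical is-allocated query: returns the state of the cluster that
 * contains `sector` and stores in *pnum the number of sectors, starting at
 * `sector`, that share this state. The range can be arbitrarily large.
 */
static bool toy_is_allocated(int64_t sector, int64_t *pnum)
{
    int64_t idx = sector / CLUSTER_SECTORS;
    bool state = cluster_allocated[idx];
    int64_t end = idx;

    while (end < NB_CLUSTERS && cluster_allocated[end] == state) {
        end++;
    }
    *pnum = end * CLUSTER_SECTORS - sector;
    return state;
}

/* Stand-in for the forced copy-on-read: just marks the clusters allocated. */
static void toy_copy_from_backing(int64_t sector, int64_t nb_sectors)
{
    for (int64_t s = sector; s < sector + nb_sectors; s += CLUSTER_SECTORS) {
        cluster_allocated[s / CLUSTER_SECTORS] = true;
    }
}

/* Stream the whole device; returns how many copy operations were issued. */
static int toy_stream(int64_t max_chunk_sectors)
{
    int copies = 0;
    int64_t sector = 0, n;

    /* Never go below one cluster, to avoid many tiny I/O requests. */
    if (max_chunk_sectors < CLUSTER_SECTORS) {
        max_chunk_sectors = CLUSTER_SECTORS;
    }
    while (sector < TOTAL_SECTORS) {
        if (!toy_is_allocated(sector, &n)) {
            if (n > max_chunk_sectors) {
                n = max_chunk_sectors;       /* cap the unit of work */
            }
            toy_copy_from_backing(sector, n);
            copies++;
        }
        sector += n;
    }
    return copies;
}
```

With this map, a cap of one cluster issues one copy per unallocated cluster, while a cap as large as the device collapses each unallocated run into a single copy.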

The QED streaming implementation is 140 LOC too, so you quickly end up
adding more code to the block formats to support these new interfaces than
it takes to just implement it in the block format.

The third problem is that streaming really requires being able to do zero write
detection in a meaningful way.  You don't want to always do zero write
detection so you need another interface to mark a specific write as a write
that should be checked for zeros.
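
One possible shape for such an interface, sketched in C. The flag name, toy_write and the result enum are hypothetical and are not part of any existing block driver API; what a real driver does once it finds zeros is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-request flag: only writes carrying it are scanned. */
#define TOY_WRITE_DETECT_ZEROS 0x1

typedef enum { TOY_WROTE_DATA, TOY_SKIPPED_ZEROS } ToyWriteResult;

/* Simple byte-wise scan; a real implementation would likely be optimized. */
static bool toy_buffer_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Toy write path: ordinary writes skip the scan entirely, so they pay no
 * extra cost. Flagged writes (for example those issued by streaming) are
 * checked, and an all-zero buffer is reported instead of being written.
 */
static ToyWriteResult toy_write(const uint8_t *buf, size_t len, int flags)
{
    if ((flags & TOY_WRITE_DETECT_ZEROS) && toy_buffer_is_zero(buf, len)) {
        return TOY_SKIPPED_ZEROS;
    }
    return TOY_WROTE_DATA;
}
```

Guest writes would pass no flag, and only the streaming path would opt in to the scan.
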
Good points.  I agree that it is easiest to write features into the
block driver, but there is a significant amount of code duplication,

There are two ways to attack code duplication. The first is to move the feature into block.c and add interfaces to the block drivers to support it. The second is to keep it in qed.c but to abstract out things that could really be common to multiple drivers (like the find_cluster functionality and some of the request handling functionality).

I prefer the latter approach because it keeps a high quality implementation of copy-on-read whereas the former is almost certainly going to dumb down the implementation.

plus the barrier for enabling other block drivers with these features
is increased.  These points (except the lines of code argument) can be
addressed with the proper extensions to the block driver interface.


Anthony Liguori

