[Libvir] PATCH: 0/7 Implementation of storage APIs

Mon Oct 29 03:53:44 UTC 2007

Since the previous discussions didn't really end up anywhere conclusive I decided
it would be better to have a crack at getting some working code to illustrate my
ideas. Thus, the following series of 7 patches provides an implementation of storage
APIs which follow the scheme outlined in the previous mail on storage concepts:

   http://www.redhat.com/archives/libvir-list/2007-October/msg00195.html

I have only addressed storage pools & volumes. I am not tackling the host device
enumeration APIs at this time, since it is not a blocker for the rest of the work.
The mails which follow, in particular the last one, will explain the patches in
more detail. The focus is primarily on providing the public API, and wiring up
the remote driver, daemon and protocol, and the generic storage driver. The generic
storage driver then calls out to specific backends for different types of storage.
The implementation provides a fully functional backend for managing local directories
containing raw files (i.e. the equivalent of /var/lib/xen/images). There are empty stubs
for LVM, iSCSI, Disks/Partitions, etc to be filled in as we decide...

Some open questions

 - Is it worth bothering with a UUID for individual storage volumes? It is possible
   to construct a globally unique identifier for most volumes, by combining various
   bits of metadata we have such as device unique ID, inode, iSCSI target & LUN, etc.
   There isn't really any identifier that fits into the classic libvirt 16 byte UUID.
   I've implemented (randomly generated) UUIDs for the virStorageVolPtr object, but
   I'm inclined to remove them, since they are not much use if they change each time
   the libvirtd daemon is restarted.

   The 'name' field provides a unique identifier scoped to the storage pool. I think
   we could add a 'char *key' field, as an arbitrary opaque string, forming a
   globally unique identifier for the volume. This would serve the same purpose as
   a UUID, but without the 16 byte constraint which we can't usefully satisfy.
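
   As a rough illustration of the 'key' idea for file-backed volumes, the sketch
   below builds an opaque string from the device number and inode reported by
   stat(). The helper names (demo_vol_key_from_path, demo_vol_key_equal) and the
   "file:" prefix are invented for this example and are not part of the patches.
   A device/inode pair is only unique within one host, so a real key would
   likely need more metadata.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of the patch series: build an opaque
 * key for a file-backed volume from its device number and inode.
 * Returns 0 on success, -1 if stat() fails or the buffer is too small.
 */
static int demo_vol_key_from_path(const char *path, char *key, size_t keylen)
{
    struct stat sb;
    int n;

    assert(path != NULL && key != NULL);

    if (stat(path, &sb) < 0)
        return -1;

    n = snprintf(key, keylen, "file:%llx:%llx",
                 (unsigned long long)sb.st_dev,
                 (unsigned long long)sb.st_ino);
    if (n < 0 || (size_t)n >= keylen)
        return -1;   /* output error or truncated key */

    return 0;
}

/* Keys are opaque: callers only ever compare them as strings. */
static int demo_vol_key_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

   Unlike a randomly generated UUID, a key derived this way stays the same
   across daemon restarts for as long as the file itself is not recreated.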

 - For the local directory backend, I've got the ability to choose between file
   formats on a per-volume basis. eg, /var/lib/xen/images can contain a mix of
   raw, qcow, vmdk, etc files. This is nice & flexible for the user, but a more
   complex thing to implement, since it means we have to probe each volume and
   try & figure out its format each time we list volumes. If we constrained the
   choice between formats to be at the pool level instead of the volume level
   we could avoid probing & thus simplify the code. This is what XenAPI does.
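
   To give a feel for what per-volume probing involves, here is a minimal
   sketch that classifies a file by the magic bytes at the start of its header.
   The enum and function names are invented for this example. It recognises
   only the qcow magic ("QFI" followed by 0xfb) and the "KDMV" magic of VMDK
   sparse extents, and treats everything else as raw. The probing code in the
   patches may well work differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical format codes for this sketch only. */
enum demo_vol_format {
    DEMO_VOL_FORMAT_RAW,    /* no recognised header: assume raw */
    DEMO_VOL_FORMAT_QCOW,
    DEMO_VOL_FORMAT_VMDK
};

/* Classify a volume from the first bytes of its header. */
static enum demo_vol_format demo_probe_header(const unsigned char *hdr,
                                              size_t len)
{
    static const unsigned char qcow_magic[4] = { 'Q', 'F', 'I', 0xfb };
    static const unsigned char vmdk_magic[4] = { 'K', 'D', 'M', 'V' };

    if (len >= 4 && memcmp(hdr, qcow_magic, 4) == 0)
        return DEMO_VOL_FORMAT_QCOW;
    if (len >= 4 && memcmp(hdr, vmdk_magic, 4) == 0)
        return DEMO_VOL_FORMAT_VMDK;

    return DEMO_VOL_FORMAT_RAW;
}

/*
 * Read the start of a file and classify it.
 * Returns 0 and sets *fmt on success, -1 if the file cannot be read.
 */
static int demo_probe_file(const char *path, enum demo_vol_format *fmt)
{
    unsigned char hdr[4];
    size_t got;
    FILE *fp;

    assert(path != NULL && fmt != NULL);

    fp = fopen(path, "rb");
    if (!fp)
        return -1;

    got = fread(hdr, 1, sizeof(hdr), fp);
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Short files simply fail the length check and count as raw. */
    *fmt = demo_probe_header(hdr, got);
    return 0;
}
```

   Listing a pool would have to call something like demo_probe_file() once per
   volume, which is the cost a pool-level format choice would avoid.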

 - If creating non-sparse files, it can take a very long time to do the I/O to
   fill in the image. In virt-install/virt-manager we have a nice incremental
   progress display. The API I've got currently is blocking, though. This blocks
   the calling application. It also blocks the entire libvirtd daemon since it
   is a single process. There are a couple of ways we can address this:

     - Allow the libvirtd daemon to serve each client connection in a separate
       thread. We'd need to add some mutex locking to the QEMU driver and
       the storage driver to handle this. It would have been nice to avoid
       threads, but I don't think we can much longer.

     - For long running operations, spawn off a worker thread (or process) to
       perform the operation. Don't send a reply to the RPC message, instead
       just put the client on a 'wait queue', and get on with serving other
       clients. When the worker thread completes, send the RPC reply to the 
       original client.

     - Have the virStorageVolCreate() method return immediately, giving back
       the client app some kind of 'job id'. The client app can poll another
       API method, virStorageVolJob(), to determine how far the task has
       got. The implementation in the driver would have to spawn a worker thread
       to do the actual long operation.

       Possibly we can allow creation to be async or blocking by making use of
       the 'flags' field to virStorageVolCreate() method, eg VIR_STORAGE_ASYNC.
       If we make async behaviour optional, we still need some work in the
       daemon to avoid blocking all clients.

   This problem will also impact us when we add cloning of existing volumes.
   It already sort of hits us when saving & restoring VMs if they have large
   amounts of memory. So perhaps we need to go for the general solution of
   making the daemon threaded per client connection. The ASYNC flag may still
   be useful anyway to get the incremental progress feedback in the UI.
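
   The 'job id' option above could look roughly like the following sketch,
   where a worker thread does the long operation and the caller polls for
   progress. Everything here (struct demo_job and the demo_job_* functions) is
   made up for illustration and is not the proposed libvirt API. It hands back
   a pointer rather than an integer id, and the "work" is just a counter
   standing in for writing chunks of an image.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job record; nothing here is libvirt API. */
struct demo_job {
    pthread_t thread;
    pthread_mutex_t lock;
    unsigned long long total;   /* units of work, fixed before start */
    unsigned long long done;    /* units completed so far */
    int finished;
};

static void *demo_job_worker(void *opaque)
{
    struct demo_job *job = opaque;
    unsigned long long i;

    for (i = 0; i < job->total; i++) {
        /* A real backend would write one chunk of the image here. */
        pthread_mutex_lock(&job->lock);
        job->done = i + 1;
        pthread_mutex_unlock(&job->lock);
    }

    pthread_mutex_lock(&job->lock);
    job->finished = 1;
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

/* Start the operation and return immediately with a job handle. */
static struct demo_job *demo_job_start(unsigned long long total)
{
    struct demo_job *job = calloc(1, sizeof(*job));

    if (!job)
        return NULL;
    job->total = total;

    if (pthread_mutex_init(&job->lock, NULL) != 0) {
        free(job);
        return NULL;
    }
    if (pthread_create(&job->thread, NULL, demo_job_worker, job) != 0) {
        pthread_mutex_destroy(&job->lock);
        free(job);
        return NULL;
    }
    return job;
}

/* Poll the job: returns percent complete (0..100), sets *finished. */
static int demo_job_progress(struct demo_job *job, int *finished)
{
    int percent;

    assert(job != NULL);

    pthread_mutex_lock(&job->lock);
    percent = job->total ? (int)(job->done * 100 / job->total) : 100;
    if (finished)
        *finished = job->finished;
    pthread_mutex_unlock(&job->lock);
    return percent;
}

/* Wait for the worker to exit, then release the job. */
static void demo_job_free(struct demo_job *job)
{
    pthread_join(job->thread, NULL);
    pthread_mutex_destroy(&job->lock);
    free(job);
}
```

   A client would call demo_job_start(), update its progress bar from
   demo_job_progress() until the finished flag is set, then call
   demo_job_free(). Build with -pthread. Note this only keeps the caller
   responsive; the daemon still needs its own answer for serving other
   clients while the worker runs.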

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|