[Libvir] PATCH: 0/7 Implementation of storage APIs

Wed Oct 31 22:12:00 UTC 2007

On Wed, Oct 31, 2007 at 02:16:26PM +0100, Jim Meyering wrote:
> "Daniel P. Berrange" <berrange at redhat.com> wrote:
> >  - For the local directory backend, I've got the ability to choose
> >    between file formats on a per-volume basis. eg, /var/lib/xen/images can
> >    contain a mix of raw, qcow, vmdk, etc files. This is nice & flexible
> >    for the user, but a more complex thing to implement, since it means
> >    we have to probe each volume and try & figure out its format each
> 
> Have there been requests for this feature?
> The probe-and-recognize part doesn't sound too hard, but if
> a large majority of use cases have homogeneous volumes-per-pool,
> then for starters at least, maybe we can avoid the complexity.

The 'qemu-img' tool has an 'info' subcommand that returns metadata for
all the different formats it knows about. I've realized we already need
to run this command to get the disk capacity of non-raw files, so we'd
be doing the probe & recognize bit anyway.
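
As a rough illustration of the probe step, here is a minimal sketch that
runs 'qemu-img info' and pulls out the format and capacity. The function
names are hypothetical, and the "file format:" / "virtual size: ... (N bytes)"
line layout is an assumption about the tool's human-readable output, which
may differ between versions.

```python
import re
import subprocess


def parse_qemu_img_info(text):
    """Extract the format name and virtual size from 'qemu-img info' text."""
    info = {"format": None, "capacity": None}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "file format":
            info["format"] = value
        elif key == "virtual size":
            # Assumed layout: "10G (10737418240 bytes)"
            match = re.search(r"\((\d+) bytes\)", value)
            if match:
                info["capacity"] = int(match.group(1))
    return info


def probe_volume(path):
    """Run 'qemu-img info' on one volume (hypothetical helper)."""
    result = subprocess.run(
        ["qemu-img", "info", path],
        check=True, capture_output=True, text=True,
    )
    return parse_qemu_img_info(result.stdout)


# Sample text in the assumed layout, so the parser can be tried
# without qemu-img installed.
sample = """image: guest.qcow
file format: qcow
virtual size: 10G (10737418240 bytes)
disk size: 1.2G
"""
print(parse_qemu_img_info(sample))
```

Since the same call yields both the format and the capacity, probing each
volume costs nothing beyond what the capacity lookup already requires.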

> A possible compromise (albeit ugly), _if_ we can dictate naming policy:
> let part of a volume name (suffix, substring, component, whatever)
> tell libvirtd its type.  As I said, ugly, and hence probably not
> worth considering, but I had to say it :-)

Yeah, I thought about this, but I'd like libvirt to be able to manage
a storage pool that was previously created by an admin, which means
we can't really assume that all the files are the same format, or that
they have a consistent file extension.

> >  - If creating non-sparse files, it can take a very long time to do the
> >    I/O to fill in the image. In virt-install/virt-manager we have nice
> >    incremental progress display. The API I've got currently though is
> >    blocking. This blocks the calling application. It also blocks the
> >    entire libvirtd daemon since we are single process. There are a couple
> >    of ways we can address this:
> >
> >      1 Allow the libvirtd daemon to serve each client connection in
> >        a separate thread. We'd need to add some mutex locking to the
> >        QEMU driver and the storage driver to handle this. It would have
> >        been nice to avoid threads, but I don't think we can much longer.
> >
> >      2 For long running operations, spawn off a worker thread (or
> >        process) to perform the operation. Don't send a reply to the RPC
> >        message, instead just put the client on a 'wait queue', and get
> >        on with serving other clients. When the worker thread completes,
> >        send the RPC reply to the original client.
> >
> >      3 Have the virStorageVolCreate() method return immediately,
> >        giving back the client app some kind of 'job id'. The client app
> >        can poll on another API, a virStorageVolJob() method, to determine
> >        how far the task has got. The implementation in the
> >        driver would have to spawn a worker thread to do the actual
> >        long operation.
> 
> I like the idea of spawning off a thread for a very precise
> and limited-scope task.
> 
> On first reading, I preferred your #2 worker-thread-based solution.
> Then, client apps simply wait -- i.e., don't have to poll.
> But we'd still need another interface for progress feedback, so #3
> starts to look better: client progress feedback might come almost
> for free, while polling to check for completion.
> 
> >        Possibly we can allow creation to be async or blocking by
> >        making use of the 'flags' field to virStorageVolCreate() method,
> >        eg VIR_STORAGE_ASYNC.  If we make async behaviour optional, we
> >        still need some work in the daemon to avoid blocking all clients.
> >
> >    This problem will also impact us when we add cloning of existing
> >    volumes.  It already sort of hits us when saving & restoring VMs
> >    if they have large amounts of memory. So perhaps we need to go
> >    for the general solution of making the daemon threaded per client
> >    connection. The ASYNC flag may still be useful anyway to get the
> >    incremental progress feedback in the UI.
> 
> Could we just treat that as another type of task to hand out to
> a worker thread?

Yes, that could work - we'd basically end up doing a combo of #2 and #3
in that case.  Always do the long jobs in a dedicated worker thread.
If the client uses the ASYNC flag, they can poll for completion/progress.
If they don't use the ASYNC flag, we can just stuff them on a wait queue
until the job worker thread finishes.
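
To make the combined #2/#3 behaviour concrete, here is a minimal sketch,
not the actual libvirt design. Every name in it (Job, vol_create,
VOL_CREATE_ASYNC) is hypothetical, and the sleep stands in for the real
I/O. The work always runs in a worker thread; the flag only decides
whether the caller gets the job back immediately to poll, or is parked
until the worker finishes.

```python
import threading
import time

VOL_CREATE_ASYNC = 1  # hypothetical stand-in for the proposed ASYNC flag


class Job:
    """Tracks one long-running operation; total_chunks must be > 0."""

    def __init__(self, total_chunks):
        self.total = total_chunks
        self.done_chunks = 0
        self.finished = threading.Event()
        self._lock = threading.Lock()

    def advance(self):
        with self._lock:
            self.done_chunks += 1

    def progress(self):
        """Fraction of the work completed, from 0.0 to 1.0."""
        with self._lock:
            return self.done_chunks / self.total


def _fill_volume(job):
    # Worker thread body: "write" the image one chunk at a time.
    for _ in range(job.total):
        time.sleep(0.001)  # stands in for writing one chunk to disk
        job.advance()
    job.finished.set()


def vol_create(total_chunks, flags=0):
    """Start the fill in a worker thread; block unless ASYNC is set."""
    job = Job(total_chunks)
    threading.Thread(target=_fill_volume, args=(job,), daemon=True).start()
    if flags & VOL_CREATE_ASYNC:
        return job           # caller polls job.progress()
    job.finished.wait()      # the "wait queue" case
    return job


# Async caller: poll for progress until the worker is done.
job = vol_create(20, VOL_CREATE_ASYNC)
while not job.finished.is_set():
    time.sleep(0.005)
print("async job progress:", job.progress())
```

In a real daemon the blocking case would park the client's pending RPC
reply rather than a thread, so other clients keep being served; the
sketch only shows the control flow.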

> Otherwise, this (#1) sounds a lot more invasive, but that's just my
> relatively uninformed impression.

Yes, option #1 is definitely much more invasive. At the very least
we'll need to put in significant mutex locking for the QEMU driver
global state.
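
For a sense of what that locking means, here is a small sketch under
stated assumptions: DriverState is a hypothetical stand-in for the
driver's global state, and each thread plays one client connection.
Every access to the shared table goes through a single mutex.

```python
import threading


class DriverState:
    """Hypothetical stand-in for a driver's global state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._domains = {}

    def define_domain(self, name, config):
        # Check-then-insert must happen under the lock to stay atomic.
        with self._lock:
            if name in self._domains:
                raise ValueError("domain already defined: " + name)
            self._domains[name] = config

    def count(self):
        with self._lock:
            return len(self._domains)


def serve_client(state, client_id, requests):
    # One thread per client connection, as in option #1.
    for i in range(requests):
        state.define_domain("client%d-vm%d" % (client_id, i), {})


state = DriverState()
threads = [
    threading.Thread(target=serve_client, args=(state, c, 100))
    for c in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state.count())
```

The invasive part is that every existing code path touching the shared
state has to be audited and wrapped this way, not just the new ones.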

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|