[libvirt] [PATCH] BlockJob: Support sync/async virDomainBlockJobAbort

Adam Litke agl at us.ibm.com
Fri Jan 20 14:47:39 UTC 2012


On Thu, Jan 19, 2012 at 05:39:43PM -0700, Eric Blake wrote:
> On 01/13/2012 01:51 PM, Adam Litke wrote:
> > Qemu has changed the semantics of the "block_job_cancel" API.  Originally, the
> > operation was synchronous (ie. upon command completion, the operation was
> > guaranteed to be completely stopped).  With the new semantics, a
> > "block_job_cancel" merely requests that the operation be cancelled and an event
> > is triggered once the cancellation request has been honored.
> 
> How can we tell the two implementations of qemu apart?  That is, how is
> libvirt supposed to know whether it is calling the old (blocking)
> variant, or the new async variant, in order to know whether it should
> wait for an event?  On the other hand, I think we can make things work:
> 
> call cancel, then call job status
> if the job is complete, then we are done, whether or not we have old or
> new command semantics (and if we have new command semantics, be prepared
> to ignore an event from qemu)

I don't really like ignoring domain events based on one specific API invocation
-- see comment farther below.

> if the job is not complete, then we have the new semantics, and we know
> to expect the event

Pretty clever, but there is a very small race window where the cancel has
completed and a new job has started immediately afterward.  In that case, we
could assume we have the new command semantics when we actually have an old qemu.

> 
> coupled with:
> 
> if the user passed 0 for the flag (they want blocking), then make the
> command wait for completion (either by job-status query or by getting
> the event), and no event is passed to the user
> 
> if the user passed ASYNC for the flag, then they want an event; if we
> detect the async variant of the command, then we hook up our qemu event
> handler to turn around and issue a user event.  Conversely, if job-query
> says the command is complete, then we generate our own event without
> waiting for a qemu event.
> 
> That means that we _don't_ blindly pass the qemu event on to the user;
> rather, on reception of a qemu event, we determine if we have an
> incomplete cancel in flight, and only then do we pass it on to the user;
> otherwise, we assume that the qemu event came after we already saw the
> job-info query saying things were complete, and therefore, we had
> already generated the user's event and can ignore the qemu event.
> 
> It would help things if the new qemu command were spelled with a
> different name than the old blocking version.
> 
> > 
> > To adopt the new semantics while preserving compatibility I propose the
> > following updates to the virDomainBlockJob API:
> > 
> > A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED will be recognized by
> > libvirt.  Regardless of the flags used with virDomainBlockJobAbort, this event
> > will be raised whenever it is received from qemu.  This event indicates that a
> > block job has been successfully cancelled.
> 
> I'm not sure it should be raised unconditionally on receipt of a qemu
> event, but only in the case of a virDomainBlockJobAbort(ASYNC) where we
> detect that cancel is in flight when we return control to the user.

What happens if an impatient user calls cancel several times on multiple
devices (sometimes using ASYNC and sometimes not)?  Clearly then, we should only
raise events for the devices for which the user passed the ASYNC flag.  We are
going to need a flag per disk that specifies whether block_job_cancel events are
user-deliverable.  This is all getting pretty complicated.  Can we take a step
back and look at how we can simplify this?

I just thought of another reason why I would prefer not to mask out events based
on the initial API call.  Imagine a libvirtd where multiple users/applications
are connected.  User A is a disk manager that monitors BlockJob operations (for
logging and VM placement decision making).  User B is an administrator.  If we
mask the cancel event depending on User B's invocation of the API, then User A
will not be guaranteed to see a consistent view of what is happening on the
system (ie. it may not receive some events it was expecting).

I am not sure if this is practical (or desirable), but here is an idea:

 - We always deliver any events that may arise
 - For the case where an event is requested but qemu is too old to issue one:
   * Create a new timer with a callback that calls query-block-jobs to check for
     active jobs.  If no active jobs are found, issue the event.  Otherwise,
     reset the timer.  (We would need to handle the case where a new job is
     created before the timer fires.  This isn't that difficult).

> > A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC will be added to the
> > virDomainBlockJobAbort API.  When enabled, this function will operate
> > asynchronously (ie, it can return before the job has actually been cancelled).
> > When the API is used in this mode, it is the responsibility of the caller to
> > wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the
> > virDomainGetBlockJobInfo API to check the cancellation status.
> 
> Agreed.
> 
> > 
> > Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll
> > using qemu's "query-block-jobs" API and will not return until the operation has
> > been completed.  API users are advised that this operation is unbounded and
> > further interaction with the domain during this period may block.
> 
> Agreed.
> 
> > 
> > This patch implements the new event type, the API flag, and the polling.  The
> > main outstanding issue is whether we should bound the amount of time we will
> > wait for cancellation and return an error.
> 
> I'm almost wondering if we should break this patch into a series of
> smaller patches, but I'll go ahead and review it intact.
> 
> > 
> > Comments on this proposal?
> > 
> > Signed-off-by: Adam Litke <agl at us.ibm.com>
> > Cc: Stefan Hajnoczi <stefanha at gmail.com>
> > Cc: Eric Blake <eblake at redhat.com>
> > ---
> >  include/libvirt/libvirt.h.in |   10 ++++++++++
> >  src/libvirt.c                |    9 ++++++++-
> >  src/qemu/qemu_driver.c       |   24 +++++++++++++++++-------
> >  src/qemu/qemu_monitor.c      |   24 ++++++++++++++++++++++++
> >  src/qemu/qemu_monitor.h      |    2 ++
> >  src/qemu/qemu_monitor_json.c |   36 +++++++++++++++++++++++++++++-------
> >  6 files changed, 90 insertions(+), 15 deletions(-)
> 
> Your patch adds trailing whitespace :(  'make syntax-check' would
> diagnose this, and other issues.
> 
> > 
> > diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
> > index e436f3c..fc7f028 100644
> > --- a/include/libvirt/libvirt.h.in
> > +++ b/include/libvirt/libvirt.h.in
> > @@ -1776,6 +1776,15 @@ typedef enum {
> >      VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1,
> >  } virDomainBlockJobType;
> >  
> > +/**
> > + * virDomainBlockJobAbortFlags:
> > + *
> > + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion
> > + */
> > +typedef enum {
> > +    VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1,
> 
> I'd write this as (1 << 0), to make it easier to add new flags in the
> future as (1 << 1) and so forth.
> 
> > +++ b/src/libvirt.c
> > @@ -17299,7 +17299,7 @@ error:
> >   * virDomainBlockJobAbort:
> >   * @dom: pointer to domain object
> >   * @disk: path to the block device, or device shorthand
> > - * @flags: extra flags; not used yet, so callers should always pass 0
> > + * @flags: bitwise or of virDomainBlockJobAbortFlags
> 
> Nit: for consistency, s/bitwise or/bitwise-OR/
> 
> >   *
> >   * Cancel the active block job on the given disk.
> >   *
> > @@ -17310,6 +17310,13 @@ error:
> >   * can be found by calling virDomainGetXMLDesc() and inspecting
> >   * elements within //domain/devices/disk.
> >   *
> > + * By default, this function performs a synchronous operation and the caller
> > + * may assume that the operation has completed when 0 is returned.  However,
> > + * BlockJob operations may take a long time to complete, and during this time
> > + * further domain interactions may be unresponsive.  To avoid this problem, pass
> > + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable asynchronous
> > + * behavior.  When the job has been cancelled, a BlockJob event will be emitted.
> 
> Interesting mix of US (behavior) and UK (cancelled) spelling in one
> line.  We tend to stick to US spelling (canceled) in the public docs,
> although this is not a hard and fast rule.
> 
> I'm not sure whether we should guarantee that the BlockJob event will
> only be emitted if ASYNC was passed in; or whether a BlockJob event is
> unconditionally emitted even if the user passed flags of 0 and therefore
> the cancellation completed.  I guess my thoughts at the head of this
> email would lean toward the former interpretation (require ASYNC before
> the event will reach the user).
> 
> > +++ b/src/qemu/qemu_driver.c
> > @@ -11379,7 +11379,7 @@ cleanup:
> >  static int
> >  qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
> >                         unsigned long bandwidth, virDomainBlockJobInfoPtr info,
> > -                       int mode)
> > +                       int flags, int mode)
> 
> Use 'unsigned int' for the flags ('make syntax-check' caught this), and
> I'd make flags the last argument.
> 
> >  {
> >      struct qemud_driver *driver = dom->conn->privateData;
> >      virDomainObjPtr vm = NULL;
> > @@ -11420,6 +11420,15 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
> >      qemuDomainObjEnterMonitorWithDriver(driver, vm);
> >      priv = vm->privateData;
> >      ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode);
> > +    /*
> > +     * Qemu provides asynchronous block job cancellation but without the 
> > +     * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a synchronous
> > +     * operation.  Provide this behavior by waiting here (with the monitor
> > +     * locked) so we don't get confused by newly scheduled block jobs.
> > +     */
> > +    if (ret == 0 && mode == BLOCK_JOB_ABORT &&
> > +                    !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC))
> 
> Indentation - I'd make the ! of the second row flush with 'ret' of the
> first row:
> 
>     if (ret ...
>         !(flags ...
> 
> Semantic wise, I don't think this quite works with the older semantics
> of block_job_cancel.  That is, I think that you have to have logic more
> like:
> 
> if (ret == 0 && mode == BLOCK_JOB_ABORT) {
>     query status
>     if done {
>         if ASYNC
>             generate libvirt event
>     } else {
>         if !ASYNC
>             wait for completion
>         else
>             set state to convert qemu event to libvirt event
>     }
> }
> 
> > @@ -11439,8 +11448,7 @@ cleanup:
> >  static int
> >  qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags)
> >  {
> > -    virCheckFlags(0, -1);
> 
> Don't drop this line.  Rather, alter it to be:
> 
> virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1);
> 
> to state that we are explicitly allowing only that one flag value
> through (previously, we were allowing no flags at all; dropping the line
> would allow all 32 bits through).
> 
> > +++ b/src/qemu/qemu_monitor.c
> > @@ -2585,6 +2585,30 @@ int qemuMonitorScreendump(qemuMonitorPtr mon,
> >      return ret;
> >  }
> >  
> > +/* Poll the monitor to wait for the block job on a given disk to end.
> > + * We don't need to worry about another block job starting since we have the
> > + * driver locked. */
> 
> Can qemu support multiple block jobs in parallel?  If so, are we going
> to regret our decision to serialize things in libvirt?

There can be one per block device.

> > +int
> > +qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device)
> > +{
> > +    VIR_DEBUG("mon=%p, device=%p", mon, device);
> > +    /* XXX: Should we provide some sort of escape hatch for this wait? */
> 
> I'm okay with waiting indefinitely, for lack of any better suggestions
> on how we would let the user pick a timeout value.
> 
> > +    while (1) {
> > +        /* Poll every 100ms */
> > +        int ret;
> > +        struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000ull };
> > +        virDomainBlockJobInfo info;
> 
> usleep() instead of nanosleep() might be an easier interface to use for
> the length of time you are sleeping.  libvirt.git only has one other
> instance of nanosleep() at the moment.  (That reminds me: someday, I'd
> really like an LGPLv2+ gnulib interface that lets you call
> sane_sleep(double), without having  all the shenanigans of nanosleep,
> and with more range than usleep - but that's a story for another day).
> 
> > +
> > +        ret = qemuMonitorBlockJob(mon, device, 0, &info, BLOCK_JOB_INFO);
> > +        if (ret < 0)
> > +            return -1;
> > +        else if (ret == 0)
> > +            return 0;
> > +        else
> > +            nanosleep(&ts, NULL); 
> 
> Ah, so we are polling solely on the job info command, and not waiting
> for the qemu event.  I guess we could make things slightly smarter by
> also paying attention to whether we expect a qemu event, in which case
> we avoid the polling, but that would make this loop a bit more complex,
> and I'm not sure if it is worth the tradeoff.

In earlier discussions, we decided to forget the event for this part because we
would then need to know whether qemu even supports it.  I suppose we would just
need to poll the first time, and that would tell us if events are supported.

> > +static void qemuMonitorJSONHandleBlockJobCancelled(qemuMonitorPtr mon,
> > +                                                   virJSONValuePtr data)
> > +{
> > +    qemuMonitorJSONHandleBlockJobImpl(mon, data, VIR_DOMAIN_BLOCK_JOB_CANCELLED);
> 
> This is where I'd make things conditional to only emit a cancelled event
> if the user is expecting one because they passed the ASYNC flag.
> 
> I think there are still a few design issues to address in a v2 patch,
> but I'd love to see this go in before 0.9.10.  I wish I had more time to
> actually contribute fixes; but here's my quickie fixes for just the
> syntax problems caught by 'make syntax-check', to prove that I at least
> compile-tested and minimally played with this patch.  If anyone else has
> comments on the approach or my suggestions for tweaking the design
> slightly - speak up now!

Thanks for your careful review!

> diff --git i/src/libvirt.c w/src/libvirt.c
> index 3821dd8..1507338 100644
> --- i/src/libvirt.c
> +++ w/src/libvirt.c
> @@ -17379,7 +17379,7 @@ error:
>   * further domain interactions may be unresponsive.  To avoid this
> problem, pass
>   * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable
> asynchronous
>   * behavior.  When the job has been cancelled, a BlockJob event will be
> emitted.
> - *
> + *
>   * Returns -1 in case of failure, 0 when successful.
>   */
>  int virDomainBlockJobAbort(virDomainPtr dom, const char *disk,
> diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c
> index cc20c4f..3d1766b 100644
> --- i/src/qemu/qemu_driver.c
> +++ w/src/qemu/qemu_driver.c
> @@ -11239,7 +11239,7 @@ cleanup:
>  static int
>  qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
>                         unsigned long bandwidth,
> virDomainBlockJobInfoPtr info,
> -                       int flags, int mode)
> +                       unsigned int flags, int mode)
>  {
>      struct qemud_driver *driver = dom->conn->privateData;
>      virDomainObjPtr vm = NULL;
> @@ -11281,7 +11281,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const
> char *path,
>      priv = vm->privateData;
>      ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode);
>      /*
> -     * Qemu provides asynchronous block job cancellation but without the
> +     * Qemu provides asynchronous block job cancellation but without the
>       * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a
> synchronous
>       * operation.  Provide this behavior by waiting here (with the monitor
>       * locked) so we don't get confused by newly scheduled block jobs.
> diff --git i/src/qemu/qemu_monitor.c w/src/qemu/qemu_monitor.c
> index 5c22486..9095eb6 100644
> --- i/src/qemu/qemu_monitor.c
> +++ w/src/qemu/qemu_monitor.c
> @@ -1,7 +1,7 @@
>  /*
>   * qemu_monitor.c: interaction with QEMU monitor console
>   *
> - * Copyright (C) 2006-2011 Red Hat, Inc.
> + * Copyright (C) 2006-2012 Red Hat, Inc.
>   * Copyright (C) 2006 Daniel P. Berrange
>   *
>   * This library is free software; you can redistribute it and/or
> @@ -2605,7 +2605,7 @@ qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon,
> const char *device)
>          else if (ret == 0)
>              return 0;
>          else
> -            nanosleep(&ts, NULL);
> +            nanosleep(&ts, NULL);
>      }
>  }
> 
> -- 
> Eric Blake   eblake at redhat.com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 

-- 
Adam Litke <agl at us.ibm.com>
IBM Linux Technology Center
