[libvirt] [PATCH] qemu: Allow migration to be cancelled at any phase

Michal Privoznik mprivozn at redhat.com
Thu Nov 8 11:26:51 UTC 2012


On 08.11.2012 11:49, Jiri Denemark wrote:
> On Thu, Nov 08, 2012 at 11:00:40 +0100, Michal Privoznik wrote:
>> On 07.11.2012 23:23, Jiri Denemark wrote:
>>>
>>> AbortJob API was never a synchronous one and I don't think we need to change
>>> it. Not to mention that this code unlocks and locks vm without holding an
>>> extra reference to it. And even if it did so, it cannot guarantee that the job
>>> it's checking in the loop is the same one it tried to cancel. It's quite
>>> unlikely but the original job could have finished and another one could have
>>> been started while this code was sleeping.
>>
>> However, AbortJob is documented as synchronous. If current implementation doesn't
>> follow docs it is a bug. On the other hand, I don't recall anybody screaming about
>> it so far. But that means nothing, right?
> 
> IIRC the implementation was never synchronous in which case I think we may
> just fix the documentation to match reality :-)
> 
>>>> @@ -1172,6 +1172,12 @@ qemuMigrationUpdateJobStatus(struct qemud_driver *driver,
>>>>      }
>>>>      priv->job.info.timeElapsed -= priv->job.start;
>>>>  
>>>> +    if (qemuDomainObjAbortAsyncJobRequested(vm)) {
>>>> +        VIR_DEBUG("Migration abort requested. Translating "
>>>> +                  "status to MIGRATION_STATUS_CANCELLED");
>>>> +        status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED;
>>>> +    }
>>>> +
>>>
>>> This check seems to be to late and that complicates the code later. If we keep
>>> qemuDomainAbortJob calling qemuMonitorMigrateCancel in qemuDomainAbortJob, we
>>> may count on that and check priv.job.asyncAbort before entering the monitor
>>> to tell QEMU to start migrating in qemuMigrationRun. If the abort is not
>>> requested by then, it may only happen after we call qemuMonitorMigrateTo* and
>>> in that case the migration will be properly aborted by
>>> qemuMonitorMigrateCancel.
>>>
>>
>> Not really. The issue I've seen was like this:
>>
>>    Thread A					Thread B
>> 1. start async migration out job
>> 2. 						Since job is set, abort it by calling 'migrate_cancel'
>> 3.						return to user
>> 4. issue 'migrate' on the monitor
>>
>> And I think your suggestion makes it just harder to hit this race and not really avoid it.
>> However with my code, we are guaranteed that 'migrate_cancel' will have an effect.
> 
> Oh right, we actually need to check priv.job.asyncAbort after entering the
> monitor (since we may be preempted while waiting for entering the monitor) but
> still before calling qemuMonitorMigrateTo* that actually starts the monitor.
> When we entered the monitor, the AbortJob must have already been there or it
> will come after we leave the monitor.
> 
> Jirka
> 

Yeah, now that I've traced all possibilities on a paper I see my picture
of situation wasn't quite correct as well. I'll post v2 shortly.

Michal




More information about the libvir-list mailing list