[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH] qemu: lock: unlock vm during qemuBuildCommandLine

On 2015/4/23 17:46, Michal Privoznik wrote:

> On 23.04.2015 11:32, Peter Krempa wrote:
>> On Thu, Apr 23, 2015 at 11:19:34 +0200, Michal Privoznik wrote:
>>> On 23.04.2015 10:30, Peter Krempa wrote:
>>>> On Thu, Apr 23, 2015 at 11:44:49 +0800, zhang bo wrote:
>>>>> The function qemuBuildCommandLine() may take a long time, for example
>>>>> if we configure tens of vifs for the guest, each may cost hundrands of
>>>>> milliseconds to create tap dev, senconds in total. Thus, unlock vm
>>>>> before calling it.
>>>>> Signed-off-by: Zhang Bo <oscar zhangbo huawei com>
>>>>> Signed-off-by: Zhou Yimin <zhouyimin huawei com>
>>>>> ---
>>>>>  src/qemu/qemu_process.c | 6 +++++-
>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>>>> index 753afe8..d1aaaec 100644
>>>>> --- a/src/qemu/qemu_process.c
>>>>> +++ b/src/qemu/qemu_process.c
>>>>> @@ -4628,14 +4628,18 @@ int qemuProcessStart(virConnectPtr conn,
>>>>>      }
>>>>>      VIR_DEBUG("Building emulator command line");
>>>>> +    virObjectUnlock(vm);
>>>>>      if (!(cmd = qemuBuildCommandLine(conn, driver, vm->def, priv->monConfig,
>>>>>                                       priv->monJSON, priv->qemuCaps,
>>>>>                                       migrateFrom, stdin_fd, snapshot, vmop,
>>>>>                                       &buildCommandLineCallbacks, false,
>>>>>                                       qemuCheckFips(),
>>>>>                                       priv->autoNodeset,
>>>>> -                                     &nnicindexes, &nicindexes)))
>>>>> +                                     &nnicindexes, &nicindexes))) {
>>>>> +        virObjectLock(vm);
>>>>>          goto cleanup;
>>>>> +    }
>>>>> +    virObjectLock(vm);
>>>> Why do you need to unlock the object? The VM is starting at this point
>>>> so you won't be able to save any time since APIs will either be blocked
>>>> by a job or by the fact that the VM was not started.
>>> Not true. We still allow QUERY jobs, or APIs that lock a domain object
>>> but don't necessarily take a job. For instance, if in one thread I'm
>>> doing virDomainGetOSType(), virDomainGetInfo(), ... (essentially 'virsh
>>> dominfo') while the other just starts the domain. I don't see a reason
>>> why virDomainGetOSType() should wait for the domain to be started up.
>>> Domain state should have no affect on the guest OS type, should it?
>> OK that is technically right. I wanted to point out that most of the
>> stats are available only for online machines or it shouldn't be much of
>> a problem to dealy the delivery.
>> Your example certainly doesn't illustrate why the delay to format the
>> command line should be a problem when using libvirt. Or are you polling
>> virDomainGetOSType every millisecond?
>> I am curious to see why the delay would be a problem.
> Yep, I'm too. So far we don't really care much about libvirt response
> time (otherwise our critical sections would be much shorter), but maybe
> it's an issue for somebody.

The specific semantic is:
  migrating multiple guests simultaneously,  downtime of each guest will add up,
even to an unacceptable value.

1 suppose the downtime is 2 seconds when we migrate only 1 guest at one time.
2 suppose it costs 0.5sec to create each tapDev, and each guest has 20 vifs,
  that's 10 seconds for qemuBuildCommandLine.
3 now, we migrate 10 guest simultaneously, the downtime of the guests will vary from
  seconds to even 100 seconds.

The reason for the problem is that:
1 guestA locks vm while creating each tapDev(virNetDevTapCreate) in qemuBuildCommandLine(), for 10seconds
2 guestB calls qemuMigrationPrepareAny->*virDomainObjListAdd* to get its vm object, which locks 'doms'
  and waits for the vm lock.
3 doms will be locked until guestA unlock its vm, we say that's 10 seconds.
4 guestC calls qemuDomainMigrateFinish3->virDomainObjListFindByName, which tries to lock doms. because it's
  now locked by guestB, guestC blocks here, and it can't be unpaused for at least 10 seconds.
5 then comes to guestD, guestE, guestF, etc, the downtime will be added up, to even 50 seconds or more.
6 the command 'virsh list' is blocked as well.

Thus, we think the problem must be solved.


>>> On the other hand, I don't think we can just lock and unlock the domain
>>> object as we please. qemuBuildCommandLine is a very complex function and
>>> as such it calls many others. Some of them may rely on the fact, that
>>> the object is locked by caller.
>> Well, you definitely can't since almost everything in there is accessing
>> vm->def. Locking semantics would be broken right in the line after @vm
>> was unlocked by dereferencing it.
> Well, anything that would change a domain definition should grap a
> MODIFY job. But such jobs are serialized, so even if we unlock the
> domain we should be okay to still access vm->def. What I am more worried
> about are all the small functions that interact with system or
> something. Currently they are serialized by @vm lock, but one we remove
> it ...
> Michal
> .

After the discussion above, maybe it's better to move virNetDevTapCreate() prior to qemuBuildCommandLine(), what do
you think about that? If that's ok, I'd like to apply patchV2 here.

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]