[libvirt] [PATCH 3/3] Avoid rare race when undefining domain

Martin Kletzander mkletzan at redhat.com
Fri Oct 31 13:59:50 UTC 2014


On Fri, Oct 31, 2014 at 02:28:49PM +0100, Michal Privoznik wrote:
>On 31.10.2014 07:35, Martin Kletzander wrote:
>> On Thu, Oct 30, 2014 at 05:44:23PM +0000, Daniel P. Berrange wrote:
>>> On Thu, Oct 30, 2014 at 06:39:56PM +0100, Michal Privoznik wrote:
>>>> On 30.10.2014 16:04, Martin Kletzander wrote:
>>>> >When one domain is being undefined and at the same time started, for
>>>> >example, there is a possibility of a rare problem occuring.
>>>> >
>>>> >  - Thread 1 does virDomainUndefine(), has the lock, checks that the
>>>> >    domain is active and because it's not, calls
>>>> >    virDomainObjListRemove().
>>>> >
>>>> >  - Thread 2 does virDomainCreate() and tries to lock the domain.
>>>> >
>>>> >  - Thread 1 needs to lock domain list in order to remove the domain
>>>> from
>>>> >    it, but must unlock domain first (proper order is to lock domain
>>>> list
>>>> >    first and the domain itself second).
>>>> >
>>>> >  - Thread 2 grabs the lock, starts the domain and releases the lock.
>>>> >
>>>> >  - Thread 1 grabs the lock and removes the domain from list.
>>>> >
>>>> >With this patch:
>>>> >
>>>> >  - The undefining domain gets marked as "to undefine" before it is
>>>> >    unlocked.
>>>> >
>>>> >  - If domain is found in any of the search APIs, it's returned only if
>>>> >    it is not marked as "to undefine".  The check is done while the
>>>> >    domain is locked.
>>>> >
>>>> >Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1150505
>>>> >
>>>> >Signed-off-by: Martin Kletzander <mkletzan at redhat.com>
>>>> >---
>>>> >  src/conf/domain_conf.c | 23 ++++++++++++++++++++---
>>>> >  src/conf/domain_conf.h |  1 +
>>>> >  2 files changed, 21 insertions(+), 3 deletions(-)
>>>> >
>>>> >diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>>>> >index 43574e1..b92a58a 100644
>>>> >--- a/src/conf/domain_conf.c
>>>> >+++ b/src/conf/domain_conf.c
>>>> >@@ -1054,8 +1054,13 @@ virDomainObjPtr
>>>> virDomainObjListFindByID(virDomainObjListPtr doms,
>>>> >      virDomainObjPtr obj;
>>>> >      virObjectLock(doms);
>>>> >      obj = virHashSearch(doms->objs, virDomainObjListSearchID, &id);
>>>> >-    if (obj)
>>>> >+    if (obj) {
>>>> >          virObjectLock(obj);
>>>> >+        if (obj->undefining) {
>>>> >+            virObjectUnlock(obj);
>>>> >+            obj = NULL;
>>>> >+        }
>>>> >+    }
>>>> >      virObjectUnlock(doms);
>>>> >      return obj;
>>>> >  }
>>>>
>>>> I find this too hackish, Wouldn't it be better to hold the domain list
>>>> locked during the whole undefine procedure? On one hand the
>>>> throughput would
>>>> get hurt but on the other, the code would be cleaner. Or maybe we can
>>>> just
>>>> make the Undefine() API to grab the QEMU_JOB_MODIFY job which would
>>>> make the
>>>> APIs serialize.
>>>
>>> Yep, it feels like "Undefine" could be treated as a job, so that all
>>> the other APIs get blocked/serialized by it.
>>>
>>
>> That won't work for anything else than QEMU.  I tried creating
>> something like a job in the generic level which would fix it in
>> e.g. LXC as well.
>
>I've always felt that we should have a job control in other drivers too.
>In the meantime, either we should go with domain list locked through
>some APIs, or just solve this in QEMU driver and postpone the fix in
>other drivers until such time we introduce job control to them.
>

If that's your preference, I'll do that with _MODIFY then.

See you in v2,
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20141031/fa8f33a5/attachment-0001.sig>


More information about the libvir-list mailing list