[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH 3/5] Push nwfilter update locking up to top level



On 01/23/2014 07:37 AM, Daniel P. Berrange wrote:
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.

In the virNWFilter{Define,Undefine} codepaths the lock ordering
is

   1. nwfilter driver lock
   2. virt driver lock
   3. nwfilter update lock
   4. domain object lock

In the VM guest startup paths the lock ordering is

   1. virt driver lock
   2. domain object lock
   3. nwfilter update lock

As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.

The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of

   1. nwfilter driver lock
   2. nwfilter update lock
   3. virt driver lock
   4. domain object lock

and VM start using

   1. nwfilter update lock
   2. virt driver lock
   3. domain object lock

Makes sense... there are other paths as well:

- SIGHUP or restart of libvirt that recreates all filters
- late instantiation of filters following detection of VM's IP address


This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest.

Signed-off-by: Daniel P. Berrange <berrange redhat com>
---
  src/conf/nwfilter_conf.c               | 6 ------
  src/lxc/lxc_driver.c                   | 6 ++++++
  src/nwfilter/nwfilter_driver.c         | 8 ++++----
  src/nwfilter/nwfilter_gentech_driver.c | 6 ------
  src/qemu/qemu_driver.c                 | 6 ++++++
  src/uml/uml_driver.c                   | 4 ++++
  6 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/src/conf/nwfilter_conf.c b/src/conf/nwfilter_conf.c
index 6db8ea9..e712ca5 100644
--- a/src/conf/nwfilter_conf.c
+++ b/src/conf/nwfilter_conf.c
@@ -2990,14 +2990,12 @@ virNWFilterObjAssignDef(virNWFilterObjListPtr nwfilters,
          return NULL;
      }

-    virNWFilterLockFilterUpdates();

      if ((nwfilter = virNWFilterObjFindByName(nwfilters, def->name))) {

          if (virNWFilterDefEqual(def, nwfilter->def, false)) {
              virNWFilterDefFree(nwfilter->def);
              nwfilter->def = def;
-            virNWFilterUnlockFilterUpdates();
              return nwfilter;
          }


I think you should add a comment to this function that it must be called with this lock held. With the removal of the locking from this function here you have to do the locking in the callers. Callers of virNWFilterObjAssignDef are:

src/conf/nwfilter_conf.c::virNWFilterObjLoad : I don't see this function grabbing the lock. I think this is missing.
src/conf/nwfilter_driver.c::nwfilterDefineXML : ok, found it below




diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index e319234..b1f8a89 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -1015,6 +1015,8 @@ static int lxcDomainCreateWithFiles(virDomainPtr dom,

      virCheckFlags(VIR_DOMAIN_START_AUTODESTROY, -1);

+    virNWFilterLockFilterUpdates();
+
      if (!(vm = lxcDomObjFromDomain(dom)))
          goto cleanup;

@@ -1053,6 +1055,7 @@ cleanup:
      if (event)
          virObjectEventStateQueue(driver->domainEventState, event);
      virObjectUnref(cfg);
+    virNWFilterUnlockFilterUpdates();
      return ret;
  }

@@ -1109,6 +1112,8 @@ lxcDomainCreateXMLWithFiles(virConnectPtr conn,

      virCheckFlags(VIR_DOMAIN_START_AUTODESTROY, NULL);

+    virNWFilterLockFilterUpdates();
+
      if (!(caps = virLXCDriverGetCapabilities(driver, false)))
          goto cleanup;

@@ -1164,6 +1169,7 @@ cleanup:
          virObjectEventStateQueue(driver->domainEventState, event);
      virObjectUnref(caps);
      virObjectUnref(cfg);
+    virNWFilterUnlockFilterUpdates();
      return dom;
  }

Probably these calls are correctly placed since all other creation functions funnel into them.Same for UML and QEMU drivers.


diff --git a/src/nwfilter/nwfilter_gentech_driver.c b/src/nwfilter/nwfilter_gentech_driver.c
index 89913cf8..f0e78ed 100644
--- a/src/nwfilter/nwfilter_gentech_driver.c
+++ b/src/nwfilter/nwfilter_gentech_driver.c
@@ -935,7 +935,6 @@ _virNWFilterInstantiateFilter(virNWFilterDriverStatePtr driver,
      int ifindex;
      int rc;

-    virNWFilterLockFilterUpdates();

      /* after grabbing the filter update lock check for the interface; if
         it's not there anymore its filters will be or are being removed
@@ -964,7 +963,6 @@ _virNWFilterInstantiateFilter(virNWFilterDriverStatePtr driver,
                                          foundNewFilter);

  cleanup:
-    virNWFilterUnlockFilterUpdates();

      return rc;
  }

This function is called by virNWFilterInstantiateFilter and virNWFilterUpdateInstantiateFilter. So, in the former case this is called when a VM is started, in the latter case if VMs' filters are updated while the VM is running. I think you are covering the VM creation case with the above calls for lxc and further below with the changes to the qemu driver and the uml driver.

There is at least one other part that needs to be covered:

src/conf/nwfilter_conf.c::virNWFilterInstFiltersOnAllVMs :: kicked off by nwfilterStateReload upon SIGHUP. We need a lock there now.

src/conf/nwfilter_conf.c::virNWFilterTriggerVMFilterRebuild::
    - called by virNWFilterTestUnassignDef:
- called by src/nwfilter/nwfilter/nwfilterUndefine:: You seem to just reorder some locks there; now we need a (writer) lock there - called by virNWFilterObjAssignDef: which must be called with lock called following above reasoning




@@ -984,8 +982,6 @@ virNWFilterInstantiateFilterLate(virNWFilterDriverStatePtr driver,
      int rc;
      bool foundNewFilter = false;

-    virNWFilterLockFilterUpdates();
-
      rc = __virNWFilterInstantiateFilter(driver,
                                          vmuuid,
                                          true,
@@ -1009,8 +1005,6 @@ virNWFilterInstantiateFilterLate(virNWFilterDriverStatePtr driver,
          }
      }

-    virNWFilterUnlockFilterUpdates();
-
      return rc;
  }

This function here is called by
src/nwfilter/nwfilter_learnip.c::learnIPAddressThread
src/nwfilter/nwfilter_dhcpsnoop.c::virNWFilterSnoopIPLeaseInstallRule

They instantiate the filters once a VM's IP address has been detected. So this is where the *Late() comes from.

If you remove the locking from here, you have to lock it there. Considering what you do layer, I would keep the lock here and convert this into a reader lock layer on.


The farther away we do this from the calls where the actual action is happening, the more tricky this becomes. I would almost recommend to plant an assert into those far way functions that tests whether the lock has been grabbed.

Regards,
Stefan


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]