[libvirt] libvirtd deadlock on shutdown

Hu Tao hutao at cn.fujitsu.com
Fri Jun 22 03:26:03 UTC 2012


On Thu, Jun 21, 2012 at 09:56:54AM -0600, Jim Fehlig wrote:
> Daniel P. Berrange wrote:
> > On Wed, Jun 20, 2012 at 03:07:16PM -0600, Jim Fehlig wrote:
> >   
> >> I'm looking into a libvirtd deadlock on daemon shutdown.  The deadlock
> >> occurs when shutting down virNetServer.  If the handling of a job is in
> >> flight in virNetServerHandleJob(), the virNetServer lock is acquired
> >> when freeing job->prog (src/rpc/virnetserver.c:167).  But the lock is
> >> already held in virNetServerFree(), which is blocked in
> >> virThreadPoolFree() waiting for all the workers to finish.  No progress
> >> can be made.
> >>
> >> The attached hack fixes the problem, but I'm not convinced this is an
> >> appropriate fix.  Is it necessary to hold the virNetServer lock when
> >> calling virNetServerProgramFree(job->prog)?  I notice the lock is not
> >> held in the error path of virNetServerHandleJob().
> >>
> >> Thanks,
> >> Jim
> >>
> >>     
> >
> >   
> >> diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c
> >> index ae19e84..edd3196 100644
> >> --- a/src/rpc/virnetserver.c
> >> +++ b/src/rpc/virnetserver.c
> >> @@ -774,7 +774,9 @@ void virNetServerFree(virNetServerPtr srv)
> >>      for (i = 0 ; i < srv->nservices ; i++)
> >>          virNetServerServiceToggle(srv->services[i], false);
> >>  
> >> +    virNetServerUnlock(srv);
> >>      virThreadPoolFree(srv->workers);
> >> +    virNetServerLock(srv);
> >>  
> >>      for (i = 0 ; i < srv->nsignals ; i++) {
> >>          sigaction(srv->signals[i]->signum, &srv->signals[i]->oldaction, NULL);
> >>     
> >
> > The virNetServerPtr object is ref-counted, so technical;ly
> > only decrementing the ref count needs to be protected. Once
> > the ref count hits 0, then only the current thread (and the
> > workers) should be using the virNetServerPtr instance.
> >   
> 
> Ah, right.  Thanks for clarifying.
> 
> > So, yes, it is safe to call virNetServerUnlock before
> > virThreadPoolFree. Furthermore, it is /not/ required
> > to call virNetServerLock afterwards - no other thread
> > should be using it now the workers are dead. So you
> > can skip that extra lock call, and also remove the
> > unlock call much later
> >   
> 
> Something like the attached patch?
> 
> Regards,
> Jim
> 

> >From 583be33213e922899b23f036494886397b2549dc Mon Sep 17 00:00:00 2001
> From: Jim Fehlig <jfehlig at suse.com>
> Date: Thu, 21 Jun 2012 09:21:44 -0600
> Subject: [PATCH] Fix deadlock on libvirtd shutdown
> 
> When shutting down libvirtd, the virNetServer shutdown can deadlock
> if there are in-flight jobs being handled by virNetServerHandleJob().
> virNetServerFree() will acquire the virNetServer lock and call
> virThreadPoolFree() to terminate the workers, waiting for the workers
> to finish.  But in-flight workers will attempt to acquire the
> virNetServer lock, resulting in deadlock.
> 
> Fix the deadlock by unlocking the virNetServer lock before calling
> virThreadPoolFree().  This is safe since the virNetServerPtr object
> is ref-counted and only decrementing the ref count needs to be
> protected.  Additionally, there is no need to re-acquire the lock
> after virThreadPoolFree() completes as all the workers have
> terminated.
> ---
>  src/rpc/virnetserver.c |    6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c
> index ae19e84..9d71e53 100644
> --- a/src/rpc/virnetserver.c
> +++ b/src/rpc/virnetserver.c
> @@ -766,10 +766,9 @@ void virNetServerFree(virNetServerPtr srv)
>      virNetServerLock(srv);
>      VIR_DEBUG("srv=%p refs=%d", srv, srv->refs);
>      srv->refs--;
> -    if (srv->refs > 0) {
> -        virNetServerUnlock(srv);
> +    virNetServerUnlock(srv);

At this point other threads may have changed srv->refs...

> +    if (srv->refs > 0)

...so it's unsafe to test srv->refs here without locking.

For example, assume srv->refs is 2 at the beginning of virNetServerFree,

  thread A              thread B

  lock srv              lock srv (blocked)

  dec srv->refs

  (srv->refs is 1)

  unlock srv

                        lock srv

			dec srv->refs

			(srv->refs is 0)

			unlock src

  test srv->refs        test srv->refs


In this case both threads have found srv->refs is 0 and are going to
free srv...

Following patch fixes problem.

>From 25aa97e05aa76054b781a4e5e83781ee16d5afee Mon Sep 17 00:00:00 2001
From: Hu Tao <hutao at cn.fujitsu.com>
Date: Fri, 22 Jun 2012 11:15:01 +0800
Subject: [PATCH] fix a bug of ref count in virnetserver.c

The test of ref count is not protected by lock, which is unsafe because
the ref count may have been changed by other threads during the test.

This patch fixes this.
---
 src/rpc/virnetserver.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c
index 9d71e53..247ddd7 100644
--- a/src/rpc/virnetserver.c
+++ b/src/rpc/virnetserver.c
@@ -759,15 +759,16 @@ void virNetServerQuit(virNetServerPtr srv)
 void virNetServerFree(virNetServerPtr srv)
 {
     int i;
+    int refs;
 
     if (!srv)
         return;
 
     virNetServerLock(srv);
     VIR_DEBUG("srv=%p refs=%d", srv, srv->refs);
-    srv->refs--;
+    refs = --srv->refs;
     virNetServerUnlock(srv);
-    if (srv->refs > 0)
+    if (refs > 0)
         return;
 
     for (i = 0 ; i < srv->nservices ; i++)
-- 
1.7.4.4


-- 
Thanks,
Hu Tao




More information about the libvir-list mailing list