[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

[lvm-devel] [PATCH] (DRAFT!) Fix a deadlock in CLVMD (possibly related to BZ 561226).



Hi,

the signalling code (pthread_cond_signal/pthread_cond_wait) in the
pre_and_post_thread in clvmd is using the wait mutex (see man
pthread_cond_wait) incorrectly, and this can cause clvmd to completely
deadlock when the timing is right.

This was showing up on nevrast in buildbot. Possibly triggered by an
independent bug in my vgextend --restoremissing code that caused a
double unlock of orphans, which would *normally* be fairly harmless, but
it tripped this race. Was not happening on other machines...

Anyway, Milan says that this could be related to BZ 561226 where clvmd
runs into deadlocks as well. It may or may not be the same, but it is
quite likely that someone somewhere would run into the bug that this
patch is fixing as well.

NB., I only made very modest effort at checking if localsock.mutex was
not being used for anything else. Whoever ends up reviewing the code,
please make sure this is so! If the localsock.mutex is being used for
anything else, the patch could actually introduce some new bad
behaviours. In that case, just creating a new mutex in the same
structure just for the pre_and_post_thread would be necessary. But you
get the idea of the fix, anyway. (It is also recommended that the
reviewer spends a while reading pthread_cond_wait manpage, especially
the section about the mutex pointer, and also with a paper and pencil
sketching on how things could go wrong if the mutex is used incorrectly,
as in this case.)

Yours,
   Petr.

Index: daemons/clvmd/clvmd.c
===================================================================
RCS file: /cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v
retrieving revision 1.77
diff -u -p -r1.77 clvmd.c
--- daemons/clvmd/clvmd.c	17 Aug 2010 16:25:32 -0000	1.77
+++ daemons/clvmd/clvmd.c	18 Oct 2010 17:10:02 -0000
@@ -1520,6 +1520,7 @@ static __attribute__ ((noreturn)) void *
 	int pipe_fd = client->bits.localsock.pipe;
 
 	DEBUGLOG("in sub thread: client = %p\n", client);
+	pthread_mutex_lock(&client->bits.localsock.mutex);
 
 	/* Don't start until the LVM thread is ready */
 	pthread_mutex_lock(&lvm_start_mutex);
@@ -1564,7 +1565,6 @@ static __attribute__ ((noreturn)) void *
 		}
 
 		/* We may need to wait for the condition variable before running the post command */
-		pthread_mutex_lock(&client->bits.localsock.mutex);
 		DEBUGLOG("Waiting to do post command - state = %d\n",
 			 client->bits.localsock.state);
 
@@ -1573,7 +1573,6 @@ static __attribute__ ((noreturn)) void *
 			pthread_cond_wait(&client->bits.localsock.cond,
 					  &client->bits.localsock.mutex);
 		}
-		pthread_mutex_unlock(&client->bits.localsock.mutex);
 
 		DEBUGLOG("Got post command condition...\n");
 
@@ -1594,16 +1593,15 @@ static __attribute__ ((noreturn)) void *
 next_pre:
 		DEBUGLOG("Waiting for next pre command\n");
 
-		pthread_mutex_lock(&client->bits.localsock.mutex);
 		if (client->bits.localsock.state != PRE_COMMAND &&
 		    !client->bits.localsock.finished) {
 			pthread_cond_wait(&client->bits.localsock.cond,
 					  &client->bits.localsock.mutex);
 		}
-		pthread_mutex_unlock(&client->bits.localsock.mutex);
 
 		DEBUGLOG("Got pre command condition...\n");
 	}
+	pthread_mutex_unlock(&client->bits.localsock.mutex);
 	DEBUGLOG("Subthread finished\n");
 	pthread_exit((void *) 0);
 }

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]