[lvm-devel] [PATCH] (DRAFT!) Fix a deadlock in CLVMD (possibly related to BZ 561226).
Milan Broz
mbroz at redhat.com
Wed Oct 20 10:40:48 UTC 2010
On 10/18/2010 07:17 PM, Petr Rockai wrote:
> the signalling code (pthread_cond_signal/pthread_cond_wait) in the
> pre_and_post_thread in clvmd is using the wait mutex (see man
> pthread_cond_wait) incorrectly, and this can cause clvmd to completely
> deadlock when the timing is right.
well, nobody even tried to answer, despite this quite important problem...
So, I see these important items:
1) always rely on implicit unlock in pthread_cond_wait()
2) some of the code now runs with client->bits.localsock.mutex locked,
mainly the do_pre_command() and do_post_command() + socket writes
3) the common code to signal is _only_ this pattern
pthread_mutex_lock(&client->bits.localsock.mutex);
client->bits.localsock.state = <STATE>;
pthread_cond_signal(&client->bits.localsock.cond);
pthread_mutex_unlock(&client->bits.localsock.mutex);
this is called from request_timed_out() and read_from_local_sock()
and it wakes up thrad always waiting in this code pattern
pthread_cond_wait(&client->bits.localsock.cond,
&client->bits.localsock.mutex);
I didn't found any other uses for this lock.
So the thread is signaled only when in cond_wait now (there is no
other place where signaling thread can take lock now).
This seems as expected fix.
The post/pre routines should not have problems with running with
localsock.mutex taken, same for socket writes - hope I am right:-)
Did I miss or misinterpret anything?
Not extra deep analysis, but for me it seems like correct fix,
I think we should commit it and run some tests...
Milan
p.s.
nitpicking
DEBUGLOG("in sub thread: client = %p\n", client);
+ pthread_mutex_lock(&client->bits.localsock.mutex);
/* Don't start until the LVM thread is ready */
pthread_mutex_lock(&lvm_start_mutex);
pthread_mutex_unlock(&lvm_start_mutex);
DEBUGLOG("Sub thread ready for work.\n");
Can we move that localsock.muext lock after the lvm_start_mutex exercise?
(Which I do not like either - isn't cond wait better for that?)
More information about the lvm-devel
mailing list