[Linux-cluster] DLM locks with 1 node on 2 node cluster
Zelikov_Mikhail at emc.com
Mon Aug 28 15:41:55 UTC 2006
I am using the latest cluster from the RHEL4 branch. I have a 2-node cluster:
nodes A and B. Node A grabs a lock in exclusive mode while node B waits for a
membership change. I manually reset node A, at which point node B gets the
membership change notification and then tries to acquire the lock in
exclusive mode. At this point the operation blocks forever. Once node A
is back up, the DLM returns with the lock acquired for node B, as expected.
However, if I shut down node A cleanly instead of killing it, everything works
as expected: node B gets the notification and then successfully grabs the lock
without locking up.
It can be easily reproduced with dlmtest: grab the lock on one machine in EX
mode (bof226), block on another machine for the same lock (bof227), kill the
first machine, and see that the lock is never acquired on the second:
1) *** GRAB LOCK MY_RES (bof226)
[root at bof226 usertest]# ./dlmtest -Q -m EX MY_RES -d 10000
locking MY_RES EX ...done (lkid = 1015e)
lockinfo: status = 0
lockinfo: resource = 'MY_RES'
lockinfo: grantcount = 1
lockinfo: convcount = 0
lockinfo: waitcount = 0
lockinfo: masternode = 1
lockinfo: lock: lkid = 1015e
lockinfo: lock: master lkid = 0
lockinfo: lock: parent lkid = 0
lockinfo: lock: node = 1
lockinfo: lock: pid = 3771
lockinfo: lock: state = 2
lockinfo: lock: grmode = 5
lockinfo: lock: rqmode = 255
2) *** GRAB LOCK MY_RES (bof227)
[root at bof227 usertest]# ./dlmtest -Q -m EX -d 10000 MY_RES
locking MY_RES EX ...
3) *** KILL bof226
4) *** WAITING FOREVER
5) *** BOOTING UP bof226 results in lock acquired
lockinfo: status = 0
lockinfo: resource = 'MY_RES'
lockinfo: grantcount = 1
lockinfo: convcount = 0
lockinfo: waitcount = 0
lockinfo: masternode = 2
lockinfo: lock: lkid = 10312
lockinfo: lock: master lkid = 103eb
lockinfo: lock: parent lkid = 0
lockinfo: lock: node = 2
lockinfo: lock: pid = 4136
lockinfo: lock: state = 2
lockinfo: lock: grmode = 5
lockinfo: lock: rqmode = 255
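The "waiting forever" in step 4 can be mimicked locally with plain Python threads. This is only an analogy, not DLM code: a threading.Lock stands in for the DLM resource, one thread plays node A (the holder), and the main thread plays node B, using a bounded wait so the hang becomes visible instead of blocking indefinitely:

```python
import threading
import time

res = threading.Lock()  # stand-in for the DLM resource MY_RES

def holder():
    # "node A": grabs the lock in EX mode and sits on it
    with res:
        time.sleep(3.0)

t = threading.Thread(target=holder)
t.start()
time.sleep(0.5)  # let the holder win the race for the lock

# "node B": tries the same lock, but with a bounded wait so a hang
# like the one above shows up as a timeout rather than a stuck process
acquired = res.acquire(timeout=1.0)
print("acquired" if acquired else "timed out -- still waiting")
if acquired:
    res.release()
t.join()
```

Running this prints the timeout branch, which is the local analogue of dlmtest sitting in "locking MY_RES EX ..." forever after the holder node is killed.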
Has anybody else seen this? I was wondering whether this is a bug, whether
there is something special about 2-node clusters, or whether I misunderstand
how it is supposed to work.
Mike