[Linux-cluster] Strange behavior(s) of DLM

Jeff jeff at intersystems.com
Mon Aug 9 15:53:51 UTC 2004


Friday, August 6, 2004, 8:54:29 AM, David Teigland wrote:

> On Wed, Aug 04, 2004 at 11:41:45PM -0400, Jeff wrote:
>> The attached routine demonstrates some strange
>> behavior in the DLM and it was responsible for the
>> dmesg text at the end of this note.
>> 
>> This is on an FC2 SMP box running the cvs/latest version of
>> cman and the dlm. It's a 2-CPU box configured with 4 logical
>> CPUs.
>> 
>> I have a two node cluster and the two machines are identical
>> as far as I can tell with the exception of which order they are
>> listed in the cluster config file.
>> 
>> On node #1 (in the config file) when I run the attached test from
>> two terminals the output looks reasonable, more or less the same
>> as when I run it on Tru64 or VMS.
>> 
>>       8923: over last 10.000 seconds, grant 8922, blkast 0, cancel 0
>>      18730: over last 9.001 seconds, grant 9807, blkast 0, cancel 0
>>      28403: over last 9.001 seconds, grant 9673, blkast 0, cancel 0
>> 
>> If you shut this down and start it up on node #2 (lx4) you start
>> to get messages that look like:
>>      91280: over last 10.000 seconds, grant 91279, blkast 0, cancel 0
>>     125138: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
>>     125138: NL Blocking Notification on lockid 0x00010312 (mode 0)
>>     125138: NL Blocking Notification Rountine End  ^^^^^^^^^^^^^^^^^^^^
>>     141370: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
>>     141371: NL Blocking Notification on lockid 0x00010312 (mode 0)
>>     141371: NL Blocking Notification Rountine End  ^^^^^^^^^^^^^^^^^^^^
>>     141373: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^


> You're running the program on two nodes at once, right?  The line with "*"
> is when I started the program on a second node, so it appears I get the
> same thing.  I don't get any assertion failure, though.  That may be the
> result of changes I've checked in for some other bugs over the past couple
> of days.

>      57150: over last 10.000 seconds, grant 57149, blkast 0, cancel 0
>     116825: over last 9.001 seconds, grant 59675, blkast 0, cancel 0
> *   123790: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
>     123790: NL Blocking Notification on lockid 0x00010373 (mode 0)
>     123790: NL Blocking Notification Rountine End  ^^^^^^^^^^^^^^^^^^^^
>     123822: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
>     123822: NL Blocking Notification on lockid 0x00010373 (mode 0)
>     123822: NL Blocking Notification Rountine End  ^^^^^^^^^^^^^^^^^^^^


I updated my sources this morning and now get neither the "NL Blocking
Routine Start" messages nor the assertion failures. Previously I could
reproduce both quite easily, so I suspect your recent changes have
resolved them.





