[Linux-cluster] occasional cluster crashes

Lon Hohberger lhh at redhat.com
Wed Nov 15 16:18:41 UTC 2006


On Tue, 2006-11-14 at 16:44 +0100, Fabrizio Lippolis wrote:

> The cluster is running MySQL. One of the machines runs the MySQL process 
> at a time, while the database files are on the disk array. I checked that 
> if I kill the process, it will migrate to the second machine. From time 
> to time I experience lockups of one of the two machines. It 
> doesn't happen very often, and apparently without reason. The only 
> solution in this case is to brutally switch off the machine and reboot. 
> The problem became much more frequent when I tried to add another 
> service to the cluster, an LDAP directory. The crashes sometimes happened 
> more than once a day.

:o

The only problems I'm aware of related to cluster service counts are
performance-related (rgmanager used to slow down a lot with more
services), and only on pre-U4 versions.


> I already wrote about this problem some time ago and somebody answered 
> that it could be caused by the connection of the nodes to the 
> disk array: when one node is accessing the disk array, the SCSI bus will 
> prevent the other node from doing anything. Can anybody confirm this? 

That's very array-dependent, and I don't know much about how arrays work.
Even so, I do not think it should cause a lockup, unless there's some
kernel bug that it exposes.

Do they crash (panic), or do they just become totally unresponsive?

Have you tried getting a stack trace from the console using sysrq?
(echo 1 > /proc/sys/kernel/sysrq, then hit alt-sysrq-t from the console.)
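
For reference, the sysrq steps can be wrapped in a couple of shell
functions. This is only a rough sketch: the function names and the two
overridable path variables are made up for illustration, everything has
to run as root, and writing to /proc/sysrq-trigger only helps while the
node still accepts commands. On a box that is already hard-locked, the
console keystroke is the only option.

```shell
#!/bin/sh
# Sketch only: helper names and the *_PATH variables are hypothetical.
# The defaults are the standard procfs locations for magic SysRq.
SYSRQ_ENABLE_PATH=${SYSRQ_ENABLE_PATH:-/proc/sys/kernel/sysrq}
SYSRQ_TRIGGER_PATH=${SYSRQ_TRIGGER_PATH:-/proc/sysrq-trigger}

enable_sysrq() {
    # Writing 1 enables all magic SysRq functions until the next reboot.
    echo 1 > "$SYSRQ_ENABLE_PATH"
}

dump_tasks() {
    # 't' requests the same task/stack dump as alt-sysrq-t.
    # The dump is written to the kernel log, not to stdout.
    echo t > "$SYSRQ_TRIGGER_PATH"
}
```

As root, calling enable_sysrq once after boot and dump_tasks when a node
starts misbehaving should leave the task list in the kernel log (dmesg,
or a serial console if one is attached). To keep sysrq enabled across
reboots, "kernel.sysrq = 1" can go in /etc/sysctl.conf.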

One thing that's peculiar: if they are locking up, they have to be
locking up at about the same time. Otherwise, one would fence the
other, and life would go on.

-- Lon



