[Linux-cluster] occasional cluster crashes
Lon Hohberger
lhh at redhat.com
Wed Nov 15 16:18:41 UTC 2006
On Tue, 2006-11-14 at 16:44 +0100, Fabrizio Lippolis wrote:
> The cluster is running MySQL: one machine at a time runs the MySQL
> process, while the database files are on the disk array. I checked that
> if I kill the process, it migrates to the second machine. From time to
> time one of the two machines locks up; it doesn't happen very often and
> apparently without reason. The only solution in this case is to
> brutally switch off the machine and reboot. The problem became much
> more frequent when I tried to add another service to the cluster, an
> LDAP directory. The crashes sometimes happened more than once a day.
:o
The only problems I'm aware of related to the number of cluster services
are performance-related (rgmanager used to slow down a lot as more
services were added), and they affect only pre-U4 versions.
> I already wrote about this problem some time ago, and somebody answered
> that it could be caused by the way the nodes are connected to the disk
> array: when one node is accessing the disk array, the SCSI bus prevents
> the other node from doing anything. Can anybody confirm this?
That's very array-dependent, and I don't know much about how arrays
work. Even so, I don't think it should cause a lockup, unless it exposes
some kernel bug.
Do they crash (panic), or do they just become totally unresponsive?
Have you tried getting a stack trace from the console using SysRq? (Run
echo 1 > /proc/sys/kernel/sysrq, then press Alt-SysRq-T on the console.)
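For reference, the same two steps can be scripted. This is a minimal Python sketch written for this reply, not part of any cluster tool; the helper names enable_sysrq and dump_task_states are hypothetical. It assumes a Linux kernel that provides /proc/sysrq-trigger and that it runs as root. Writing "1" to /proc/sys/kernel/sysrq mirrors the echo above, and writing "t" to /proc/sysrq-trigger asks the kernel to dump every task's state and stack to the kernel log and console, as Alt-SysRq-T does. It only helps while a shell still responds; on a fully hung node the console key combination is the only way in, so the script is mainly useful for enabling SysRq ahead of time.

```python
from pathlib import Path

# Standard procfs locations on Linux. They are parameters so the helpers
# can also be pointed at ordinary files (for example in a test) without root.
SYSRQ_ENABLE = Path("/proc/sys/kernel/sysrq")
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def enable_sysrq(enable_path: Path = SYSRQ_ENABLE) -> None:
    """Same as `echo 1 > /proc/sys/kernel/sysrq`: enable all SysRq functions."""
    enable_path.write_text("1\n")


def dump_task_states(trigger_path: Path = SYSRQ_TRIGGER) -> None:
    """Same effect as Alt-SysRq-T: dump all task states and stacks to the log."""
    trigger_path.write_text("t")


if __name__ == "__main__":
    enable_sysrq()
    dump_task_states()
```

To keep SysRq enabled across reboots, kernel.sysrq = 1 can also be set in /etc/sysctl.conf.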
One thing that's peculiar: if they are locking up, they have to be
locking up at about the same time. Otherwise, one node would fence the
other, and life would go on.
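The fencing argument can be illustrated with a toy two-node model. This is a hypothetical Python simulation written for this explanation, not how cman or rgmanager actually detect failures; the Node class, the simulate function, and the timeout value are all assumptions. Each live node records a heartbeat every tick, and a live node that has not seen its peer for longer than timeout fences it. If only one node hangs, the survivor fences it; if both hang at the same moment, nobody is left to fence anyone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    hung_at: Optional[float] = None  # time the node stops responding (None = never)
    fenced: bool = False

    def alive(self, t: float) -> bool:
        # A node is live if it has not been fenced and has not hung yet.
        return not self.fenced and (self.hung_at is None or t < self.hung_at)


def simulate(a: Node, b: Node, end: float, timeout: float, step: float = 1.0) -> List[str]:
    """Toy model: a live node fences a peer silent for longer than `timeout`."""
    events: List[str] = []
    last_seen = {a.name: 0.0, b.name: 0.0}
    t = 0.0
    while t <= end:
        # Every live node sends a heartbeat this tick.
        for n in (a, b):
            if n.alive(t):
                last_seen[n.name] = t
        # Every live node checks whether its peer has gone silent.
        for me, peer in ((a, b), (b, a)):
            if me.alive(t) and not peer.fenced and t - last_seen[peer.name] > timeout:
                peer.fenced = True
                events.append(f"t={t:g}: {me.name} fences {peer.name}")
        t += step
    return events
```

For example, simulate(Node("node1", hung_at=10), Node("node2"), end=30, timeout=5) returns ["t=15: node2 fences node1"], while making both nodes hang at t=10 returns an empty list: neither node is fenced, which matches the symptom of both machines simply sitting there until someone power-cycles them.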
-- Lon