[Linux-cluster] Re: occasional cluster crashes

Hi Lon,

Lon Hohberger wrote:

Do they crash (panic), or do they just become totally unresponsive?

One server suddenly becomes unresponsive, as if frozen. The second server then starts to miss heartbeats from the first. At the moment I have configured manual fencing, so the service is not relocated (more on this below). If I remember correctly, restarting the locked machine is not enough; I have to reboot the working one too.

Have you tried getting a stack trace from the console using sysrq? (echo
1 > /proc/sys/kernel/sysrq;  then hit alt-sysrq-t from the console).
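A small sketch to check beforehand whether the Alt-SysRq-T task dump is actually permitted, assuming a Linux host that exposes /proc/sys/kernel/sysrq. The bit meanings are taken from the kernel's sysrq documentation: 0 disables SysRq, 1 enables every function, and larger values are a bitmask in which 0x8 covers process dumps such as Alt-SysRq-T. The helper names task_dump_allowed and describe are hypothetical and exist only for this illustration.

```python
from pathlib import Path
from typing import List

SYSRQ_PATH = Path("/proc/sys/kernel/sysrq")

# Bitmask meanings for values > 1, as listed in the kernel sysrq documentation.
SYSRQ_BITS = {
    0x2: "console logging level control",
    0x4: "keyboard control (SAK, unraw)",
    0x8: "debugging dumps of processes etc.",
    0x10: "sync command",
    0x20: "remount read-only",
    0x40: "signalling of processes (term, kill, oom-kill)",
    0x80: "reboot/poweroff",
    0x100: "nicing of all RT tasks",
}


def task_dump_allowed(value: int) -> bool:
    """Hypothetical helper: is the keyboard Alt-SysRq-T task dump permitted?"""
    if value == 0:
        return False
    if value == 1:
        return True  # 1 enables every SysRq function
    return bool(value & 0x8)  # 't' falls under the process-dump bit


def describe(value: int) -> List[str]:
    """Hypothetical helper: list the SysRq function groups a value enables."""
    if value == 0:
        return ["all SysRq functions disabled"]
    if value == 1:
        return ["all SysRq functions enabled"]
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    # The file is world-readable; writing to it (echo 1 > ...) needs root.
    if SYSRQ_PATH.exists():
        current = int(SYSRQ_PATH.read_text().strip())
        print(f"kernel.sysrq = {current}")
        for item in describe(current):
            print(" -", item)
        print("Alt-SysRq-T allowed:", task_dump_allowed(current))
    else:
        print("No /proc/sys/kernel/sysrq on this system")
```

Note that a value written with echo is lost on reboot; setting kernel.sysrq = 1 in /etc/sysctl.conf keeps it enabled, so the dump is available the next time a node freezes.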

No, I haven't. I will try that too.

One thing that's peculiar is that, if they are locking up, they have to
be locking up at about the same time; otherwise, one would fence the
other, and life would go on.

As I wrote, only one of them locks up. The fencing configuration is a separate problem, and one I am aware of. I don't fully understand how fencing works; it looks like I need an external device that controls the nodes' power. In that case, which device, and therefore which fencing method, is most suitable? I am rather confused about this topic.


