Re: R: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster

Leandro Dardini wrote:

If something happens between the two machines, they fence each other.

I have configured manual fencing, but as I wrote it is not very useful since, I think, it requires manual intervention, which may not be possible immediately. Therefore I am looking for a way to keep the services running even when this happens. This is not the first time the problem has arisen, apparently for no reason, though the last occurrence was a long time ago.

You can try to "ping" each machine from the other and check the connectivity state when the problem arises.

Sometimes the machines are completely locked up and it is not even possible to log in; a hard power-off is necessary in that case. Sometimes it looks like only the cluster service is locked up: I can ping the other machine normally, yet the cluster is not working.
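
To correlate the "Missed too many heartbeats" messages with the actual link state, the ping check could be left running in the background on both nodes. The following is only a rough sketch, not something from the cluster tools: it assumes the Linux iputils `ping` (with its `-c` and `-W` options), and the function names, the default log file name and the polling interval are all made up for illustration.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo to host is answered.

    Assumes the Linux iputils ping: -c is the packet count,
    -W the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def state_changes(samples):
    """Yield only the (timestamp, reachable) samples where the state flips."""
    previous = None
    for stamp, reachable in samples:
        if reachable != previous:
            yield stamp, reachable
            previous = reachable


def poll(host, interval_s):
    """Endlessly yield (timestamp, reachable) samples for host."""
    while True:
        yield datetime.now().isoformat(timespec="seconds"), ping_once(host)
        time.sleep(interval_s)


def monitor(host, interval_s=5, log_path="peer-ping.log"):
    """Append one line to log_path each time the peer goes UP or DOWN."""
    with open(log_path, "a") as log:
        for stamp, reachable in state_changes(poll(host, interval_s)):
            log.write(f"{stamp} {host} {'UP' if reachable else 'DOWN'}\n")
            log.flush()  # keep the log usable even if the node hangs later
```

Calling `monitor()` with the address of the other node on the crossover link would leave a timestamped UP/DOWN log to compare against the heartbeat messages in syslog. If the log shows no DOWN entry around a hang, that would suggest the problem is in the cluster service rather than in the link.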

Maybe a "too intelligent" switch is handling the traffic and applying some sort of "traffic shaping and control".

There is nothing like that: the two machines are connected by a 1 Gb crossover cable, not even a long one, supplied by HP with the two machines.

