[Linux-cluster] "Missed too many heartbeats" messages and hung cluster

Fabrizio Lippolis Fabrizio.Lippolis at AurigaInformatica.it
Tue Jun 27 13:35:35 UTC 2006


Patrick Caulfield wrote:

>> Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the
>> cluster : Missed too many heartbeats
> 
> 
> That message means that the heartbeat messages are getting lost somehow,
> either through an unreliable network link or something else odd happening on
> the machine to prevent the heartbeat packets reaching the network.
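
As a rough illustration of why lost packets lead to the "Missed too many
heartbeats" eviction, here is a toy model of heartbeat-based failure
detection. It is a minimal sketch, not CMAN's actual implementation: the
class name `HeartbeatMonitor`, the 5-second interval and the limit of 4
missed heartbeats are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy heartbeat failure detector (hypothetical, not CMAN's code)."""
    interval: float   # seconds between expected heartbeats (assumed value)
    max_missed: int   # missed heartbeats tolerated before eviction (assumed)
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the arrival time of a heartbeat from `node`.
        self.last_seen[node] = now

    def missed(self, node: str, now: float) -> int:
        # Whole heartbeat intervals elapsed since the last one arrived.
        return int((now - self.last_seen[node]) // self.interval)

    def dead_nodes(self, now: float) -> list:
        # Nodes that have missed more heartbeats than the tolerated maximum.
        return sorted(n for n in self.last_seen
                      if self.missed(n, now) > self.max_missed)


# Usage: AICLSRV01's heartbeats stop arriving after t=0, AICLSRV02 keeps
# sending. At t=26 s, AICLSRV01 has missed 5 intervals (> 4) and is evicted.
mon = HeartbeatMonitor(interval=5.0, max_missed=4)
mon.heartbeat("AICLSRV01", now=0.0)
mon.heartbeat("AICLSRV02", now=0.0)
mon.heartbeat("AICLSRV02", now=20.0)
print(mon.dead_nodes(now=26.0))  # ['AICLSRV01']
```

In this model the peer is declared dead whether its heartbeats were never
sent or were merely dropped on the way, which is why an unreliable link and
a hung node look the same from the receiving side.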

This is very strange, since the two machines are connected directly by a
gigabit crossover cable with no other device in between. Also, no firewall
rules are configured on either machine.

By the way, I am currently using the manual fencing method, but it isn't
very helpful, and I would like to switch to a method that ensures reliable
service. Does this mean I have to buy a device that sits between the
machines and controls their network and power connections? I am rather
new to this, so any suggestion is welcome.

-- 
Fabrizio Lippolis                fabrizio.lippolis at aurigainformatica.it
Auriga Informatica s.r.l.            Via Don Guanella 15/B - 70124 Bari
Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/