[Linux-cluster] Repeated fencing

yvette hirth yvette at dbtgroup.com
Wed Feb 24 18:45:41 UTC 2010


Doug Tucker wrote:

> Thanks to you and Carlos.  I understand a bit better now what you are
> referring to, however, I don't believe that is the issue.  The reason we
> went to the crossover cable was to avoid this issue, as we had a switch
> die once, and both then thought they were master and tried to fence the
> other.  In my situation, there is no reason for the missed heartbeat
> that I can find.  The interfaces have not gone down.  We ran a test
> where I started a ping between the 2 that wrote out to a file until a
> "heartbeat" missed and a reboot occurred.  There was not a single missed
> ping between the 2 nodes prior to the event.  Also in a split brain,
> both machines should recognize the other one "gone" and try to become
> master.  In this case, only 1 of the nodes at a time is seeing a "missed
> heartbeat" and then attempting to fence the other.  We have replaced all
> hardware, even the cables, to ensure it wasn't that.  This appears
> to be a software bug of some sort.  Again, we have another two-node cluster
> that this doesn't occur on, but, they are running a different kernel and
> gfs module.

ping is ICMP, not UDP, so a clean ping log doesn't prove that the heartbeat packets are getting through.  Is the heartbeat UDP or TCP?
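One way to repeat the ping-to-file test with UDP datagrams instead of ping's ICMP echo is a small sequence-numbered probe run across the crossover link. This is only a rough sketch under my own assumptions: the port (9999) and the 1-second interval are made up and are not the cluster's real heartbeat settings, so choose a port nothing else on the link is using. One node runs `send <peer-ip>`, the other runs `recv <logfile>`, and any gap in sequence numbers gets logged with a local timestamp.

```python
import socket
import struct
import sys
import time

# Hypothetical probe settings (assumptions, not cluster defaults).
PORT = 9999
INTERVAL = 1.0  # seconds between probes
FMT = "!Id"     # network-order uint32 sequence number + double send time


def send_probes(peer_ip: str, port: int = PORT, interval: float = INTERVAL) -> None:
    """Send a sequence-numbered UDP datagram to the peer forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq = 0
    while True:
        sock.sendto(struct.pack(FMT, seq, time.time()), (peer_ip, port))
        seq += 1
        time.sleep(interval)


def receive_probes(logfile: str, port: int = PORT) -> None:
    """Log every probe with its local receive time and flag sequence gaps."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    expected = None
    with open(logfile, "a") as log:
        while True:
            data, addr = sock.recvfrom(64)
            seq, sent = struct.unpack(FMT, data)
            now = time.time()
            if expected is not None and seq != expected:
                log.write(f"{now:.3f} GAP expected {expected} got {seq} from {addr[0]}\n")
            log.write(f"{now:.3f} seq={seq} sent={sent:.3f}\n")
            log.flush()  # keep the log current in case the node gets fenced
            expected = seq + 1


def missing_seqs(received: list[int]) -> list[int]:
    """For a finished log: sequence numbers absent between the lowest and highest seen."""
    if not received:
        return []
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


if __name__ == "__main__":
    # Usage: probe.py send <peer-ip>   |   probe.py recv <logfile>
    if sys.argv[1] == "send":
        send_probes(sys.argv[2])
    else:
        receive_probes(sys.argv[2])
```

Flushing after every line is deliberate: if the receiving node is the one that gets fenced, the log should still show the last probe it saw.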

Perhaps you could make sure both servers have their clocks synced, then
run Wireshark on each server, capturing on the crossover-cable Ethernet
port, to see which one is failing to signal the other...
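Once you have a capture from each node, a quick way to compare them is to dump each packet's timestamp and source address and then look for silences per sender. A sketch, assuming you export the fields with something like `tshark -r node1.pcap -T fields -e frame.time_epoch -e ip.src > node1.txt`; the file names and the 2-second default threshold are my own guesses, so set the threshold below whatever heartbeat timeout your cluster actually uses.

```python
import sys
from collections import defaultdict


def load_fields(path: str) -> dict[str, list[float]]:
    """Parse 'epoch<whitespace>src_ip' lines into per-source timestamp lists."""
    by_src: dict[str, list[float]] = defaultdict(list)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:  # skip frames with no IP source (e.g. ARP)
                by_src[parts[1]].append(float(parts[0]))
    return dict(by_src)


def find_gaps(times: list[float], threshold: float) -> list[tuple[float, float]]:
    """Return (start, end) pairs where consecutive packets are more than threshold seconds apart."""
    ordered = sorted(times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


if __name__ == "__main__":
    # Usage: gaps.py <fields.txt> [threshold_seconds]
    threshold = float(sys.argv[2]) if len(sys.argv) > 2 else 2.0
    for src, times in load_fields(sys.argv[1]).items():
        for start, end in find_gaps(times, threshold):
            print(f"{src}: silent {end - start:.3f}s from {start:.3f} to {end:.3f}")
```

Running it on both nodes' dumps should show whether a node really stopped sending (the gap appears in both captures) or whether its packets left the wire but were never seen by the other side (the gap appears only in the receiver's capture).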

hth
yvette hirth



