
[Linux-cluster] Fencing Device Question



In my GFS cluster, I use DRAC cards as the fencing device for each node.  Yesterday, the DRAC card on one node failed: it would not allow remote logins, but it still answered pings.  I don't know how long the card had been dead.  I only noticed because I wanted to fence the node manually and the fencing failed, which then made recovering the cluster much more painful.  So I have uncovered a pretty scary failure scenario for my cluster configuration.

My question is: what (if anything) can RHCS/GFS do to determine the health, presence, and operation of fencing devices?  If it can monitor the fencing devices and discovers a bad one, what will it do?  For example, if I unplug the network cable for the heartbeat, the node gets fenced immediately.  I have never tested whether the same would happen if I unplugged a fencing device.  I haven't delved into the documentation in a while, but I don't remember anything about a way to have redundant fencing devices, such as a DRAC plus a network power switch.  Is there a way?
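To illustrate the kind of check I have in mind: since the dead card still answered pings, a monitor would need to go at least one step further and confirm that the login port accepts connections. The following is only a rough sketch of an external check that could be run from cron, not anything RHCS provides. The function names, the example host names, and the choice of port 22 are my own assumptions, and a card could in principle accept a TCP connection and still refuse logins, so this narrows the gap rather than closing it.

```python
import socket


def login_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This goes one step beyond ping: it checks that the service port
    accepts connections. It does not attempt an actual login.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def find_unreachable(devices, port=22, timeout=5.0):
    """Return the subset of devices whose login port cannot be reached.

    `devices` is a list of host names or addresses (hypothetical values,
    e.g. the management addresses of the DRAC cards).
    """
    return [d for d in devices if not login_port_open(d, port, timeout)]
```

Calling find_unreachable(["drac-node1", "drac-node2"]) with the real management addresses would return the cards that need attention. The port would have to match whatever login service the cards actually expose.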

Thoughts, opinions, insight, documentation, etc. would be greatly appreciated.

--
Brandon
