[Linux-cluster] Fencing agents

Adam Manthei amanthei at redhat.com
Thu Aug 4 16:29:09 UTC 2005


On Thu, Aug 04, 2005 at 12:32:13PM +0400, Denis Medvedev wrote:
> >>What does "automatic" fencing have to offer that manual fencing lacks?
> >>If we decide to buy the FC switch right away, is it recommended that we 
> >>buy one of the ones that have a fencing agent available for the 
> >>Cluster-Suite?
> >>   
> >>
> >
> >In this case, you already have a fencing agent (fence_drac) that works with
> >the PE 1855 blades so there is no need for further fencing hardware (unless
> >you are going to be connecting other machines to the cluster that aren't
> >going to have any other form of fencing)
> >
> >The main advantage that "automatic" fencing gives you over manual fencing
> >is that in the event that a fencing operation is required, your cluster
> >can automatically recover (on the order of seconds to minutes) instead of
> >waiting for user intervention (which can take minutes to hours to days
> >depending on how attentive the admins are :).
> >
> > 
> >
> "recover"? You mean reboot? 

In order for the filesystem to recover, an expired node must first be
fenced.  In this case, since DRAC is being used, fencing most likely means
that the node gets rebooted.

> But if a machine needs fencing, doesn't that 
> mean that something is inherently wrong with that machine and a simple 
> reboot wouldn't cure that?

Perhaps.  But it might be as simple as a network hiccup that causes a node
to miss enough heartbeats that it gets fenced.  If you want to leave the
node in a state where you can debug it, use a SAN-based fencing setup, which
isolates the node from the cluster and keeps its state intact for the admin
to look at later... maybe (if the machine locks up too hard, you won't be
able to get into it anyway).  If you want to automate the recovery process
but still make sure that fenced nodes aren't automatically reintegrated
into the cluster, you can use a power-based fencing agent that just turns
the machine off and doesn't attempt to power it back on again.
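
To make the trade-offs between these fencing styles concrete, here is a
minimal Python sketch.  It is only a toy model, not Cluster Suite code: the
names Node, FenceMethod, MAX_MISSED, fence and recovery_allowed are all
hypothetical, and the threshold of 3 missed heartbeats is an arbitrary
assumption rather than a real default.

```python
from dataclasses import dataclass
from enum import Enum


class FenceMethod(Enum):
    # Hypothetical labels for the three styles discussed in this thread.
    POWER_REBOOT = "power cycle"         # node restarts and may rejoin
    POWER_OFF = "power off only"         # node stays down until an admin acts
    SAN_ISOLATE = "cut storage access"   # node keeps running, state preserved


@dataclass
class Node:
    name: str
    missed_heartbeats: int = 0
    fenced: bool = False


MAX_MISSED = 3  # assumed threshold, for illustration only


def is_expired(node):
    """A node that misses too many heartbeats is considered expired."""
    return node.missed_heartbeats >= MAX_MISSED


def fence(node, method):
    """Mark the node as fenced; return (may_rejoin, state_kept)."""
    node.fenced = True
    may_rejoin = method is FenceMethod.POWER_REBOOT
    state_kept = method is FenceMethod.SAN_ISOLATE
    return may_rejoin, state_kept


def recovery_allowed(node):
    """Filesystem recovery may only start once the expired node is fenced."""
    return is_expired(node) and node.fenced
```

In this model, a node with 3 missed heartbeats is expired, but
recovery_allowed() stays False until fence() has been called on it.  Only
POWER_REBOOT lets the node come back by itself, and only SAN_ISOLATE leaves
its state around for debugging.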

-- 
Adam Manthei  <amanthei at redhat.com>



