[Linux-cluster] SNMP support with IBM Blade Center Fence Agent

Lon Hohberger lhh at redhat.com
Fri Mar 4 17:15:45 UTC 2011


On Tue, Mar 01, 2011 at 06:50:18PM +0530, Parvez Shaikh wrote:
> Hi Ryan,
> 
> Thank you for the response. Does it mean there is currently no way to
> notify the administrator about a fencing failure?
> 
> Let me give more information about my cluster -
> 
> I have a set of nodes in a cluster with only an IP resource being protected.
> I have two levels of fencing: first bladecenter fencing, then manual
> fencing.

If the problem you have with fence_bladecenter is intermittent (for
example, if it fails half the time), fence_manual is going to *detract*
from your cluster's ability to recover automatically.

Ordinarily, if a fencing action fails, fenced will automatically retry
the operation.

When you configure fence_manual as a backup, this retry will *never*
occur, meaning your cluster hangs until someone runs fence_ack_manual.
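
To make the difference concrete, here is a toy model in Python. It is only
a sketch, not how fenced is actually implemented: fence_with_retry,
fence_with_manual_fallback, scripted_agent and the max_attempts cap are all
made-up names for illustration, and the cap exists only so the example
terminates.

```python
def scripted_agent(outcomes):
    """Return a fake fence agent that yields the given True/False results in order."""
    results = iter(outcomes)
    return lambda: next(results)


def fence_with_retry(agent, max_attempts=10):
    """Single fencing method: keep retrying the agent until it succeeds.

    Returns the number of attempts needed, or None if the (hypothetical)
    attempt cap is reached first.
    """
    for attempt in range(1, max_attempts + 1):
        if agent():
            return attempt
    return None


def fence_with_manual_fallback(agent):
    """Automatic method backed by manual fencing: no retry after a failure.

    If the agent fails once, control falls through to the manual method,
    which waits for an operator.
    """
    if agent():
        return "fenced automatically"
    return "waiting for fence_ack_manual"


# An agent that fails twice and then works recovers on the third try...
print(fence_with_retry(scripted_agent([False, False, True])))    # 3
# ...but with the manual fallback, the first failure already blocks recovery.
print(fence_with_manual_fallback(scripted_agent([False, True])))  # waiting for fence_ack_manual
```

With a single method, an intermittent failure costs a few extra attempts.
With the manual fallback, the first failure leaves recovery waiting on a
person.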


> At times, if a machine is already down (either from a power failure or from
> being turned off abruptly), blade center fencing times out and manual
> fencing happens. At this point, the administrator is expected to run
> fence_ack_manual.

> Clearly this is not desirable, as the services stay down until the
> administrator runs fence_ack_manual.

> What is the recommended method to deal with a blade center fencing failure
> in this situation? Do I have to add another level of fencing (between blade
> center and manual) which can fence automatically (not requiring manual
> intervention)?

Start with removing fence_manual.

If fencing is failing (permanently), you can still run:

   fence_ack_manual -e -n <nodename>

> > > my bladecenter fencing agent, I sometimes get a message saying bladecenter
> > > fencing failed because of a timeout or because the fence device IP
> > > address/user credentials are incorrect.

^^ This is why I think fence_manual is, in your specific case, very
likely hurting your availability.

-- 
Lon Hohberger - Red Hat, Inc.



