[Linux-cluster] SNMP support with IBM Blade Center Fence Agent
Lon Hohberger
lhh at redhat.com
Fri Mar 4 17:15:45 UTC 2011
On Tue, Mar 01, 2011 at 06:50:18PM +0530, Parvez Shaikh wrote:
> Hi Ryan,
>
> Thank you for response. Does it mean there is no way to intimate
> administrator about failure of fencing as of now?
>
> Let me give more information about my cluster -
>
> I have set of nodes in cluster with only IP resource being protected. I have
> two levels of fencing, first bladecenter fencing and second one is manual
> fencing.
If the problem you have with fence_bladecenter is intermittent (for
example, if it fails half the time), fence_manual is going to *detract*
from your cluster's ability to recover automatically.
Ordinarily, if a fencing action fails, fenced will automatically retry
the operation.
When you configure fence_manual as a backup, this retry will *never*
occur, meaning your cluster hangs until someone runs fence_ack_manual.
> At times if machine is already down (either power failure or turned off
> abruptly), blade center fencing times out and manual fencing happens. At this
> time, administrator is expected to run fence_ack_manual.
> Clearly this is not something which is desirable, as downtime of services is
> as long as administrator runs fence_ack_manual.
> What is recommended method to deal with blade center fencing failure in
> this situation? Do I have to add another level of fencing (between blade
> center and manual) which can fence automatically (not requiring manual
> intervention)?
Start with removing fence_manual.
If fencing is failing (permanently), you can still run:

    fence_ack_manual -e -n <nodename>
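
For illustration, removing fence_manual could look roughly like the
cluster.conf fragment below. This is a sketch, not a tested configuration:
the node name, device name, address, credentials and blade number are all
placeholders, and the attribute that selects the blade may be "port" or
"blade" depending on the agent version, so check it against your installed
fence_bladecenter.

```xml
<!-- Hypothetical fragment: every name and value here is a placeholder. -->
<clusternode name="node1" nodeid="1">
  <fence>
    <!-- Only the bladecenter method remains. With no fence_manual
         method after it, a failed attempt is retried by fenced
         instead of waiting for an operator. -->
    <method name="1">
      <!-- "port" is assumed; some agent versions use "blade". -->
      <device name="bladecenter" port="1"/>
    </method>
    <!-- Removed: the second method that referenced the fence_manual
         device, e.g. <device name="manual" nodename="node1"/> -->
  </fence>
</clusternode>

<fencedevices>
  <fencedevice agent="fence_bladecenter" name="bladecenter"
               ipaddr="192.0.2.10" login="USERID" passwd="PASSWORD"/>
  <!-- Removed: <fencedevice agent="fence_manual" name="manual"/> -->
</fencedevices>
```

If the bladecenter device is permanently unreachable, the fence_ack_manual
command above is still available as the operator override.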
> > > my bladecenter fencing agent, I sometimes get message saying bladecenter
> > > fencing failed because of timeout or fence device IP address/user
> > > credentials are incorrect.
^^ This is why I think fence_manual is, in your specific case, very
likely hurting your availability.
--
Lon Hohberger - Red Hat, Inc.