[Linux-cluster] SNMP support with IBM Blade Center Fence Agent

Parvez Shaikh parvez.h.shaikh at gmail.com
Fri Mar 4 17:45:07 UTC 2011


Hi Lon,

Thank you for your reply.

What I gathered from your response is that I should remove manual fencing
right away. This will cause the fence daemon to retry fence_bladecenter until
the node is fenced. Most likely fenced will then succeed in fencing the failed
node (provided the IP address, user name and password for the bladecenter
management module are right), even if it times out the first time. Am I right?
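
To check my understanding, here is a toy Python sketch of the behaviour
described in your reply. It is not fenced's real code: the function names,
the three result values and the round limit are all made up for illustration,
and the limit exists only so that the example terminates.

```python
# Toy model only: every name here is hypothetical, nothing is taken from fenced.
SUCCESS, FAILURE, BLOCKED = "success", "failure", "blocked"


def flaky_agent(failures_before_success):
    """Stand-in for an agent that times out a few times, then succeeds."""
    state = {"left": failures_before_success}

    def agent():
        if state["left"] > 0:
            state["left"] -= 1
            return FAILURE
        return SUCCESS

    return agent


def manual_agent():
    """Stand-in for fence_manual: it waits for an administrator's acknowledgement."""
    return BLOCKED


def run_fencing(levels, max_rounds=5):
    """Try each fence level in order; if every level fails, start over."""
    attempts = 0
    for _ in range(max_rounds):
        for agent in levels:
            attempts += 1
            result = agent()
            if result == SUCCESS:
                return ("fenced", attempts)
            if result == BLOCKED:
                # No further automatic retry happens from here on.
                return ("waiting for fence_ack_manual", attempts)
    return ("still failing", attempts)


# Only the bladecenter level: one timeout, then the retry succeeds.
print(run_fencing([flaky_agent(1)]))
# Bladecenter plus manual fallback: the first timeout falls through to manual.
print(run_fencing([flaky_agent(1), manual_agent]))
```

In this model, the single-level setup recovers on its own on the second
attempt, while the two-level setup stops at the manual level after the first
timeout and stays there until someone acknowledges it.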

I will try removing manual fencing and see how things go.


>> If fencing is failing (permanently), you can still run:
>>   fence_ack_manual -e -n <nodename>

By the way, as per my understanding, fence_ack_manual -n <node name> can be
executed to acknowledge only a manually fenced node (and not a bladecenter
fenced node); correct me if this understanding is wrong. So, God forbid, if
fence_bladecenter fails for some reason, we still have the option to run
fence_manual and then fence_ack_manual, so that the cluster is back to working.

Thanks again, and have a great weekend ahead.

Yours truly,
Parvez

On Fri, Mar 4, 2011 at 10:45 PM, Lon Hohberger <lhh at redhat.com> wrote:

> On Tue, Mar 01, 2011 at 06:50:18PM +0530, Parvez Shaikh wrote:
> > Hi Ryan,
> >
> > Thank you for response. Does it mean there is no way to intimate
> > administrator about failure of fencing as of now?
> >
> > Let me give more information about my cluster -
> >
> > I have set of nodes in cluster with only IP resource being protected. I have
> > two levels of fencing, first bladecenter fencing and second one is manual
> > fencing.
>
> If the problem you have with fence_bladecenter is intermittent - for
> example, if it fails 1/2 the time, fence_manual is going to *detract*
> from your cluster's ability to recover automatically.
>
> Ordinarily, if a fencing action fails, fenced will automatically retry
> the operation.
>
> When you configure fence_manual as a backup, this retry will *never*
> occur, meaning your cluster hangs.
>
>
> > At times if machine is already down(either power failure or turned off
> > abruptly); blade center fencing times out and manual fencing happens. At this
> > time, administrator is expected to run fence_ack_manual.
>
> > Clearly this is not something which is desirable, as downtime of services is
> > as long as administrator runs fence_ack_manual.
>
> > What is recommended method to deal with  blade center fencing failure in
> > this situation? Do I have to add another level of fencing(between blade
> > center and manual) which can fence automatically(not requiring manual
> > interference)?
>
> Start with removing fence_manual.
>
> If fencing is failing (permanently), you can still run:
>
>   fence_ack_manual -e -n <nodename>
>
> > > > my bladecenter fencing agent, I sometimes get message saying bladecenter
> > > > fencing failed because of timeout or fence device IP address/user
> > > > credentials are incorrect.
>
> ^^ This is why I think fence_manual is, in your specific case, very
> likely hurting your availability.
>
> --
> Lon Hohberger - Red Hat, Inc.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>