
Re: [Linux-cluster] SNMP support with IBM Blade Center Fence Agent



Hi Lon,

Thank you for your reply.

What I gathered from your response is that I should remove manual fencing at once. The fence daemon will then keep retrying fence_bladecenter until the node is fenced. Provided the IP address, user name, and password for the bladecenter management module are correct, fenced will most likely succeed in fencing the failed node on a retry, even if the first attempt times out. Am I right?

I will try removing manual fencing and see how things go.


>> If fencing is failing (permanently), you can still run:
>>   fence_ack_manual -e -n <nodename>

By the way, as I understand it, fence_ack_manual -n <node name> can only be used to acknowledge a manually fenced node (not a bladecenter-fenced node); please correct me if this understanding is wrong. So, God forbid, if fence_bladecenter fails for some reason, we still have the option of running fence_manual and then fence_ack_manual to get the cluster back to a working state.

Thanks again, and have a great weekend ahead.

Yours truly,
Parvez

On Fri, Mar 4, 2011 at 10:45 PM, Lon Hohberger <lhh redhat com> wrote:
On Tue, Mar 01, 2011 at 06:50:18PM +0530, Parvez Shaikh wrote:
> Hi Ryan,
>
> Thank you for your response. Does it mean there is currently no way to
> notify the administrator about a fencing failure?
>
> Let me give more information about my cluster -
>
> I have a set of nodes in a cluster with only an IP resource being protected.
> I have two levels of fencing: the first is bladecenter fencing and the
> second is manual fencing.

If the problem you have with fence_bladecenter is intermittent - for
example, if it fails 1/2 the time, fence_manual is going to *detract*
from your cluster's ability to recover automatically.

Ordinarily, if a fencing action fails, fenced will automatically retry
the operation.

When you configure fence_manual as a backup, this retry will *never*
occur, meaning your cluster hangs.
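
As a rough illustration of the difference, here is a toy model in Python. It is only a sketch and not how fenced is implemented: `run_fencing`, `flaky_agent`, `manual_agent`, and the `max_rounds` cap are all hypothetical names made up for this example, and the real daemon does not stop after a fixed number of rounds.

```python
def run_fencing(methods, max_rounds=5):
    """Toy model: try each fence method in order, retry from the top if all fail.

    methods is a list of (name, agent) pairs. An agent returns True on
    success, False on failure, or None when it blocks waiting for an
    operator (the way a manual method waits for an acknowledgement).
    max_rounds is an artificial cap so the sketch always terminates.
    """
    log = []
    for round_no in range(1, max_rounds + 1):
        for name, agent in methods:
            result = agent()
            log.append((round_no, name, result))
            if result is True:
                return "fenced", log
            if result is None:
                # Nothing is retried until a human steps in.
                return "waiting for operator", log
    return "still failing", log


def flaky_agent(outcomes):
    """Hypothetical agent that returns the given outcomes in order, then False."""
    remaining = iter(outcomes)
    return lambda: next(remaining, False)


def manual_agent():
    """Hypothetical stand-in for a manual method: always waits for an operator."""
    return None


# Intermittent failure with no manual backup: the second attempt succeeds.
print(run_fencing([("bladecenter", flaky_agent([False, True]))])[0])

# Same intermittent failure with a manual backup: the retry never happens.
print(run_fencing([("bladecenter", flaky_agent([False, True])),
                   ("manual", manual_agent)])[0])
```

In this model the first call recovers on its own, while the second stops at the manual method after a single bladecenter failure.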


> At times, if a machine is already down (either power failure or turned off
> abruptly), blade center fencing times out and manual fencing happens. At
> this point, the administrator is expected to run fence_ack_manual.

> Clearly this is not desirable, as the services stay down until the
> administrator runs fence_ack_manual.

> What is the recommended method to deal with blade center fencing failure in
> this situation? Do I have to add another level of fencing (between blade
> center and manual) that can fence automatically (without requiring manual
> intervention)?

Start with removing fence_manual.

If fencing is failing (permanently), you can still run:

  fence_ack_manual -e -n <nodename>

> > > my bladecenter fencing agent, I sometimes get message saying bladecenter
> > > fencing failed because of timeout or fence device IP address/user
> > > credentials are incorrect.

^^ This is why I think fence_manual is, in your specific case, very
likely hurting your availability.

--
Lon Hohberger - Red Hat, Inc.

