[Linux-cluster] SNMP support with IBM Blade Center Fence Agent

Tue Mar 1 13:20:18 UTC 2011

Hi Ryan,

Thank you for the response. Does this mean there is currently no way to
notify the administrator when fencing fails?

Here is some more information about my cluster:

I have a set of nodes in the cluster, with only an IP resource being
protected. I have two levels of fencing: the first is BladeCenter fencing
and the second is manual fencing.
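To make the two fence levels concrete, here is a minimal Python sketch (standard library only) that reads the cluster.conf fragment quoted further down in this thread and lists each node's fence methods and the agents behind them. The wrapping `<cluster>` root element and the helper name `fence_levels` are assumptions added for illustration, and the ipaddr/login/passwd attributes are left out. fenced tries a node's methods in the listed order and stops at the first one that succeeds.

```python
import xml.etree.ElementTree as ET

# The <clusternodes> and <fencedevices> sections quoted in this thread,
# wrapped in an assumed <cluster> root so the fragment parses on its own.
# Credentials and the management module address are omitted.
CLUSTER_CONF = """
<cluster>
  <clusternodes>
    <clusternode name="blade2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device blade="2" name="BladeCenterFencing"/>
        </method>
        <method name="2">
          <device name="ManualFencing" nodename="blade2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="blade1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device blade="1" name="BladeCenterFencing"/>
        </method>
        <method name="2">
          <device name="ManualFencing" nodename="blade1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_bladecenter" name="BladeCenterFencing"/>
    <fencedevice agent="fence_manual" name="ManualFencing"/>
  </fencedevices>
</cluster>
"""


def fence_levels(conf_text):
    """Hypothetical helper: {node: [(method name, [agent, ...]), ...]} in config order."""
    root = ET.fromstring(conf_text.strip())
    # Map each fence device name to the agent program that implements it.
    agents = {d.get("name"): d.get("agent") for d in root.iter("fencedevice")}
    levels = {}
    for node in root.iter("clusternode"):
        methods = []
        for method in node.findall("./fence/method"):
            devices = [agents[dev.get("name")] for dev in method.findall("device")]
            methods.append((method.get("name"), devices))
        levels[node.get("name")] = methods
    return levels


if __name__ == "__main__":
    for node, methods in fence_levels(CLUSTER_CONF).items():
        for name, devices in methods:
            print(f"{node}: method {name} -> {', '.join(devices)}")
```

For blade1 this yields method 1 backed by fence_bladecenter and method 2 backed by fence_manual, i.e. manual fencing is only reached after the BladeCenter method has failed.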

Sometimes, if a machine is already down (because of a power failure or
because it was turned off abruptly), BladeCenter fencing times out and
manual fencing takes over. At that point, the administrator is expected to
run fence_ack_manual.

Clearly this is undesirable, because the services stay down until the
administrator runs fence_ack_manual.
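The effect on downtime can be modelled with a small, hypothetical Python sketch: fence methods are tried in order, and recovery of the IP resource cannot start until one of them succeeds. The 30-second BladeCenter timeout and the administrator delays are made-up numbers, and `fence`, `bladecenter_unreachable` and `make_manual` are illustrative names, not part of fenced or the fence agents.

```python
from typing import Callable, List, Optional, Tuple

# A fence method returns (succeeded, seconds spent). All durations are
# hypothetical numbers for illustration, not values from fenced or the agents.
FenceMethod = Callable[[str], Tuple[bool, float]]


def bladecenter_unreachable(node: str) -> Tuple[bool, float]:
    # Models fence_bladecenter timing out, e.g. because the blade is already
    # powered off; an assumed 30 s timeout is charged before giving up.
    return False, 30.0


def make_manual(admin_delay: float) -> FenceMethod:
    # Models fence_manual: it only counts as successful once an administrator
    # runs fence_ack_manual, admin_delay seconds after it starts waiting.
    def manual(node: str) -> Tuple[bool, float]:
        return True, admin_delay
    return manual


def fence(node: str, methods: List[FenceMethod]) -> Optional[float]:
    """Try methods in order; return seconds until fencing succeeded, or None."""
    elapsed = 0.0
    for method in methods:
        ok, spent = method(node)
        elapsed += spent
        if ok:
            return elapsed
    return None


if __name__ == "__main__":
    # The IP resource is not recovered until fencing succeeds, so in this
    # model the outage lasts at least the returned number of seconds.
    for delay in (60.0, 3600.0):
        outage = fence("blade2", [bladecenter_unreachable, make_manual(delay)])
        print(f"admin acks after {delay:.0f}s -> fencing completes after {outage:.0f}s")
```

With these numbers the script reports fencing completing after 90 s when the acknowledgement comes after 60 s, and after 3630 s when it takes an hour: the manual level turns administrator response time directly into service downtime.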

What is the recommended way to deal with a BladeCenter fencing failure in
this situation? Do I have to add another level of fencing (between
BladeCenter and manual fencing) that can fence automatically, without
manual intervention?

Thanks

On Mon, Feb 28, 2011 at 9:44 PM, Ryan O'Hara <rohara at redhat.com> wrote:

> On Mon, Feb 28, 2011 at 12:43:10PM +0530, Parvez Shaikh wrote:
> > Hi all,
> >
> > I have a question related to fence agents and SNMP alarms.
> >
> > A fence agent can fail to fence a failed node for various reasons; e.g.
> > with my bladecenter fencing agent, I sometimes get a message saying
> > bladecenter fencing failed because of a timeout or because the fence
> > device IP address/user credentials are incorrect.
> >
> > In such a situation, is it possible to generate an SNMP trap?
>
> This feature will be in RHEL6.1. There is a new project called
> 'foghorn' that creates SNMPv2 traps from dbus signals.
>
> git://git.fedorahosted.org/foghorn.git
>
> In RHEL6.1 (and the latest upstream release), certain cluster
> components will emit dbus signals when certain events occur. This
> includes fencing: when a node is fenced, fenced generates a dbus
> signal. The foghorn service catches this signal and generates an
> SNMPv2 trap.
>
> Note that foghorn runs as an AgentX subagent, so snmpd must be running
> as the AgentX master agent.
>
> Ryan
>
> > My cluster config file looks like the one below. In my case, if
> > bladecenter fencing fails, manual fencing kicks in and requires the user
> > to run fence_ack_manual, so the user must at least be notified via SNMP
> > (or some other mechanism?) to intervene:
> >
> >   <clusternodes>
> >     <clusternode name="blade2" nodeid="2" votes="1">
> >       <fence>
> >         <method name="1">
> >           <device blade="2" name="BladeCenterFencing"/>
> >         </method>
> >         <method name="2">
> >           <device name="ManualFencing" nodename="blade2"/>
> >         </method>
> >       </fence>
> >     </clusternode>
> >     <clusternode name="blade1" nodeid="1" votes="1">
> >       <fence>
> >         <method name="1">
> >           <device blade="1" name="BladeCenterFencing"/>
> >         </method>
> >         <method name="2">
> >           <device name="ManualFencing" nodename="blade1"/>
> >         </method>
> >       </fence>
> >     </clusternode>
> >   </clusternodes>
> >   <cman expected_votes="1" two_node="1"/>
> >   <fencedevices>
> >     <fencedevice agent="fence_bladecenter" ipaddr="blade-mm.com"
> >                  login="USERID" name="BladeCenterFencing" passwd="PASSW0RD"/>
> >     <fencedevice agent="fence_manual" name="ManualFencing"/>
> >   </fencedevices>
> >
> > Thanks,
> > Parvez
>
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster