
Re: [Linux-cluster] BladeCenter Fencing errors




See below;

Gary Romo
IBM Global Technology Services
303.458.4415
Email: garromo us ibm com
Pager:1.877.552.9264
Text message: gromo skytel com



jim parsons <jparsons redhat com>
Sent by: linux-cluster-bounces redhat com

01/17/2008 03:40 PM

Please respond to
linux clustering <linux-cluster redhat com>

To
linux clustering <linux-cluster redhat com>
cc
linux-cluster-bounces redhat com
Subject
Re: [Linux-cluster] BladeCenter Fencing errors





On Thu, 2008-01-17 at 14:06 -0700, Gary Romo wrote:
>
> I enabled telnet on the MM, now I am getting these messages:
>
> Jan 17 14:00:24 node1 fenced[3229]: fence "node2" failed
> Jan 17 14:00:29 node1 fenced[3229]: fencing node "node2"
> Jan 17 14:00:40 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189  
>
> Jan 17 14:00:40 node1 fenced[3229]: fence "node2" failed
> Jan 17 14:00:45 node1 fenced[3229]: fencing node "node2"
> Jan 17 14:00:56 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189  
>
> Jan 17 14:00:56 node1 fenced[3229]: fence "node2" failed
> Jan 17 14:01:01 node1 fenced[3229]: fencing node "node2"
> Jan 17 14:01:12 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189  
>
> Line 189 looks like this;
>
>  ($text, $match) = $t->waitfor("/system:blade\\[$bladenum\\]>/");
>
>
> I am getting these on the second node:
>
> Jan 17 14:03:24 node2 fenced[3340]: fence "node1" failed
> Jan 17 14:03:29 node2 fenced[3340]: fencing node "node1"
> Jan 17 14:03:29 node2 fenced[3340]: fence "node1" failed
> Jan 17 14:03:34 node2 fenced[3340]: fencing node "node1"
> Jan 17 14:03:34 node2 fenced[3340]: fence "node1" failed
>
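
For reference, the waitfor call on line 189 quoted above blocks until the
management module prints a prompt of the form system:blade[N]>. Below is a
minimal Python sketch of the same pattern. The helper names and the sample
prompt strings are assumptions for illustration, not output captured from
this chassis.

```python
import re


def blade_prompt(bladenum):
    # Same pattern as the Perl waitfor() argument: a literal "system:blade[N]>"
    return re.compile(r"system:blade\[%d\]>" % bladenum)


def saw_prompt(output, bladenum):
    """Return True if the expected blade prompt appears in the session text."""
    return blade_prompt(bladenum).search(output) is not None


# Hypothetical session text, for illustration only.
print(saw_prompt("system:blade[3]> ", 3))
print(saw_prompt("system> ", 3))
```

If the management module never prints a prompt in exactly that form, the
wait can only end by timing out, which would be consistent with the
"pattern match timed-out" message in the log.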
Ah, yuck. Well, let's figure out what is going on here.
Can you post the clusternodes and fencedevices sections of your
cluster.conf here? Just XXXX out any passwords.


<?xml version="1.0"?>
<cluster alias="rhcs-1-clus" config_version="4" name="rhcs-1-clus">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="node1" votes="1">
                        <multicast addr="XXX.XXX.127.204" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device blade="2" name="chassis_fence"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node2" votes="1">
                        <multicast addr="XXX.XXX.127.204" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device blade="3" name="chassis_fence"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="XXX.XXX.127.204"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_bladecenter" ipaddr="XXX.XXX.1.143" login="rchs_fence" name="chassis_fence" passwd="XXXXXXX"/>
        </fencedevices>
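
To cross-check which blade number and chassis address each node is fenced
with, the sections above can be read programmatically. Below is a minimal
Python sketch. It assumes a trimmed copy of the configuration with the
closing cluster tag added so that it parses, and fence_map is a
hypothetical helper name, not part of the cluster tools.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration above; </cluster> added so it is well-formed.
CONF = """<cluster alias="rhcs-1-clus" config_version="4" name="rhcs-1-clus">
  <clusternodes>
    <clusternode name="node1" votes="1">
      <fence><method name="1"><device blade="2" name="chassis_fence"/></method></fence>
    </clusternode>
    <clusternode name="node2" votes="1">
      <fence><method name="1"><device blade="3" name="chassis_fence"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_bladecenter" ipaddr="XXX.XXX.1.143" login="rchs_fence" name="chassis_fence" passwd="XXXXXXX"/>
  </fencedevices>
</cluster>"""


def fence_map(conf_text):
    """Return one (node, agent, ipaddr, blade) tuple per fence device entry."""
    root = ET.fromstring(conf_text)
    # Index the fencedevice definitions by their name attribute.
    devices = {d.get("name"): d for d in root.iter("fencedevice")}
    rows = []
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            fd = devices.get(dev.get("name"))
            agent = fd.get("agent") if fd is not None else None
            ipaddr = fd.get("ipaddr") if fd is not None else None
            rows.append((node.get("name"), agent, ipaddr, dev.get("blade")))
    return rows


for row in fence_map(CONF):
    print(row)
```

A device entry whose name matches no fencedevice shows up with None for
the agent and address, which makes a mismatched name easy to spot.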

On one of the cluster nodes, can you run
'/sbin/fence_bladecenter -a <ip or hostname of bladecenter> -l <login>
-p <passwd> -n <blade number of another running node> -o status -v'


[root lxdnt648 ~]# /sbin/fence_bladecenter -a chassis -l rchs_fence -p XXXXXXX -n 2 -o status -v
Please use '-h' for usage.

Do you know the firmware details of your bladecenter? The
fence_bladecenter script hasn't changed in years. The tested firmware
versions are listed at the top of the file. Maybe the interface has changed. If
so, the debuglog should give us information.


 1   chassis   Main application   BRET85M   CNETMNUS.PKT   01-10-07   16
               Boot ROM*          BRBR82A   CNETBRUS.PKT   06-01-05   16
               Remote control     BRRG85M   CNETRGUS.PKT   01-10-07   16



This will get us started.

-Jim

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster

