Re: [Linux-cluster] sudden unfencing problem

You might want to grab a tcpdump of the connection. Perhaps you'll be able to see a bit more of the conversation.
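
A minimal sketch of the capture I have in mind, assuming the agent talks to the switch over plain telnet on TCP port 23 (as the transcript suggests). The function name, the SWITCH_ADDR placeholder and the capture file name are made up for illustration; substitute your switch's address.

```python
import shlex


def build_tcpdump_command(switch_host, interface="any", port=23,
                          capture_file="fence_sanbox2.pcap"):
    """Build an argv list that captures telnet traffic to and from the switch.

    All names here are hypothetical; only the tcpdump options are real.
    """
    return [
        "tcpdump",
        "-i", interface,      # capture interface ("any" works on Linux)
        "-nn",                # do not resolve host names or port names
        "-s", "0",            # capture whole packets, not just headers
        "-w", capture_file,   # write raw packets to a file for later reading
        # pcap filter: only the telnet conversation with the switch
        "host", switch_host, "and", "tcp", "port", str(port),
    ]


if __name__ == "__main__":
    # SWITCH_ADDR is a placeholder, not a real address.
    argv = build_tcpdump_command("SWITCH_ADDR")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Run the printed command as root in one terminal, run fence_sanbox2 in another, then stop the capture and read it back with "tcpdump -A -r fence_sanbox2.pcap". Telnet is cleartext, so the capture will contain the login password; treat the file accordingly.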

On Sat, Mar 23, 2013 at 5:55 AM, Laurence Schuler <laurence schuler nasa gov> wrote:
I have a two-node cluster that has been running fine for a couple of
months (with few or no reboots, though). We recently updated it to the
latest CentOS 6 packages, and now the cluster will not start. It keeps
throwing errors during startup when attempting to unfence the disks. I
have hard reset the Fibre Channel switch and reset both hosts, but when
I run fence_sanbox2, I am unable to enable, disable, or even get the
status of the switch ports. This is the error I get:

> [root web1 lschule3]# /usr/sbin/fence_sanbox2 -a -l
> admin -S FCpass.sh -o enable -n 5 -v
> telnet> set binary
> Negotiating binary mode with remote host.
> telnet> open -23
> Trying
> Connected to
> Escape character is '^]'.
> Firmware V8.
> r3fc1 login:
>   Establishing connection...   Please wait.
>        *****************************************************
>        *                                                   *
>        *       Command Line Interface SHell  (CLISH)       *
>        *                                                   *
>        *****************************************************
>        SystemDescription   SANbox 5800 FC Switch
>        HostName            r3fc1
>        EthIPv4NetworkAddr
>        EthIPv6NetworkAddr  fe80::2c0:00:00:90b
>        MACAddress          00:c0:dd:77:10:0b
>        WorldWideName       10:00:00:c0:dd:24:09:0b
>        SerialNumber        1236H00833
>        SymbolicName        r3fc1
>        ActiveSWVersion     V8.
>        ActiveTimestamp     Mon Apr  2 18:32:33 2012
>        POSTStatus          Passed
>        LicensedPorts       12
>        SwitchMode          Full Fabric
>   The alarm log is empty.
> r3fc1 #> r3fc1 #> Failed: Unable to switch to admin section
> [root web1 lschule3]#

I can manually telnet into the FC switch and execute the appropriate
commands to enable/disable ports, but the fence_sanbox2 script cannot.
The fence_sanbox2 code has not changed; however, python has been
upgraded from 2.6.6-29 to 2.6.6-36.

Has anyone else seen this? Know of a fix? Am I doing/not doing something
stupid? I seem to recall running this command before during setup and it
worked just fine then.

Thanks for any help!

Laurence Schuler (Larry)                       Laurence Schuler nasa gov
Systems Support                                       ADNET Systems, Inc
Scientific Visualization Studio                 http://svs.gsfc.nasa.gov

 - jim

