
Re: [Linux-cluster] two fencing problems



Greg Forte wrote:

And it still doesn't appear to work ... I can turn the outlets on and off from the command line, but if I down the interface on a node, the other node reports that it's removing the "failed" node from the cluster, and that it's fencing the "failed" node, but the "failed" node never gets shut down. Does this get logged somewhere besides /var/log/messages, or is there a way to force it to be more verbose? If I could see what command fenced is actually invoking that might help ...

Well, in case anyone is interested: I got fed up with having no decent logging from any of these components, so I finally used tcpdump to monitor the telnet connection between the non-failed node and the PDUs while it tried to fence the failed node. It turns out that fence_apc was trying to turn each port ON twice, instead of OFF and then ON as my configuration specifies. The fault apparently lies somewhere in ccsd or fenced, because the fence_apc script definitely responds properly to the on|off|reboot options, both on the command line and on stdin, which is how fenced passes them.
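For anyone who wants to repeat the stdin test without going through fenced, here is a rough Python sketch of how the options could be fed to an agent. It assumes the agent reads name=value lines on stdin and that the option names are ipaddr, login, passwd, port and option, which is how I remember the fence_apc man page; check yours. The host name and credentials are made up, and build_agent_stdin and run_agent are just helper names for this example.

```python
import subprocess


def build_agent_stdin(params):
    """Render a dict of agent options as name=value lines, one per line."""
    return "".join(f"{key}={value}\n" for key, value in params.items())


def run_agent(agent, params):
    """Run a fence agent with its options on stdin and return its exit code.

    Not called below, because it would really switch an outlet.
    """
    result = subprocess.run([agent], input=build_agent_stdin(params), text=True)
    return result.returncode


# Hypothetical values; the option names are an assumption (see above).
off_request = {
    "ipaddr": "pdu1.example.com",
    "login": "apc",
    "passwd": "secret",
    "port": "1",
    "option": "off",
}

# Show what would be written to the agent's stdin.
print(build_agent_stdin(off_request), end="")
```

Calling run_agent("fence_apc", off_request) and then the same request with option set to "on" would approximate the off-then-on sequence the old configuration was meant to produce.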

I changed my cluster.conf so that it uses 'reboot' instead of 'off' and 'on'; the old conf looked like this:

<device name="FENCE1" option="off" port="1"/>
<device name="FENCE2" option="off" port="1"/>
<device name="FENCE1" option="on" port="1"/>
<device name="FENCE2" option="on" port="1"/>

and the new one looks like this:

<device name="FENCE1" option="reboot" port="1"/>
<device name="FENCE2" option="reboot" port="1"/>

I also increased the reboot wait time on the PDUs to make sure it'd wait long enough, and that SEEMS to work (once I remembered to stop ccsd before updating my cluster.conf by hand, so that it didn't immediately replace it with the old one ;-)
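To double-check which actions a config actually asks for, and in what order, something like the following Python sketch can list them. The device lines are the old ones from above; the surrounding method element is my assumption about where they sit in cluster.conf and is only there so the fragment parses. fence_actions is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Old device lines, wrapped in an assumed <method> element so they parse.
OLD_METHOD = """
<method name="1">
  <device name="FENCE1" option="off" port="1"/>
  <device name="FENCE2" option="off" port="1"/>
  <device name="FENCE1" option="on" port="1"/>
  <device name="FENCE2" option="on" port="1"/>
</method>
"""


def fence_actions(method_xml):
    """Return (device name, option, port) tuples in document order."""
    root = ET.fromstring(method_xml)
    return [
        (dev.get("name"), dev.get("option"), dev.get("port"))
        for dev in root.iter("device")
    ]


for name, option, port in fence_actions(OLD_METHOD):
    print(f"{name} port {port}: {option}")
```

Comparing this list against what tcpdump shows on the wire makes it easy to see whether the "off" entries are being dropped or rewritten somewhere between the file and the agent.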

Of course, I can't bring up any of the per-node fencing configuration items in system-config-cluster anymore, but I think I mentioned that previously - when I set them up through the gui it put "switch=" options in each <device /> tag, and then when I shut down and restarted the gui it complained that the file was formatted improperly. I removed those options by hand, and then the gui worked again, but ever since the fencing info hasn't been available ...
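Removing the "switch=" options by hand gets tedious with more than a couple of nodes, so here is a rough Python sketch of doing it in one pass. The sample fragment, the switch value and the method wrapper are invented for illustration, and strip_switch is a made-up helper name. As mentioned above, ccsd would need to be stopped before writing the result back to cluster.conf.

```python
import xml.etree.ElementTree as ET

# Invented sample; real files will have more around the <device/> tags.
SAMPLE = """
<method name="1">
  <device name="FENCE1" option="reboot" port="1" switch="1"/>
  <device name="FENCE2" option="reboot" port="1" switch="1"/>
</method>
"""


def strip_switch(xml_text):
    """Return the XML with the switch attribute removed from every <device>."""
    root = ET.fromstring(xml_text)
    for dev in root.iter("device"):
        dev.attrib.pop("switch", None)  # no error if the attribute is absent
    return ET.tostring(root, encoding="unicode")


print(strip_switch(SAMPLE))
```

The other attributes are left alone, so the name, option and port values survive the round trip.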

Any developers care to comment on any of this? I'm finding it really tough to believe that this is a supported Red Hat "product".

-g

Greg Forte
gforte udel edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE

