Re: [Linux-cluster] Error messages during Fence operation

Thanks. That makes sense and I hadn't thought of that. I don't see any other connections. However, it appears to have properly fenced one of the nodes last night and I don't believe I've changed anything in the config. Maybe I did have another connection and something I did cleared it without me realizing it. As long as it's working. :)

I'm still pretty "green" when it comes to clustering and SANs, and I sincerely appreciate the quality responses and willingness to help on this list.


I forgot... I'm using CentOS 5 with the latest patches and kernel.

I am using an APC MasterSwitch Plus as my fencing device. I am now seeing this in my logs when fencing occurs:

Dec 31 11:36:26 nfs1-cluster fenced[3848]: agent "fence_apc" reports: Traceback (most recent call last):
  File "/sbin/fence_apc", line 829, in ?
    main()
  File "/sbin/fence_apc", line 289, in main
    do_login(sock)
  File "/sbin/fence_apc", line 444, in do_login
    i, mo, txt = sock.expect(regex_list, TELNET_TIMEOUT)
Dec 31 11:36:26 nfs1-cluster fenced[3848]: agent "fence_apc" reports:
  File "/usr/lib/python2.4/telnetlib.py", line 620, in expect
    text = self.read_very_lazy()
  File "/usr/lib/python2.4/telnetlib.py", line 400, in read_very_lazy
    raise EOFError, 'telnet connection closed'
EOFError: telnet connection closed
Dec 31 11:36:26 nfs1-cluster fenced[3848]: fence "nfs2-cluster.nws.noaa.gov" failed

This used to work just fine. If I run `fence_apc -a -l cluster -n 1:7 -o Reboot -p <my password>` from the command line, fencing works as expected. The relevant lines from my cluster.conf file are below. I will gladly provide more information as necessary.

Is it possible that you are already telnet'ed into the switch from a terminal or somesuch when the fence attempt takes place? APC switches allow only one login at a time. I should/will add a log comment that mentions this as a possible reason.

If this is not the issue, well, we can keep digging...
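
For what it's worth, here is a rough sketch of how a log hint like that could look. This is not the actual fence_apc code: login_with_hint, SINGLE_LOGIN_HINT, ClosedSession and demo_login are made-up names for illustration, and the stand-in session class just raises the same EOFError seen in the traceback so the snippet runs without a real switch.

```python
import sys

# Hypothetical hint text; not the wording used by the real fence_apc agent.
SINGLE_LOGIN_HINT = (
    "telnet connection closed during login; the APC switch allows only one "
    "login at a time, so check for another open session"
)


def login_with_hint(do_login, sock, log=sys.stderr):
    """Call do_login(sock); on EOFError, log a hint and return False."""
    try:
        do_login(sock)
    except EOFError as exc:
        # The switch dropped the connection; point at the likely cause
        # instead of letting a bare traceback reach the logs.
        log.write("fence_apc: %s (%s)\n" % (SINGLE_LOGIN_HINT, exc))
        return False
    return True


class ClosedSession:
    """Stand-in for a telnet session that the switch dropped at login."""

    def expect(self, patterns, timeout):
        raise EOFError("telnet connection closed")


def demo_login(sock):
    # Assumed login step: wait for a prompt, as do_login does in the traceback.
    sock.expect(["User Name"], 5)
```

Calling login_with_hint(demo_login, ClosedSession()) returns False and writes the hint to stderr, rather than dumping the raw EOFError traceback into the fenced log.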


