
Re: [Linux-cluster] two fencing problems



Eric Kerin wrote:
> Greg,
>
> I'm using the fence_apc agent on my cluster with APC 7900s, and fencing
> is working perfectly for me, and has for more than 6 months now.

Thanks, Eric, but the fence_apc script is definitely not the issue. I had to make a couple of minor changes to fence_apc's regexps, and it now works both with command-line options and with arguments passed through stdin. That doesn't explain why cluster.conf doesn't work when it has "off" and then "on", as set up by system-config-cluster. The tool did that itself; all I did was configure the IP address and login for the fence devices and tell it which ports to use. It does work when I change it to "reboot", as described in my previous message. Reboot is the default option anyway, which I assume is why yours works with no "option=" attribute.
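
To see what the daemon *should* be asking the agent to do, it can help to read the fence method back out of cluster.conf and list the actions in order. This is a minimal sketch that assumes the usual clusternode/fence/method/device layout, with fencedevice entries under fencedevices. The node names, address and ports are made up, and reporting a missing option as "reboot" follows the default mentioned above rather than anything checked against fenced itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment: node1 has an "off" entry followed by an
# "on" entry for the same APC outlet; node2 has no option attribute at all.
CLUSTER_CONF = """\
<cluster name="example" config_version="2">
  <clusternodes>
    <clusternode name="node1" votes="1">
      <fence>
        <method name="1">
          <device name="apc1" port="1" option="off"/>
          <device name="apc1" port="1" option="on"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" votes="1">
      <fence>
        <method name="1">
          <device name="apc1" port="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="apc1" agent="fence_apc" ipaddr="192.0.2.10" login="apc" passwd="apc"/>
  </fencedevices>
</cluster>
"""


def fence_actions(conf_text, node_name, default_option="reboot"):
    """Return (agent, port, option) tuples in the order the devices appear
    under the node's first fence method. A device without an option
    attribute is reported with default_option (assumed to be "reboot")."""
    root = ET.fromstring(conf_text)
    # Map each fence device name to its agent, e.g. "apc1" -> "fence_apc".
    agents = {fd.get("name"): fd.get("agent") for fd in root.iter("fencedevice")}
    for node in root.iter("clusternode"):
        if node.get("name") != node_name:
            continue
        method = node.find("fence/method")  # first method only
        if method is None:
            return []
        return [
            (agents.get(dev.get("name")), dev.get("port"), dev.get("option", default_option))
            for dev in method.findall("device")
        ]
    raise KeyError(f"no clusternode named {node_name!r}")


for name in ("node1", "node2"):
    print(name, fence_actions(CLUSTER_CONF, name))
```

For node1 this lists an "off" action followed by an "on" action, which is the sequence the conf file asks for; if the agent is actually receiving "on" twice, the discrepancy lies somewhere between the parsed config and the agent invocation.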

> You can test that the cluster is configured correctly to fence a node by
> running "fence_node <nodename>". This will use the cluster's config file
> to fence the node, ensuring that all config settings are correct.

Actually, that doesn't seem to work for me. No matter what node name I specify, and regardless of whether I run it on the node I'm trying to fence or on the other node (it's a two-node cluster), it comes back with "Fence of 'hostname' was unsuccessful." I suspect this is because it's a two-node cluster, so fenced doesn't want to let me kick out a node that's still active ... or maybe it's just a host name problem. Regardless, it _does_ work correctly if I simulate a real failure after making the aforementioned cluster.conf change, so I'm confident that I've got it configured correctly. My gripe is that (a) the GUI tool can't seem to generate even the simplest conf correctly, and (b) there's apparently a bug in fenced where it passes "option=on" to the fence_apc agent when it clearly should be "option=off". Or else ccsd is misparsing the cluster.conf file. I don't see how else to explain that the conf file said "off", then "on", but the daemon did "on", "on".
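
One quick way to test the host name theory is to compare the name being passed to fence_node against the clusternode names in cluster.conf. The helper below is hypothetical and not part of any cluster tool. It assumes the likely culprit is a short-name vs. fully-qualified-name mismatch, which is only a guess, and the example.com node names are made up.

```python
import xml.etree.ElementTree as ET


def match_clusternode(conf_text, hostname):
    """Return the clusternode name that matches hostname, or None.

    Tries an exact match first, then compares short names (the part before
    the first dot), to catch "node1" vs "node1.example.com" in either
    direction. Returns None if nothing matches or the short name is ambiguous."""
    names = [n.get("name") for n in ET.fromstring(conf_text).iter("clusternode")]
    if hostname in names:
        return hostname
    short = hostname.split(".", 1)[0]
    hits = [n for n in names if n.split(".", 1)[0] == short]
    return hits[0] if len(hits) == 1 else None


# Hypothetical conf that uses fully-qualified node names.
CONF = """<cluster name="example" config_version="2">
  <clusternodes>
    <clusternode name="node1.example.com" votes="1"/>
    <clusternode name="node2.example.com" votes="1"/>
  </clusternodes>
</cluster>"""

for h in ("node1.example.com", "node1", "node3"):
    print(h, "->", match_clusternode(CONF, h))
```

If the name it reports differs from what was typed, it may be worth retrying fence_node with the name exactly as it appears in cluster.conf.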

> When updating the cluster.conf file by hand, you are updating the
> config_version attribute of the <cluster> element, right? I do updates to
> my cluster.conf file by hand pretty much exclusively, while the cluster is
> running, and with no problems whatsoever. Changes propagate as expected
> after running "ccs_tool update <cluster.conf filename>" and
> "cman_tool version -r <new_version_number>".

Hmmm ... nope, but I will do so in the future.  ;-)  Thanks.
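
Since forgetting to bump config_version is an easy mistake, the edit-and-propagate step can be partly scripted. This is a minimal sketch that assumes the attribute is double-quoted on the <cluster> element. It edits the raw text with a regex so that comments and formatting in a hand-edited file are left alone. It does not run anything; it only prints the two commands from the procedure quoted above. The /etc/cluster/cluster.conf path in the usage lines is the usual location but an assumption here.

```python
import re

# Matches config_version="N" inside the opening <cluster ...> tag only;
# the \b after "cluster" keeps it from matching <clusternodes>.
_VERSION_RE = re.compile(r'(<cluster\b[^>]*?\bconfig_version=")(\d+)(")')


def bump_config_version(conf_text):
    """Return (new_text, new_version) with config_version incremented by one."""
    m = _VERSION_RE.search(conf_text)
    if m is None:
        raise ValueError('no config_version="..." attribute on <cluster>')
    new_version = int(m.group(2)) + 1
    new_text = conf_text[:m.start(2)] + str(new_version) + conf_text[m.end(2):]
    return new_text, new_version


def bump_file(path):
    """Bump config_version in the file at path, write it back, return the new version."""
    with open(path) as f:
        new_text, version = bump_config_version(f.read())
    with open(path, "w") as f:
        f.write(new_text)
    return version


if __name__ == "__main__":
    path = "/etc/cluster/cluster.conf"  # assumed default location
    version = bump_file(path)
    print(f"ccs_tool update {path}")
    print(f"cman_tool version -r {version}")
```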

-g

Greg Forte
gforte udel edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE

