I've got a cluster on CentOS 5.5, cman-2.0.115-34.el5_5.4, using VMware.
VMware fencing used to work, but at some point it stopped, and I'm not
sure exactly when. I noticed when one of the nodes developed a problem
and dropped out of the cluster: another node tried and failed to fence
it, and I had to reboot the failed node manually to get it to rejoin
the cluster.
When I run the fence agent by hand to troubleshoot, I get this:
$ sudo fence_vmware_ng -a virtualcenter -l [loginname] -p [password] -o status -n [fencedevice-name]
Traceback (most recent call last):
  File "/sbin/fence_vmware_ng", line 304, in ?
  File "/sbin/fence_vmware_ng", line 301, in main
    fence_action(None, options, set_power_status, get_power_status)
  File "/usr/lib/fence/fencing.py", line 726, in fence_action
    status = get_power_fn(tn, options)
  File "/sbin/fence_vmware_ng", line 193, in get_power_status
  File "/sbin/fence_vmware_ng", line 145, in vmware_get_outlets_vi
  File "/sbin/fence_vmware_ng", line 124, in vmware_run_command
NameError: global name 'SHELL_TIMEOUT' is not defined
I get the same traceback if I run:
$ sudo fence_node [nodename]
Google gives me no hits on this message, and I'd never encountered it
before. I'm trying to find documentation on where SHELL_TIMEOUT and
LOGIN_TIMEOUT are supposed to be defined, and how they're supposed to
be passed to fence_vmware_ng. Does anyone know what might have gone
wrong, and what the right fix is?
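For what it's worth, here is a minimal Python sketch of how this kind of NameError arises. It assumes (unconfirmed) that fence_vmware_ng expects SHELL_TIMEOUT to be provided by the shared /usr/lib/fence/fencing.py through a star import, so an agent and library that don't match would fail only when the function actually runs, not at startup. The dictionaries, timeout values, and the function body below are hypothetical stand-ins, not the real agent code.

```python
# Hypothetical stand-in for the agent's code: the function reads
# SHELL_TIMEOUT as a module-level global, looked up at call time.
AGENT_SOURCE = """
def vmware_run_command():
    return SHELL_TIMEOUT
"""

def load_agent(library_names):
    # Emulate `from fencing import *`: copy the library's public names
    # into the agent's global namespace, then define the agent function.
    ns = {k: v for k, v in library_names.items() if not k.startswith("_")}
    exec(AGENT_SOURCE, ns)
    return ns["vmware_run_command"]

def try_run(library_names):
    run = load_agent(library_names)
    try:
        return "ok, timeout=%s" % run()
    except NameError as err:
        # Python 2 words this as "global name ... is not defined".
        return "NameError: %s" % err

# Hypothetical library contents: one without SHELL_TIMEOUT, one with it.
old_lib = {"LOGIN_TIMEOUT": 5}
new_lib = {"LOGIN_TIMEOUT": 5, "SHELL_TIMEOUT": 3}
print(try_run(old_lib))
print(try_run(new_lib))  # ok, timeout=3
```

Loading the agent succeeds in both cases; only calling the function against the namespace that lacks SHELL_TIMEOUT fails, which would be consistent with the traceback appearing deep inside vmware_run_command.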