Re: [Linux-cluster] Re: Starting up two of three nodes that compose a cluster

David Teigland wrote:
On Fri, Sep 21, 2007 at 06:36:04PM +0200, carlopmart wrote:
David Teigland wrote:
On Fri, Sep 21, 2007 at 06:15:37PM +0200, carlopmart wrote:
[root thranduil ~]# fence_ack_manual -n elrond.hpulabs.org

Warning:  If the node "elrond.hpulabs.org" has not been manually fenced
(i.e. power cycled or disconnected from shared storage devices)
the GFS file system may become corrupted and all its data
unrecoverable!  Please verify that the node shown above has
been reset or disconnected from storage.

Are you certain you want to continue? [yN] y
can't open /tmp/fence_manual.fifo: No such file or directory
That looks like the old RHEL4/cluster-1.0 version of fence_ack_manual...
Is there a solution for this?
You need to make sure the RHEL4/cluster-1.0 binaries are removed from the
nodes and the new RHEL5/cluster-2.0/openais binaries are installed.  If
you're getting this far, it may only be some fencing binaries that are
incorrect, so first just remove fence_manual and fence_ack_manual and make
sure you have the new fence_ack_manual installed (it's now a bash script).
fence_manual no longer exists in RHEL5/cluster-2.0 code since
fence_ack_manual talks directly with fenced.
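
One rough way to tell which kind of fence_ack_manual is installed is to look at the first bytes of the file. This is only a sketch: the helper name `fence_tool_kind` is made up for illustration, and it checks nothing beyond the file's leading magic bytes.

```shell
#!/bin/sh
# Hypothetical helper (not part of any cluster package): report whether a
# file looks like an interpreter script or an ELF binary.
fence_tool_kind() {
    f="$1"
    # Bail out early if the file is absent or unreadable.
    [ -r "$f" ] || { echo "missing"; return 1; }
    # Dump the first four bytes as characters; od prints 0x7f as octal 177.
    magic=$(head -c 4 "$f" | od -An -c | tr -d ' \n')
    case "$magic" in
        '#!'*)    echo "script" ;;
        '177ELF') echo "elf" ;;
        *)        echo "unknown" ;;
    esac
}
```

For example, `fence_tool_kind /sbin/fence_ack_manual` prints `script` or `elf` depending on what is installed at that path.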


Sorry? These three nodes are RHEL5 with the latest patches applied, except kernel version 2.6.18-8.1.10.

Version of cman is: cman-2.0.64-1.0.1.el5
Version of gfs-utils:
Version of rgmanager: rgmanager-2.0.24-1.el5

And fence_manual exists in this cluster suite:

[root haldir xen]# whereis fence_manual
fence_manual: /sbin/fence_manual /usr/share/man/man8/fence_manual.8.gz
[root haldir xen]# rpm -qf /sbin/fence_manual
[root smeagol xen]#

And fence_ack_manual is not a bash script; it is a binary:

[root haldir xen]# whereis fence_ack_manual
fence_ack_manual: /sbin/fence_ack_manual /usr/share/man/man8/fence_ack_manual.8.gz
[root haldir xen]# cd /sbin
[root haldir sbin]# file fence_ack_manual
fence_ack_manual: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
[root haldir sbin]#

Do I need to install the RHEL 5.1 beta to do this? If so, I have a very big problem.

Looks like I was wrong about what got into RHEL5; it's a real pity the new
stuff didn't make it.  Looking back at your cluster.conf file it seems
that you're using fence_gnbd for that node, so my next guess is that
fence_gnbd isn't found or isn't working.

I can't find a way to override a failing fence operation in the RHEL5
code, so that probably means you'll have to get fence_gnbd working.

Or, another somewhat dangerous option is to disable startup fencing
altogether by adding this to cluster.conf:
  <fence_daemon clean_start="1"/>
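
Because the setting has no effect unless it is actually present in the file each node reads, a small check along these lines may help. It is a sketch under assumptions: the function name `has_clean_start` is hypothetical, the file path is passed in by the caller, and the match only works when the whole fence_daemon tag sits on one line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given cluster.conf enables clean_start.
has_clean_start() {
    conf="$1"
    # Match clean_start="1" only when it appears inside a <fence_daemon ...> tag.
    grep -q '<fence_daemon[^>]*clean_start="1"' "$conf"
}
```

Run on each node as, for example, `has_clean_start /etc/cluster/cluster.conf && echo enabled`.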


Thanks Dave, but I have tried clean_start without luck. The error is the same. fence_gnbd works OK, at least when all three nodes are up. (deagol.hpulabs.org is a VMware virtual machine hosted on an ESX cluster.)

Well, I will try setting up a cron job to change cluster.conf at 00:00 on Monday. I think this is the only option.

CL Martinez
carlopmart {at} gmail {d0t} com

