[Linux-cluster] Can a 2-node Cluster boot-up with only one active node?

Lon Hohberger lhh at redhat.com
Thu Oct 4 17:50:09 UTC 2007


On Thu, 2007-10-04 at 13:28 -0300, Celso K. Webber wrote:
> ** What is happening:
>   If I boot node1 with node2 powered off, it stops for 5 minutes during the
> start of ccsd, and after that it regains quorum, qdiskd starts successfully,
> but fenced keeps trying to start for 2 minutes and then it gives up with
> a "failed" message.

Hmmm...

> Oct  4 11:56:45 hercules01 lock_gulmd: no <gulm> section detected
> in /etc/cluster/cluster.conf succeeded

chkconfig --del lock_gulmd

> Oct  4 11:56:45 hercules01 qdiskd: Starting the Quorum Disk Daemon: succeeded
> Oct  4 11:57:02 hercules01 kernel: CMAN: quorum regained, resuming activity
> Oct  4 11:57:02 hercules01 ccsd[9144]: Cluster is quorate.  Allowing
> connections.
> Oct  4 11:58:45 hercules01 fenced: startup failed
>        ^^^^^^^^
>        exactly 2 minutes after the qdiskd message above, I've noticed that
> fenced is started in the init scripts with "fence_tool -t 120 join -w"
> Oct  4 11:59:38 hercules01 rgmanager: clurgmgrd startup failed
>        ^^^^^^^^
>        after other service boot up ok, rgmanager fails to boot, probably
> because fenced failed to start
> Oct  4 11:56:45 hercules01 qdiskd[9292]: <info> Quorum Daemon Initializing
> Oct  4 11:56:55 hercules01 qdiskd[9292]: <info> Initial score 1/1
> Oct  4 11:56:55 hercules01 qdiskd[9292]: <info> Initialization complete
> Oct  4 11:56:55 hercules01 qdiskd[9292]: <notice> Score sufficient for
> master operation (1/1; required=1); upgrading
> Oct  4 11:57:01 hercules01 qdiskd[9292]: <info> Assuming master role

[ at this point, the cluster is quorate ]
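
As a side check on the timing noted in the quoted log above, here is a minimal Python sketch (not part of any cluster tooling; the helper name `syslog_delta_seconds` is made up for illustration) that computes the gap between two syslog timestamps. It assumes both stamps fall in the same year (2007, from the date of this thread) and that month abbreviations are in English.

```python
from datetime import datetime

def syslog_delta_seconds(start, end, year=2007):
    """Return the seconds elapsed between two syslog-style timestamps.

    Syslog stamps carry no year, so one is assumed (hypothetical default).
    """
    fmt = "%Y %b %d %H:%M:%S"
    # Collapse the double space syslog uses to pad single-digit days.
    t0 = datetime.strptime(f"{year} {' '.join(start.split())}", fmt)
    t1 = datetime.strptime(f"{year} {' '.join(end.split())}", fmt)
    return (t1 - t0).total_seconds()

# qdiskd start vs. "fenced: startup failed" from the quoted log
print(syslog_delta_seconds("Oct  4 11:56:45", "Oct  4 11:58:45"))
```

The result matches the 120 second timeout passed in "fence_tool -t 120 join -w", which is consistent with fenced waiting out its full timeout rather than failing early.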


> ** Installed cluster package versions (same on both nodes):
> cman-kernel-smp-2.6.9-45.15.x86_64.rpm

From the CVS logs for cnxman.c (I know, too much information... but I can't
find a bugzilla on it):

RHEL45: 1.42.2.28.0.2
cman-kernel_2_6_9_48: 1.42.2.27
...
cman-kernel_2_6_9_45: 1.42.2.25

revision 1.42.2.27
date: 2007/01/19 10:23:14;  author: pcaulfield;  state: Exp;  lines: +2 -0
Tell SM when the quorum device comes or goes.
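
To make the revision comparison explicit, a rough Python sketch follows. It is only an illustration: the helper names are hypothetical, and comparing dotted CVS revisions as integer tuples is a simplification that is only meaningful here because all three revisions sit on the same 1.42.2.x line (the longer RHEL45 number is assumed to descend from 1.42.2.28).

```python
def parse_revision(rev):
    """Turn a dotted CVS revision such as '1.42.2.27' into a tuple of ints."""
    return tuple(int(part) for part in rev.split("."))

# Revision that introduced "Tell SM when the quorum device comes or goes."
FIX_REVISION = parse_revision("1.42.2.27")

# Tag -> cnxman.c revision, as listed in the CVS log excerpt
TAGGED = {
    "RHEL45": "1.42.2.28.0.2",
    "cman-kernel_2_6_9_48": "1.42.2.27",
    "cman-kernel_2_6_9_45": "1.42.2.25",
}

def has_fix(rev):
    """True if rev is at or past the fix revision (same-branch assumption)."""
    return parse_revision(rev) >= FIX_REVISION

for tag, rev in TAGGED.items():
    print(f"{tag}: {rev} -> {'has fix' if has_fix(rev) else 'missing fix'}")
```

Only the cman-kernel_2_6_9_45 revision falls below 1.42.2.27, which is the version installed on the nodes in question.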


There's a bug in the cman-kernel you have (2.6.9-45) which is fixed in 4.5.
Basically, the SM component of CMAN in the kernel wasn't getting notified
when qdisk votes were causing a quorum transition.  This caused problems
with fenced and the DLM (and thus rgmanager, since rgmanager uses the DLM).

It's fixed in cman-kernel from 4.5 and later.  The patch is here:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/cman-kernel/src/Attic/cnxman.c.diff?r1=1.42.2.26&r2=1.42.2.27&cvsroot=cluster&hideattic=0&only_with_tag=RHEL4

-- Lon




