
Re: [Linux-cluster] Can a 2-node Cluster boot-up with only one active node?

On Thu, 2007-10-04 at 13:28 -0300, Celso K. Webber wrote:
> ** What is happening:
>   If I boot node1 with node2 powered off, it stops for 5 minutes during the
> start of ccsd, and after that it regains quorum, qdiskd starts successfully,
> but fenced keeps trying to start for 2 minutes and then it gives up with
> a "failed" message.


> Oct  4 11:56:45 hercules01 lock_gulmd: no <gulm> section detected
> in /etc/cluster/cluster.conf succeeded

You have no <gulm> section, so you can take lock_gulmd out of the init
sequence:

chkconfig --del lock_gulmd

> Oct  4 11:56:45 hercules01 qdiskd: Starting the Quorum Disk Daemon: succeeded
> Oct  4 11:57:02 hercules01 kernel: CMAN: quorum regained, resuming activity
> Oct  4 11:57:02 hercules01 ccsd[9144]: Cluster is quorate.  Allowing
> connections.
> Oct  4 11:58:45 hercules01 fenced: startup failed
>        ^^^^^^^^
>        exactly 2 minutes after the qdiskd message above, I've noticed that
> fenced is started in the init scripts with "fence_tool -t 120 join -w"
> Oct  4 11:59:38 hercules01 rgmanager: clurgmgrd startup failed
>        ^^^^^^^^
>        after other service boot up ok, rgmanager fails to boot, probably
> because fenced failed to start
> Oct  4 11:56:45 hercules01 qdiskd[9292]: <info> Quorum Daemon Initializing
> Oct  4 11:56:55 hercules01 qdiskd[9292]: <info> Initial score 1/1
> Oct  4 11:56:55 hercules01 qdiskd[9292]: <info> Initialization complete
> Oct  4 11:56:55 hercules01 qdiskd[9292]: <notice> Score sufficient for
> master operation (1/1; required=1); upgrading
> Oct  4 11:57:01 hercules01 qdiskd[9292]: <info> Assuming master role

[ at this point, the cluster is quorate ]
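
For reference, here is a rough sketch of the vote arithmetic behind that.
It assumes the usual majority rule (quorum = expected_votes / 2 + 1, integer
division) and a hypothetical layout of one vote per node plus one vote for
the quorum disk; the posted messages don't show the actual cluster.conf
values, and the function names are made up for illustration.

```shell
#!/bin/sh
# Majority threshold: more than half of the expected votes (integer division).
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Exit status 0 when the votes present reach the threshold.
# $1 = votes currently present, $2 = expected votes
is_quorate() {
    [ "$1" -ge "$(quorum_threshold "$2")" ]
}

# ASSUMED layout, not taken from the poster's cluster.conf:
# node1 = 1 vote, node2 = 1 vote, qdisk = 1 vote
expected=3

if is_quorate 1 "$expected"; then
    echo "node1 alone: quorate"
else
    echo "node1 alone: inquorate"
fi

if is_quorate 2 "$expected"; then
    echo "node1 + qdisk: quorate"
else
    echo "node1 + qdisk: inquorate"
fi
```

Under those assumed numbers the lone node stays inquorate until the qdisk
vote is counted, which would fit the "quorum regained" message appearing
only after qdiskd takes the master role.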

> ** Installed cluster package versions (same on both nodes):
> cman-kernel-smp-2.6.9-45.15.x86_64.rpm

From the CVS logs for cnxman.c (I know, too much information... but I can't
find a bugzilla on it):


date: 2007/01/19 10:23:14;  author: pcaulfield;  state: Exp;  lines: +2
Tell SM when the quorum device comes or goes.

The cman-kernel you have contains a bug: the SM component of CMAN in the
kernel wasn't being notified when qdisk votes caused a quorum transition.
This caused problems with fenced and the DLM (and thus rgmanager, since
rgmanager uses the DLM).

It's fixed in cman-kernel from 4.5 onward.  The patch is here:


-- Lon
