[Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1

Thu Apr 17 08:30:30 UTC 2008

No,

I strongly believe it should not work that way.

To my mind, it should work like this:

- 2 nodes up and running, everything ok
- shut down the cluster daemons on node b
- node b tells node a "I'm going administratively down", and node a
decreases the cluster votes from 3 --> 2
- node a is happy, no fencing

- start node b's cluster daemons
- node b joins the cluster normally
- node b gains the quorum device normally, cluster votes back --> 3
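
The vote arithmetic behind that sequence can be sketched roughly as
follows. This is a toy model in Python, not cman or qdisk code: the
one-vote-per-member weights, the simple-majority rule and the names
quorum_threshold and is_quorate are assumptions made for illustration.

```python
# Toy model of the vote counting described above. Assumptions: each
# node and the quorum disk carry one vote, and quorum is a simple
# majority of the expected votes. This is not how cman is implemented.

VOTES = {"node_a": 1, "node_b": 1, "qdisk": 1}


def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is more than half of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(present_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the quorum threshold."""
    return present_votes >= quorum_threshold(expected_votes)


def present_votes(members) -> int:
    """Sum the votes of the members that are currently up."""
    return sum(VOTES[m] for m in members)


# Both nodes and the quorum disk are up: 3 expected votes, 3 present.
all_up = present_votes(["node_a", "node_b", "qdisk"])
print("all up:", is_quorate(all_up, expected_votes=3))

# Node b announces an administrative shutdown, so node a lowers the
# expected votes from 3 to 2 and still holds 2 votes (itself + qdisk).
after_leave = present_votes(["node_a", "qdisk"])
print("after clean leave:", is_quorate(after_leave, expected_votes=2))

# Node b rejoins: expected votes go back to 3 and all 3 are present.
print("after rejoin:", is_quorate(all_up, expected_votes=3))
```

Under these assumptions node a keeps quorum at every step, so there is
no reason for it to fence node b after a clean shutdown.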

Of course it's different if node b fails, but this is not a failure: it's
an administrative shutdown, and node a is informed.

If I halt node b, it is fenced properly by node a, as it should be; it then
reboots and joins the cluster normally.

-hjp

On Thu, 2008-04-17 at 09:17 +0100, Gordan Bobic wrote:
> > 2. qdisk doesn't work. 2-node cluster. Start it (both nodes at the
> > same time) to get it up. Works ok, qdisk works, heuristic works.
> > Everything works. If I stop the cluster daemons on one node, that node
> > can't join the cluster anymore without a complete reboot. It joins,
> > the other node says ok, the node itself says ok, quorum is registered
> > and the heuristic is up, but the node's quorum disk stays offline and
> > the other node says this node is offline. If I reboot this machine, it
> > joins the cluster ok.
> 
> I believe it's supposed to work that way. When a node fails it needs to
> be fully restarted before it is allowed back into the cluster. I'm sure
> this has been mentioned on the list recently.