
Re: [Linux-cluster] Removing a node from a running cluster

Next time, run "cman_tool leave"; it has a few prerequisites, so check the man page.
Then "cman_tool expected vote_num" should sort out your quorum/votes.


On 1/4/07, Pena, Francisco Javier <francisco_javier pena roche com> wrote:

I am seeing strange cman behavior when removing a node from a running cluster. The starting point is:

- 3 nodes running RHEL 4 U4, GFS 6.1 (1 vote per node)
- Quorum disk (4 votes)

I stop all cluster services on node 3, modify the cluster.conf file to remove the node (and adjust the quorum disk votes to 3), and then run "ccs_tool update" and "cman_tool version -r <new_version>". The cluster services keep running; however, it looks like cman is not completely in sync with ccsd:

# ccs_tool lsnode

Cluster name: TestCluster, config_version: 9

Nodename                        Votes Nodeid Iface Fencetype
gfsnode1                           1    1          iLO_NODE1
gfsnode2                           1    2          iLO_NODE2

# cman_tool nodes

Node  Votes Exp Sts  Name
   0    4    0   M   /dev/emcpowera1
   1    1    3   M   gfsnode1
   2    1    3   M   gfsnode2
   3    1    3   X   gfsnode3
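
The two listings above disagree about gfsnode3: ccs_tool no longer lists it, while cman_tool still shows it with status X. As a minimal sketch of how that comparison could be automated, the following Python parses the two outputs and reports nodes that cman knows about but the configuration does not. The sample text is copied from the listings; the function names are hypothetical, and the column layout is assumed to match what is shown here.

```python
# Sample text copied from the two listings above. In practice it would come
# from running the tools, which this sketch deliberately does not do.
CCS_LSNODE = """\
Nodename                        Votes Nodeid Iface Fencetype
gfsnode1                           1    1          iLO_NODE1
gfsnode2                           1    2          iLO_NODE2
"""

CMAN_NODES = """\
Node  Votes Exp Sts  Name
   0    4    0   M   /dev/emcpowera1
   1    1    3   M   gfsnode1
   2    1    3   M   gfsnode2
   3    1    3   X   gfsnode3
"""


def configured_nodes(text):
    """Node names from 'ccs_tool lsnode' style output (header row skipped)."""
    rows = [line.split() for line in text.splitlines()[1:] if line.strip()]
    return {row[0] for row in rows}


def cman_nodes(text):
    """Map node name -> status flag from 'cman_tool nodes' style output.

    Node ID 0 is skipped on the assumption that it is the quorum disk,
    as it is in the listing above.
    """
    result = {}
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) == 5 and fields[0] != "0":
            result[fields[4]] = fields[3]
    return result


def stale_nodes(lsnode_text, nodes_text):
    """Nodes still known to cman but absent from the configuration."""
    configured = configured_nodes(lsnode_text)
    return sorted(n for n in cman_nodes(nodes_text) if n not in configured)


print(stale_nodes(CCS_LSNODE, CMAN_NODES))  # ['gfsnode3']
```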

# cman_tool status

Protocol version: 5.0.1
Config version: 9
Cluster name: TestCluster
Cluster ID: 62260
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 3
Total_votes: 6
Quorum: 4
Active subsystems: 9
Node name: gfsnode1
Node ID: 1
Node addresses: A.B.C.D
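
The Total_votes and Quorum figures above can be checked by hand. The sketch below assumes, and this is an assumption rather than something stated in this thread, that quorum is the larger of (expected_votes + 2) / 2 and (total_votes + 2) / 2 using integer division, and that total votes are the votes of the live members plus the quorum disk. Under that assumption the numbers in the status output are reproduced: two live nodes with 1 vote each plus a quorum disk still counted at 4 votes.

```python
def total_votes(member_votes, qdisk_votes):
    """Sum the votes of live members plus the quorum disk."""
    return sum(member_votes) + qdisk_votes


def quorum(expected_votes, total):
    """Assumed quorum rule (not confirmed in this thread): the larger of the
    two integer halves-plus-one computed from expected and total votes."""
    return max((expected_votes + 2) // 2, (total + 2) // 2)


# Values shown in the status output: gfsnode1 and gfsnode2 are members (1 vote
# each), the quorum disk is still counted with 4 votes, Expected_votes is 3.
stale_total = total_votes([1, 1], 4)
print(stale_total, quorum(3, stale_total))  # 6 4

# What the same assumed rule would give for the edited configuration
# (quorum disk at 3 votes, Expected_votes 2). This is a derived illustration,
# not a figure reported in the thread.
new_total = total_votes([1, 1], 3)
print(new_total, quorum(2, new_total))  # 5 3
```

If the assumed rule holds, the stale quorum disk vote count is what keeps Quorum at 4 rather than 3.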

CMAN still thinks the third node is part of the cluster and has merely stopped working. In addition, it is not updating the number of votes for the quorum disk. If I completely restart the cluster services on all nodes, I get the right information:

- Correct votes for the quorum disk
- Third node disappears
- The Expected_votes value is now 2

I know from a previous post that two-node clusters are a special case, even with a quorum disk, but I am pretty sure the same problem will happen with higher node counts (I just do not have enough hardware to test it).

So, is this considered a bug, or is it expected that information about removed nodes remains until the whole cluster is restarted?

Thanks in advance,

Javier Peña

Linux-cluster mailing list
Linux-cluster redhat com
