
[Linux-cluster] Necessary a delay to restart cman?



Hi. I have a two-node CentOS 5.3 cluster. If I run service cman restart on one node, or run cman stop followed by cman start a few seconds later, the other node doesn't recognize that the node has rejoined the membership, and it keeps showing its peer as offline forever.

For example:

* Before cman restart:

node1# cman_tool status
Version: 6.1.0
Config Version: 6
Cluster Name: CSVirtualizacion
Cluster Id: 42648
Cluster Member: Yes
Cluster Generation: 202600
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 7
Flags: 2node Dirty
Ports Bound: 0
Node name: patty
Node ID: 1
Multicast addresses: 224.0.0.133
Node addresses: 138.100.8.70

* After cman stop on node2 (before the token timeout has elapsed):

node1# cman_tool status
Version: 6.1.0
Config Version: 6
Cluster Name: CSVirtualizacion
Cluster Id: 42648
Cluster Member: Yes
Cluster Generation: 202600
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node Dirty
Ports Bound: 0
Node name: patty
Node ID: 1
Multicast addresses: 224.0.0.133
Node addresses: 138.100.8.70
Wed May  6 12:29:38 CEST 2009

* After cman stop on node2 (after the token timeout has elapsed):

node1# date; cman_tool status
Version: 6.1.0
Config Version: 6
Cluster Name: CSVirtualizacion
Cluster Id: 42648
Cluster Member: Yes
Cluster Generation: 202604
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node Dirty
Ports Bound: 0
Node name: patty
Node ID: 1
Multicast addresses: 224.0.0.133
Node addresses: 138.100.8.70
Wed May  6 12:29:47 CEST 2009

/var/log/messages:
May  6 12:35:20 node2 openais[17262]: [TOTEM] The token was lost in the OPERATIONAL state.
May  6 12:35:20 node2 openais[17262]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
May  6 12:35:20 node2 openais[17262]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
May  6 12:35:20 node2 openais[17262]: [TOTEM] entering GATHER state from 2.
May  6 12:35:25 node2 openais[17262]: [TOTEM] entering GATHER state from 0.
May  6 12:35:25 node2 openais[17262]: [TOTEM] Creating commit token because I am the rep.
May  6 12:35:25 node2 openais[17262]: [TOTEM] Saving state aru 26 high seq received 26
May  6 12:35:25 node2 openais[17262]: [TOTEM] Storing new sequence id for ring 31780
May  6 12:35:25 node2 openais[17262]: [TOTEM] entering COMMIT state.
May  6 12:35:25 node2 openais[17262]: [TOTEM] entering RECOVERY state.
May  6 12:35:25 node2 openais[17262]: [TOTEM] position [0] member 10.10.8.70:
May  6 12:35:25 node2 openais[17262]: [TOTEM] previous ring seq 202620 rep 10.10.8.70
May  6 12:35:25 node2 openais[17262]: [TOTEM] aru 26 high delivered 26 received flag 1
May  6 12:35:25 node2 openais[17262]: [TOTEM] Did not need to originate any messages in recovery.
May  6 12:35:25 node2 openais[17262]: [TOTEM] Sending initial ORF token
May  6 12:35:25 node2 openais[17262]: [CLM  ] CLM CONFIGURATION CHANGE
May  6 12:35:25 node2 openais[17262]: [CLM  ] New Configuration:
May  6 12:35:25 node2 openais[17262]: [CLM  ]   r(0) ip(10.10.8.70)
May  6 12:35:25 node2 openais[17262]: [CLM  ] Members Left:
May  6 12:35:25 node2 openais[17262]: [CLM  ]   r(0) ip(10.10.8.71)
May  6 12:35:25 node2 openais[17262]: [CLM  ] Members Joined:
May  6 12:35:25 node2 openais[17262]: [CLM  ] CLM CONFIGURATION CHANGE
May  6 12:35:25 node2 openais[17262]: [CLM  ] New Configuration:
May  6 12:35:25 node2 openais[17262]: [CLM  ]   r(0) ip(10.10.8.70)
May  6 12:35:25 node2 openais[17262]: [CLM  ] Members Left:
May  6 12:35:25 node2 openais[17262]: [CLM  ] Members Joined:
May  6 12:35:25 node2 openais[17262]: [SYNC ] This node is within the primary component and will provide service.
May  6 12:35:25 node2 openais[17262]: [TOTEM] entering OPERATIONAL state.
May  6 12:35:25 node2 kernel: dlm: closing connection to node 2
May  6 12:35:25 node2 openais[17262]: [CLM  ] got nodejoin message 10.10.8.70
May  6 12:35:25 node2 openais[17262]: [CPG  ] got joinlist message from node 1


If node2 runs cman start without waiting until the loss of the token in the OPERATIONAL state has been detected, node1 sees node2 as offline forever. Subsequent cman restarts don't change this state:
node1# cman_tool nodes
Node  Sts   Inc   Joined               Name
  1   M  202616   2009-05-06 12:34:43  node1
  2   X  202628                        node2
node2# cman_tool nodes
Node  Sts   Inc   Joined               Name
  1   M  202644   2009-05-06 12:51:04  node1
  2   M  202640   2009-05-06 12:51:04  node2
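To compare the two views without reading the tables by eye, the Sts column can be pulled out of the cman_tool nodes output. This is only a minimal sketch: node_status is a hypothetical helper name, and it assumes the column layout shown above (status in the second column, node name in the last).

```shell
#!/bin/sh
# node_status NAME  (hypothetical helper, not part of cman)
# Reads `cman_tool nodes` output on stdin and prints the Sts column
# (M = member, X = not a member) for the node called NAME.
# Assumes the layout shown above: Sts is field 2, Name is the last field.
node_status() {
    awk -v name="$1" '$NF == name { print $2 }'
}
```

Running cman_tool nodes | node_status node2 on each node should print X on node1 and M on node2 while the cluster is in the inconsistent state described here.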


Is a delay between cman stop and cman start necessary to avoid this inconsistent state, or is this really a bug?
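For reference, the delayed restart I am asking about would look roughly like the sketch below. It is a workaround, not a confirmed fix. TOKEN_MS and MARGIN_S are placeholders I made up: TOKEN_MS should be set to the token timeout the cluster really uses (10000 here is only an assumed value), and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a delayed cman restart. All names and values are assumptions.
TOKEN_MS="${TOKEN_MS:-10000}"   # assumed token timeout in milliseconds
MARGIN_S="${MARGIN_S:-5}"       # arbitrary extra safety margin in seconds

# wait_seconds TOKEN_MS MARGIN_S
# Rounds the token timeout up to whole seconds and adds the margin.
wait_seconds() {
    echo $(( ($1 + 999) / 1000 + $2 ))
}

# Stop cman, wait longer than the token timeout so the other node can
# notice the departure, then start cman again.
restart_cman_with_delay() {
    service cman stop || return 1
    sleep "$(wait_seconds "$TOKEN_MS" "$MARGIN_S")"
    service cman start
}
```

On node2 this would be run as restart_cman_with_delay in place of service cman restart.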

Regards.

