
Re: [Linux-cluster] Rhel 5.7 Cluster - gfs2 volume in "LEAVE_START_WAIT" status



Hi Cedric,

About the only doc I've found that describes the barrier state transitions is the cluster2 architecture doc:

http://people.redhat.com/teigland/cluster2-arch.txt

When group membership changes, a barrier operation stops the group, changes the membership, and restarts the group, so that all members agree on the new membership before the group resumes.  LEAVE_START_WAIT means that a node (12) left the group, but the restart hasn't completed because not all of the nodes have acknowledged the change.

You should run 'group_tool -v' on each node of the cluster and look for a node where the final 'local_done' flag is 0, or where the group membership is inconsistent with the other nodes.  Dumping the debug buffer for the group on the various nodes may also identify which node is being waited on.  In the cases where we've found inconsistent group membership, fencing the node with the inconsistency let the group finish starting.
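
The comparison across nodes can be scripted. What follows is only a rough sketch in Python, assuming the 'group_tool -v' output has the layout pasted in this thread (header "type level name id state node id local_done", with the member list in brackets on the next line). The function names are made up for illustration, and so is the key "event_id" for the second "id" column, whose meaning isn't documented here. The sample text is copied from the output quoted below.

```python
from collections import defaultdict

# Two groups copied from the group_tool -v output posted in this thread.
SAMPLE = """\
type             level name            id       state node id local_done
dlm              1     cluster3_disk2  00040005 none
[4 5 6 7 8 9 10 11 13 14 15]
gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
[4 5 6 7 8 9 10 11 13 14 15]
"""


def parse_group_tool(text):
    """Parse group_tool -v text (optionally mail-quoted with '>') into dicts."""
    groups = []
    for raw in text.splitlines():
        line = raw.strip().lstrip(">").strip()
        if not line or line.startswith("type "):
            continue  # blank line or column header
        if line.startswith("["):
            # A bracketed member list belongs to the group line just above it.
            if groups:
                groups[-1]["members"] = [int(n) for n in line.strip("[]").split()]
            continue
        parts = line.split()
        if len(parts) < 5 or not parts[1].isdigit():
            continue  # shell prompt or anything else that is not a group line
        group = {
            "type": parts[0], "level": int(parts[1]), "name": parts[2],
            "id": parts[3], "state": parts[4],
            "node": None, "event_id": None, "local_done": None, "members": [],
        }
        if len(parts) >= 8:
            # Extra columns only appear while a membership change is pending.
            group["node"] = int(parts[5])
            group["event_id"] = parts[6]  # assumed name for the second "id"
            group["local_done"] = int(parts[7])
        groups.append(group)
    return groups


def stuck_groups(groups):
    """Groups whose state column is anything other than 'none'."""
    return [g for g in groups if g["state"] != "none"]


def membership_mismatches(per_node):
    """Given {node_name: parsed groups}, return groups whose member lists differ."""
    views = defaultdict(dict)
    for node_name, groups in per_node.items():
        for g in groups:
            views[(g["type"], g["name"])][node_name] = tuple(g["members"])
    return {key: v for key, v in views.items() if len(set(v.values())) > 1}


if __name__ == "__main__":
    for g in stuck_groups(parse_group_tool(SAMPLE)):
        print(g["type"], g["name"], g["state"],
              "node", g["node"], "local_done", g["local_done"])
```

To use it, save the 'group_tool -v' output from each node, parse each one, and pass the results to membership_mismatches keyed by node name. A node whose stuck group shows local_done of 0, or whose member list differs from the rest, is the one to look at first. How the output gets collected from each node is left out of the sketch.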

[as an aside--is there a plan to reengineer the RH cluster group membership protocol stack to take advantage of the virtual synchrony capabilities of Corosync/TOTEM?]

-dan

On Jun 2, 2012, at 9:25 PM, Cedric Kimaru wrote:

> Fellow Cluster Compatriots,
> I'm looking for some guidance here. Whenever my RHEL 5.7 cluster gets into "LEAVE_START_WAIT" on a given iSCSI volume, the following occurs:
> 	• I can't read from or write to the volume.
> 	• I can't unmount it from any node.
> 	• In-flight/pending I/Os are impossible to identify or kill, since lsof on the mount fails. Basically all I/O operations stall or fail.
> So my questions are:
> 
> 	• What does the output from group_tool -v really indicate, e.g. "00030005 LEAVE_START_WAIT 12 c000b0002 1"? The group_tool man page doesn't list these fields.
> 	• Does anyone have a list of what these fields represent?
> 	• Corrective actions: how do I get out of this state without rebooting the entire cluster?
> 	• Is it possible to determine the offending node?
> thanks,
> -Cedric
> 
> 
> //misc output
> 
> root bl13-node13:~# group_tool -v
> type             level name            id       state node id local_done
> fence            0     default         0001000d none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     clvmd           0001000c none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk1  00020005 none        
> [4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk2  00040005 none        
> [4 5 6 7 8 9 10 11 13 14 15]
> dlm              1     cluster3_disk7  00060005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk8  00080005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk9  000a0005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     disk10          000c0005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     rgmanager       0001000a none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk3  00020001 none        
> [1 5 6 7 8 9 10 11 12 13]
> dlm              1     cluster3_disk6  00020008 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk1  00010005 none        
> [4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
> [4 5 6 7 8 9 10 11 13 14 15]
> gfs              2     cluster3_disk7  00050005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk8  00070005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk9  00090005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     disk10          000b0005 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk3  00010001 none        
> [1 5 6 7 8 9 10 11 12 13]
> gfs              2     cluster3_disk6  00010008 none        
> [1 4 5 6 7 8 9 10 11 12 13 14 15]


