[Linux-cluster] Rhel 5.7 Cluster - gfs2 volume in "LEAVE_START_WAIT" status

emmanuel segura emi2fast at gmail.com
Sun Jun 3 17:17:16 UTC 2012


Hello Cedric

Are you using gfs or gfs2? If you are using gfs, I recommend moving to gfs2.
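
If you are not sure which of the two is actually mounted, the filesystem
type column in /proc/mounts will tell you; this is only a generic check,
nothing specific to your cluster:

  # list mounted filesystems whose type is gfs or gfs2 (type is field 3)
  awk '$3 == "gfs" || $3 == "gfs2"' /proc/mounts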

2012/6/3 Cedric Kimaru <rhel_cluster at ckimaru.com>

> Fellow Cluster Compatriots,
> I'm looking for some guidance here. Whenever my RHEL 5.7 cluster gets
> into "*LEAVE_START_WAIT*" on a given iSCSI volume, the following
> occurs:
>
>    1. I can't read or write to the volume.
>    2. I can't unmount it from any node.
>    3. In-flight/pending I/Os are impossible to identify or kill, since
>    lsof on the mount fails; basically all I/O operations stall or fail
>    (see the sketch after this list).
>
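> When lsof hangs, the only generic thing I know to look at is which tasks
> are stuck in uninterruptible sleep; a minimal sketch, where
> /mnt/cluster3_disk2 is just a placeholder for the real mount point:
>
>    # processes in D state, plus the kernel function they are waiting in
>    ps axo pid,stat,wchan:30,comm | awk '$2 ~ /D/'
>    # processes still referencing the mount point (this can block too)
>    fuser -vm /mnt/cluster3_disk2
>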
> So my questions are:
>
>    1. What does the output from group_tool -v really indicate: *"00030005
>    LEAVE_START_WAIT 12 c000b0002 1"*? The group_tool man page doesn't
>    document these fields.
>    2. Does anyone have a list of what these fields represent?
>    3. Corrective actions: how do I get out of this state without
>    rebooting the entire cluster? (See the sketch after this list.)
>    4. Is it possible to determine the offending node?
>
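> For reference, these are the commands I have been poking at so far;
> mostly guesses on my part rather than a known-good procedure:
>
>    # group/service state as seen from cman, same info as group_tool -v
>    cman_tool services
>    # dump gfs_controld's internal debug buffer for more detail
>    group_tool dump gfs
>    # if the offending node can be identified, fence only that node
>    fence_node <nodename>
>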
> thanks,
> -Cedric
>
>
> //misc output
>
> root@bl13-node13:~# clustat
> Cluster Status for cluster3 @ Sat Jun  2 20:47:08 2012
> Member Status: Quorate
>
>  Member Name                                    ID   Status
>  ------ ----                                    ---- ------
>  bl01-node01                                       1 Online, rgmanager
>  bl04-node04                                       4 Online, rgmanager
>  bl05-node05                                       5 Online, rgmanager
>  bl06-node06                                       6 Online, rgmanager
>  bl07-node07                                       7 Online, rgmanager
>  bl08-node08                                       8 Online, rgmanager
>  bl09-node09                                       9 Online, rgmanager
>  bl10-node10                                      10 Online, rgmanager
>  bl11-node11                                      11 Online, rgmanager
>  bl12-node12                                      12 Online, rgmanager
>  bl13-node13                                      13 Online, Local, rgmanager
>  bl14-node14                                      14 Online, rgmanager
>  bl15-node15                                      15 Online, rgmanager
>
>
>  Service Name                         Owner (Last)                         State
>  ------- ----                         ----- ------                         -----
>  service:httpd                        bl05-node05                          started
>  service:nfs_disk2                    bl08-node08                          started
>
>
> root@bl13-node13:~# group_tool -v
> type             level name            id       state node id local_done
> fence            0     default         0001000d none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     clvmd           0001000c none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk1  00020005 none
> [4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk2  00040005 none
> [4 5 6 7 8 9 10 11 13 14 15]
> dlm              1     cluster3_disk7  00060005 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk8  00080005 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk9  000a0005 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     disk10          000c0005 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     rgmanager       0001000a none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> dlm              1     cluster3_disk3  00020001 none
> [1 5 6 7 8 9 10 11 12 13]
> dlm              1     cluster3_disk6  00020008 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk1  00010005 none
> [4 5 6 7 8 9 10 11 12 13 14 15]
> *gfs              2     cluster3_disk2  00030005 LEAVE_START_WAIT 12 c000b0002 1
> [4 5 6 7 8 9 10 11 13 14 15]*
> gfs              2     cluster3_disk7  00050005 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk8  00070005 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk9  00090005 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     disk10          000b0005 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
> gfs              2     cluster3_disk3  00010001 none
> [1 5 6 7 8 9 10 11 12 13]
> gfs              2     cluster3_disk6  00010008 none
> [1 4 5 6 7 8 9 10 11 12 13 14 15]
>
> root@bl13-node13:~# gfs2_tool list
> 253:15 cluster3:cluster3_disk6
> 253:16 cluster3:cluster3_disk3
> 253:18 cluster3:disk10
> 253:17 cluster3:cluster3_disk9
> 253:19 cluster3:cluster3_disk8
> 253:21 cluster3:cluster3_disk7
> 253:22 cluster3:cluster3_disk2
> 253:23 cluster3:cluster3_disk1
>
> root@bl13-node13:~# lvs
>     Logging initialised at Sat Jun  2 20:50:03 2012
>     Set umask from 0022 to 0077
>     Finding all logical volumes
>   LV                            VG                            Attr   LSize   Origin Snap%  Move Log Copy%  Convert
>   lv_cluster3_Disk7             vg_Cluster3_Disk7             -wi-ao   3.00T
>   lv_cluster3_Disk9             vg_Cluster3_Disk9             -wi-ao 200.01G
>   lv_Cluster3_libvert           vg_Cluster3_libvert           -wi-a- 100.00G
>   lv_cluster3_disk1             vg_cluster3_disk1             -wi-ao 100.00G
>   lv_cluster3_disk10            vg_cluster3_disk10            -wi-ao  15.00T
>   lv_cluster3_disk2             vg_cluster3_disk2             -wi-ao 220.00G
>   lv_cluster3_disk3             vg_cluster3_disk3             -wi-ao 330.00G
>   lv_cluster3_disk4_1T-kvm-thin vg_cluster3_disk4_1T-kvm-thin -wi-a-   1.00T
>   lv_cluster3_disk5             vg_cluster3_disk5             -wi-a- 555.00G
>   lv_cluster3_disk6             vg_cluster3_disk6             -wi-ao   2.00T
>   lv_cluster3_disk8             vg_cluster3_disk8             -wi-ao   2.00T
>
>
>



-- 
this is my life and I live it for as long as God wills