[Linux-cluster] Problem in clvmd/dlm_recoverd

Christine Caulfield ccaulfie at redhat.com
Fri Nov 14 10:29:50 UTC 2008


Nuno Fernandes wrote:
> Hi,
> 
> We have a cluster of 7 machines attached to a SAN. We use them to
> provide virtual machines, so we are running clvmd.
> 
> At some point we became unable to use any of the pv/lv/vg tools. They
> are all stuck. From stracing them I've come to the conclusion that they
> are waiting for clvmd.
> 

They could be waiting for fencing to complete.

Have a look at the output from group_tool. It will tell you which
services have recovered after a node has joined or left the cluster.
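
As a rough illustration of that check, the Python sketch below scans
group_tool output for groups that have not settled. It assumes the
listing has the columns "type level name id state", with each group
followed by a bracketed member list, and that a state of "none" means
the group has finished recovering; verify both assumptions against the
output on your own nodes. The helper names unsettled_groups and
run_group_tool are made up for this example.

```python
import subprocess


def unsettled_groups(text):
    """Return (type, name, state) for each group whose state is not 'none'.

    Assumes the column layout 'type level name id state'; this layout is
    an assumption of the sketch, not something confirmed in this thread.
    """
    pending = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and member lists like "[1 2 3]".
        if len(fields) < 5 or fields[0] == "type" or line.lstrip().startswith("["):
            continue
        gtype, _level, name, _gid, state = fields[:5]
        if state != "none":
            pending.append((gtype, name, state))
    return pending


def run_group_tool():
    """Run group_tool on the local node and return its standard output."""
    result = subprocess.run(
        ["group_tool"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a cluster node, unsettled_groups(run_group_tool()) would list the
groups still waiting. A fence group that stays in a non-"none" state
would be consistent with fencing not having completed.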

Chrissie


> Nuno Fernandes
> 
> in host xen1:
> 
> Linux blade01.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4
> 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> lvm2-cluster-2.02.32-4.el5
> 
> cman-2.0.84-2.el5_2.1
> 
> PID TTY STAT TIME COMMAND
> 
> 20874 ? D< 0:00 \_ [dlm_recoverd]
> 
> 20854 pts/1 S+ 0:00 \_ /bin/sh /sbin/service clvmd start
> 
> 20861 pts/1 S+ 0:00 \_ /bin/bash /etc/init.d/clvmd start
> 
> 20931 pts/1 S+ 0:00 \_ /usr/sbin/vgscan -d
> 
> 20869 ? Ssl 0:00 clvmd -T40
> 
> ps ax -o pid,cmd,wchan
> 
> 20874 [dlm_recoverd] -
> 
> ------------------------------
> 
> Connection to xen1 closed.
> 
> in host xen2:
> 
> Linux blade02.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56
> WEST 2007 x86_64 x86_64 x86_64 GNU/Linux
> 
> lvm2-cluster-2.02.16-3.el5
> 
> cman-2.0.64-1.0.1.el5
> 
> PID TTY STAT TIME COMMAND
> 
> 22662 ? D< 0:00 \_ [dlm_recoverd]
> 
> 22613 ? Ssl 0:02 clvmd -T40
> 
> ps ax -o pid,cmd,wchan
> 
> 22662 [dlm_recoverd] -
> 
> ------------------------------
> 
> Connection to xen2 closed.
> 
> in host xen3:
> 
> Linux blade03.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56
> WEST 2007 x86_64 x86_64 x86_64 GNU/Linux
> 
> lvm2-cluster-2.02.16-3.el5
> 
> cman-2.0.64-1.0.1.el5
> 
> PID TTY STAT TIME COMMAND
> 
> 22236 ? D< 0:00 \_ [dlm_recoverd]
> 
> 22231 ? Ssl 0:02 clvmd -T40
> 
> ps ax -o pid,cmd,wchan
> 
> 22236 [dlm_recoverd] dlm_wait_function
> 
> ------------------------------
> 
> Connection to xen3 closed.
> 
> in host xen4:
> 
> Linux blade04.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56
> WEST 2007 x86_64 x86_64 x86_64 GNU/Linux
> 
> lvm2-cluster-2.02.16-3.el5
> 
> cman-2.0.64-1.0.1.el5
> 
> PID TTY STAT TIME COMMAND
> 
> 25097 ? D< 0:00 \_ [dlm_recoverd]
> 
> 25092 ? Ssl 0:02 clvmd -T40
> 
> ps ax -o pid,cmd,wchan
> 
> 25097 [dlm_recoverd] dlm_wait_function
> 
> ------------------------------
> 
> Connection to xen4 closed.
> 
> in host xen5:
> 
> Linux blade05.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4
> 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> lvm2-cluster-2.02.32-4.el5
> 
> cman-2.0.84-2.el5_2.1
> 
> PID TTY STAT TIME COMMAND
> 
> 22333 ? D< 0:00 \_ [dlm_recoverd]
> 
> 22328 ? Ssl 0:02 clvmd -T40
> 
> ps ax -o pid,cmd,wchan
> 
> 22333 [dlm_recoverd] -
> 
> ------------------------------
> 
> Connection to xen5 closed.
> 
> in host xen6:
> 
> Linux blade06.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4
> 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> lvm2-cluster-2.02.32-4.el5
> 
> cman-2.0.84-2.el5_2.1
> 
> PID TTY STAT TIME COMMAND
> 
> ps ax -o pid,cmd,wchan
> 
> ------------------------------
> 
> Connection to xen6 closed.
> 
> in host xen7:
> 
> Linux blade07.dc.xpto.com 2.6.18-92.1.13.el5xen #1 SMP Wed Sep 24
> 20:01:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> lvm2-cluster-2.02.32-4.el5
> 
> cman-2.0.84-2.el5
> 
> cman-2.0.84-2.el5_2.1
> 
> PID TTY STAT TIME COMMAND
> 
> 19793 ? D< 0:00 \_ [dlm_recoverd]
> 
> 19788 ? Ssl 0:01 clvmd -T40
> 
> ps ax -o pid,cmd,wchan
> 
> 19793 [dlm_recoverd] -
> 
> ------------------------------
> 
> Connection to xen7 closed.
>



