[Linux-cluster] What does FAIL_STOP_WAIT state mean for clvmd and rgmanager

Mon Sep 20 06:21:29 UTC 2010

I'm not sure; possibly it was from doing a "service cman restart".

I understand it's always preferable to reboot with Cluster Suite, but some of
our physical hosts can take 20 minutes to do a full reboot, so I'm always
looking for some way to fix them online.

Joel

On Fri, Sep 10, 2010 at 4:03 AM, Lon Hohberger <lhh at redhat.com> wrote:

> On Mon, 2010-08-23 at 17:58 +1000, Joel Heenan wrote:
> > Can someone please explain what this means and what you can do to get
> > out of it:
> >
> > [root at cluster-host ~]# group_tool -v
> > type             level name       id       state node id local_done
> > fence            0     default    00010003 JOIN_STOP_WAIT 1 100050001 1
> > [1 1 2 3 4]
> > dlm              1     clvmd      00020003 FAIL_STOP_WAIT 2 200030003 1
> > [1 2 3 4]
> > dlm              1     rgmanager  00030003 FAIL_STOP_WAIT 2 200030003 1
> > [1 2 3 4]
>
> It looks like fencing has not completed.  How do you have 2 node 1's in
> the fencing group?
>
> -- Lon
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
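The anomaly Lon points out (node 1 appearing twice in the fence group's member list) can be spotted mechanically. Below is a minimal Python sketch, assuming the group_tool -v layout shown in the thread: each group line is followed by a bracketed member list on the next line. The helper names parse_groups and duplicate_members are hypothetical and are not part of the cluster tools. The sketch only reports each group's state and flags repeated node IDs; it does not diagnose or clear FAIL_STOP_WAIT.

```python
# Sample output copied from the thread, with the email-wrapped
# local_done column rejoined onto each group line (assumed layout).
SAMPLE = """\
type             level name       id       state node id local_done
fence            0     default    00010003 JOIN_STOP_WAIT 1 100050001 1
[1 1 2 3 4]
dlm              1     clvmd      00020003 FAIL_STOP_WAIT 2 200030003 1
[1 2 3 4]
dlm              1     rgmanager  00030003 FAIL_STOP_WAIT 2 200030003 1
[1 2 3 4]
"""


def parse_groups(text):
    """Pair each group line with the bracketed member list that follows it."""
    groups = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("type "):
            continue  # skip blank lines and the header row
        if line.startswith("[") and current is not None:
            # Member list: space-separated node IDs inside brackets.
            current["members"] = [int(n) for n in line.strip("[]").split()]
            groups.append(current)
            current = None
        else:
            fields = line.split()
            current = {
                "type": fields[0],
                "name": fields[2],
                "state": fields[4],
                "members": [],
            }
    return groups


def duplicate_members(members):
    """Return node IDs that appear more than once, in first-repeat order."""
    seen, dups = set(), []
    for node in members:
        if node in seen and node not in dups:
            dups.append(node)
        seen.add(node)
    return dups


if __name__ == "__main__":
    for g in parse_groups(SAMPLE):
        dups = duplicate_members(g["members"])
        note = f"  duplicate node IDs: {dups}" if dups else ""
        print(f'{g["type"]:6} {g["name"]:10} {g["state"]}{note}')
```

Run against the sample, only the fence group "default" is flagged, with node 1 repeated; the two dlm groups list each node once.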