[Linux-cluster] service stuck in "recovering", no attempt to restart

Ofer Inbar cos at aaaaa.org
Wed Oct 5 02:43:57 UTC 2011


After collecting all of the information in my previous mailing,
I then tried restarting the service using clusvcadm -R, to no avail:

| $ sudo clusvcadm -R dn
| Local machine trying to restart service:dn...

And so it stood for over a minute, with no evidence that it was
actually trying to start anything, so I hit ^C.
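
(A bounded version of that attempt could look like the sketch below, so
the command gives up on its own instead of waiting for ^C.  It is only
a sketch: run_with_limit is a hypothetical helper, not part of
rgmanager, and it assumes timeout(1) from GNU coreutils is installed,
which may not be the case on older distributions.)

```shell
#!/bin/sh
# Hypothetical helper (not part of rgmanager): run a command, but give up
# after LIMIT seconds rather than hanging until someone hits ^C.
# Assumes timeout(1) from GNU coreutils is available on the node.
run_with_limit() {
    limit=$1
    shift
    # timeout exits with status 124 if the command was still running
    # when the limit expired; otherwise it passes the command's status on.
    timeout "$limit" "$@"
}

# Example use on a cluster node (left commented out because it needs
# rgmanager and sudo rights):
#   run_with_limit 60 sudo clusvcadm -R dn || sudo clustat
```

An exit status of 124 would mean the restart request was still hanging
when the limit ran out, which is the situation described above.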

Next, I restarted rgmanager on all three nodes simultaneously,
using "sudo service rgmanager restart".  When rgmanager came
back up, the service was in status "recoverable", and soon
after that it was started successfully on node2.

So now the service is running, but it's still a complete mystery
to me why it never got restarted before, and why I had to restart
rgmanager to get it to bring the service up.  I also don't know
what, if anything, I need to do to prevent this from happening again.

[I did try killing processes a few times and observed successful
relocations and restarts, so the cluster seems to be in a good
state for now...]
  -- Cos



