On Friday 14 November 2008 16:26:49 David Teigland wrote:
> On Fri, Nov 14, 2008 at 10:00:13AM +0000, Nuno Fernandes wrote:
> > 22236 [dlm_recoverd] dlm_wait_function
> > 25097 [dlm_recoverd] dlm_wait_function
> dlm recovery appears to be stuck; this is usually due to a problem at the
> network level. The recovery seems to be caused by a node starting clvmd.
I don't know if it helps, but groupd is using all available CPU, though only on 2 of the nodes.
I don't know whether IPv6 is required to be up, but we've disabled it.
Snippet of modprobe.conf:
alias net-pf-10 off
> sysrq-t backtraces from all the nodes could confirm some of this, and
> adding <dlm log_debug="1"/> to cluster.conf would give us more information
> the next time it happens.
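
For reference, a minimal sketch of where that element would presumably go in cluster.conf, assuming it sits directly under the top-level <cluster> element. The cluster name and config_version shown here are placeholders, not values from our setup, and the rest of the file is left out:

```xml
<?xml version="1.0"?>
<!-- "examplecluster" and config_version="2" are hypothetical placeholders;
     config_version should be one higher than the currently running version -->
<cluster name="examplecluster" config_version="2">
  <!-- enables extra dlm debug logging, as suggested above -->
  <dlm log_debug="1"/>
  <!-- existing clusternodes, fencedevices, etc. sections stay unchanged -->
</cluster>
```

The updated file would then need to be propagated to all nodes before the next occurrence for the extra logging to be of use.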