
Re: [Cluster-devel] "->ls_in_recovery" not released



David Teigland wrote:
On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote:
We have a two-node OCFS2 file system controlled by Pacemaker.

Are you using dlm_controld.pcmk?

Yes.

If so, please try the latest versions of
pacemaker that use the standard dlm_controld.

Actually we have dlm-pcmk-3.0.12-23.el6.x86_64.

I downloaded git://git.fedorahosted.org/dlm.git
We shall try it soon.

"ls_recover()" includes several other cases where it simply goes
to the "fail:" branch without releasing "->ls_in_recovery" and
without cleaning up the inconsistent data left behind.

I think some error handling code is missing in "ls_recover()".
Have you modified the DLM since RHEL 6.0?

No, in_recovery is supposed to remain locked until recovery completes.
Any number of ls_recover() calls can fail due to more member changes
during recovery, but one of them should eventually succeed (complete
recovery), once the membership stops changing.  Then in_recovery will be
unlocked.
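
As a rough illustration of the behaviour described above, here is a small
user-space model. It is not the DLM kernel code: the struct and function
names (lockspace_model, recover_once, run_recovery) are hypothetical, and
a boolean flag stands in for "->ls_in_recovery" being held. It only shows
that a failed attempt leaves the flag set and that the flag is cleared by
the first attempt that completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are DLM kernel symbols. */
struct lockspace_model {
    bool in_recovery;   /* stands in for "->ls_in_recovery" being held */
    int  attempts;      /* number of recovery attempts made so far */
};

/*
 * One recovery attempt. If membership changed during the attempt, it
 * fails and deliberately leaves in_recovery set (the "fail:" path).
 * Otherwise recovery completes and in_recovery is cleared.
 */
int recover_once(struct lockspace_model *ls, bool membership_changed)
{
    assert(ls->in_recovery);    /* recovery always runs with the flag set */
    ls->attempts++;
    if (membership_changed)
        return -1;              /* failed: flag stays set for the next try */
    ls->in_recovery = false;    /* completed: release */
    return 0;
}

/*
 * Run attempts in order; changed[i] says whether membership changed
 * during attempt i. Returns 0 once an attempt completes, or -1 if every
 * attempt failed, in which case in_recovery is still set.
 */
int run_recovery(struct lockspace_model *ls, const bool *changed, size_t n)
{
    ls->in_recovery = true;
    ls->attempts = 0;
    for (size_t i = 0; i < n; i++) {
        if (recover_once(ls, changed[i]) == 0)
            return 0;
    }
    return -1;
}
```

In this model, a sequence such as {true, true, false} ends with the flag
cleared after the third attempt, while a sequence in which every attempt
fails leaves it set. That second case corresponds to the situation
reported in this thread.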

Look at the specific errors causing ls_recover() to fail, and check if
it's a confchg-related failure (like above), or another kind of error.

Assume the "other" node is lost, possibly forever.
"dlm_wait_function()" can return only if "dlm_ls_stop()" gets called
in the meantime.

I suppose user space can do something like this:

echo 0 > /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control

Actually I tried it by hand: it did not unblock the situation.
I guess that the next time, it was "ping_members()" that returned
with error==1.  The dead "other" node was still on the list.
Again, "ls_recover()" returned without releasing "->ls_in_recovery".

How can "ls_recover()" be reentered so that it can carry out the
recovery and release "->ls_in_recovery"?
(Assuming the "other" node is lost, possibly forever.)

Thanks for your response.

Zoltan Menyhart

