[Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue

Jiaju Zhang jjzhang.linux at gmail.com
Fri May 14 11:25:03 UTC 2010


On Fri, May 14, 2010 at 5:52 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Thu, May 13, 2010 at 10:36 PM, Lars Marowsky-Bree <lmb at novell.com> wrote:
>> On 2010-05-13T22:19:58, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>>> > ais_dispatch() was moved to update_cluster(). That's safe, it seems.
>>> But pointless and unrelated to the problem, so why include it?
>>
>> I took it that JJ thought this was part of the solution.
>
> ci == ais_async_fd,  if there was something to read we should have
> already had a callback.
>
>>
>>> Does the behavior still occur with pacemaker 1.1.2?
>>
>> Yes. Which fix do you think would change this?
>
> These two that I mentioned last week:
>  http://hg.clusterlabs.org/pacemaker/1.1/rev/34ae9f7b4675
>  http://hg.clusterlabs.org/pacemaker/1.1/rev/19fdc0a3885a

I've confirmed that my pacemaker has the above patch and it doesn't work.

>
> I really doubt the patch is the right fix, conceptually, the node
> can't be a member until the crmd process is running and done a whole
> heap of processing.
> So CRM_NODE_MEMBER without crm_proc_crmd makes no sense.
>
> More likely, as-in the patches above, the process list wasn't being
> updated correctly and thats what needs to get fixed.
> It just never mattered before.

This is the where the race condition happens. CRM_NODE_MEMBER was set
from the local node, but crm_proc_crmd was updated only when the peer
node sending the local node this info. When CRM_NODE_MEMBER was
updated but crm_proc_crmd was not updated yet, dlm_controld went to
read the crm_peer_id_cache, this bug happened.

Yes, Updating CRM_NODE_MEMBER and updating crm_proc_crmd should be an
_atomic_ operation, before that has finished, dlm_controld shoud not
read the membership info. So in fact,  I have been thinking about
three solutions to this problem:

1) After updating (CRM_NODE_MEMBER && crm_proc_crmd) finished, notify
dlm_controld. But I'm not sure if we can achieve this goal by doing
some work around ais_async_fd. Maybe need to dig into the code
further.
2) maybe add another flag to do some syncing between Pacemaker and
dlm_controld, it might like what dlm_controld and fs_controld has
done, but it may need much more work and a little complicated.
3) As the patch has done, only see CRM_NODE_MEMBER but not see
crm_proc_crmd. It seems have risk?(open disscussion). But after some
consideration, I think this should be OK. Because we eventually get
the membership info from Corosync, even we haven't recevied the
crm_proc_crmd info this moment, we can receive that info at once. If
some bad thing happened and the peer node was unfortunately down, we
can still received that info from Corosync. Also, the time we set the
membership info to DLM is just the node membership info, we don't need
to care if crmd is really started at this moment. The membership info
also is udpated at many other places when we do some work about
lockspace. So it should be safe as my think.

Thanks,
Jiaju




More information about the Cluster-devel mailing list