[Cluster-devel] unfencing (cman startup)

Mon Mar 2 07:59:58 UTC 2009

Fabio M. Di Nitto wrote:
> On Fri, 2009-02-27 at 09:52 -0600, David Teigland wrote:
>> On Fri, Feb 27, 2009 at 12:54:20PM +0000, Chrissie Caulfield wrote:
>>>>>> Given the time at which fence_node -U will fire, you probably want to
>>>>>> add a cman_init + cman_is_active + cman_finish loop in fence_node to
>>>>>> make sure cman is ready to reply to our ccs queries, otherwise we might
>>>>>> have a race condition at boot time (it might be already there.. didn't
>>>>>> really check the code). All our daemons do that to give cman time to
>>>>>> bootstrap.
>>>>> Yes, good point.  I wonder if we'd be better off having cman_tool join
>>>>> effectively do an is_active wait before exiting?  Then we could probably
>>>>> avoid doing it many other places.  (It's also annoying when corosync crashes
>>>>> after is_active completes, but before I've read what I need from cman/ccs.)
>>> Err, cman_tool already does this with the -w switch, and the init script
>>> uses it.
>> Great, so the constant flogging to add cman_is_active checks everywhere will
>> end!?  Can I remove all my cman_is_active loops?
> 
> This works fine via init script. We could theoretically kill all those
> loops but at least for us developers, that start stuff by hand, they
> could still be useful.. and maybe a good failsafe if we ask users to run
> something manually for debugging.. dunno.. just a thought. I don't have
> a strong opinion on this matter.
> 

You might as well take them out, to be honest. Those loops are mostly
overspill from the RHEL4 cman, where the daemon came up but could then
take 20-30 seconds to start or join a cluster. With openais/corosync,
once the daemon is up you can talk to it.

It might not be quorate ... but that IS your problem :-)
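
For anyone who does keep such a loop as a failsafe when starting things
by hand, here is a minimal sketch of the bounded retry pattern being
discussed. It is not taken from fence_node or any of the daemons. The
names probe_fn, wait_until_active and fake_probe are hypothetical, and
the probe is a stand-in so the sketch builds without libcman; in real
code the probe would wrap the cman_init + cman_is_active + cman_finish
sequence mentioned above.

```c
#define _POSIX_C_SOURCE 199309L  /* for nanosleep() */

#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe type: returns nonzero once the daemon answers.
 * Assumption: with libcman this would open a handle, ask whether cman
 * is active, and close the handle again on every call. */
typedef int (*probe_fn)(void *ctx);

/* Poll `probe` at most max_tries times, sleeping delay_ms between
 * attempts. Returns the 1-based number of the attempt that succeeded,
 * or -1 if every attempt failed. */
static int wait_until_active(probe_fn probe, void *ctx,
                             int max_tries, long delay_ms)
{
    struct timespec delay;
    int attempt;

    assert(probe != NULL);
    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (attempt = 1; attempt <= max_tries; attempt++) {
        if (probe(ctx))
            return attempt;
        /* No point sleeping after the final failed attempt. */
        if (attempt < max_tries)
            nanosleep(&delay, NULL);
    }
    return -1;
}

/* Stand-in probe for demonstration only: reports "active" once the
 * counter pointed to by ctx has been counted down to zero, mimicking
 * a daemon that needs a few polls before it answers. */
static int fake_probe(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```

Bounding the number of attempts means a daemon that never comes up
produces an error rather than a hang, which is probably what you want
when running something manually for debugging.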

Chrissie