[Linux-cluster] qdiskd: Updated votes configuration not used even after restart

Chrissie Caulfield ccaulfie at redhat.com
Wed May 20 13:59:59 UTC 2009


Lon Hohberger wrote:
> On Wed, 2009-05-20 at 08:08 +0100, Chrissie Caulfield wrote:
>>>  - if a quorum device exists and it is being reregistered with the same
>>>    name, just change the votes and recalculate quorum
>> cman doesn't allow the votes to be changed without deregistering and
>> reregistering the quorum device.
>>
>> I have checked the code and I can't see any reason why doing it this way
>> would fail, if register succeeds then it allocates a new node structure
>> for the qdisk and populates it from the parameters given.
>>
>> Is it possible that qdiskd might not unregister the quorum device when it
>> is stopped under some circumstances?
> 
> It's possible, but unlikely -- it only fails to unregister if:
> 
> (a) it hits I/O errors
> (b) it is killed with -SIGKILL
> (c) cman went away (in which case it doesn't matter :) )
> 
> I suspect:
> 
>         if (quorum_device->state == NODESTATE_MEMBER)
>                 return -EBUSY;


Yes, that sounds very likely.

> ... is causing the unregister operation to fail.  Maybe I need to call
> cman_poll_quorum_device(xxx, 0).  It seems a bit odd.
> 
> Basically, the use case is online upgrade of # of nodes in the cluster.
> 
> 3 nodes + 2-vote quorum device ==> 4 nodes + 3-vote quorum device
> 
> In my mind, it'd work like:
> 
> * Ensure all current members are up and healthy
>   * each old member sees: votes = 3 + 2
> * Update cluster.conf w/ new member.
> * Copy cluster.conf to new member
>   * each old member sees: votes = 4 + 2
> * Have new member start cluster stack 
>   * each old member sees: votes = 4 + 2
>   * the new member sees: votes = 4 + 3
> * Stop qdiskd on the old nodes
>   * each old member sees: votes = 4
>   * the new member sees: votes = 4 + 3
> * Restart qdiskd on the old nodes 
>   * everyone is consistent w/ 4 + 3
> 
> I don't think calling poll(0) will make a difference in the above case,
> but I had gotten used to the fact that if you kill qdiskd you had a few
> seconds to restart it before CMAN noticed... 
> 
> So, I can fix it I think with poll(0), but if an admin kills qdiskd with
> SIGKILL (or any other fatal signal), restarting qdiskd will prevent
> correct vote registration (though as I have found out, polling still
> works great).

When qdiskd restarts, if you get EBUSY from _register then you could
deregister and reregister with the new information.

There's an argument here for a cman API call to change the number of
votes associated with the quorum disk, though ... what do you think?

Chrissie



