
Re: [Linux-cluster] qdiskd: Updated votes configuration not used even after restart



On Wed, 2009-05-20 at 08:08 +0100, Chrissie Caulfield wrote:
> >  - if a quorum device exists and it is being reregistered with the same
> >    name, just change the votes and recalculate quorum
> 
> cman doesn't allow the votes to be changed without deregistering and
> reregistering the quorum device.
> 
> I have checked the code and I can't see any reason why doing it this way
> would fail, if register succeeds then it allocates a new node structure
> for the qdisk and populates it from the parameters given.
> 
> Is it possible that qdisk might not unregister the qdisk  when it is
> stopped under some circumstances ?

It's possible, but unlikely. qdiskd only skips the unregister if:

(a) it hits I/O errors
(b) it is killed with SIGKILL
(c) cman went away (in which case it doesn't matter :) )

I suspect:

        if (quorum_device->state == NODESTATE_MEMBER)
                return -EBUSY;

... is causing the unregister operation to fail.  Maybe I need to call
cman_poll_quorum_device(xxx, 0) before unregistering, so the device is
no longer marked as a member.  That seems a bit odd.
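
To make the suspicion concrete, here is a toy model in C. It is only a
sketch: quorum_dev, model_register(), model_poll(), model_unregister()
and model_votes() are hypothetical names, not cman's real internals, and
the "register is refused while a device is still registered" rule is an
assumption. Only the NODESTATE_MEMBER check is taken from the snippet
above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model: names echo the thread, but this is not cman's actual code. */
enum node_state { NODESTATE_DEAD, NODESTATE_MEMBER };

struct quorum_dev {
    enum node_state state;
    int votes;
};

static struct quorum_dev qdev_storage;
static struct quorum_dev *quorum_device = NULL; /* NULL: nothing registered */

/* Assumption: registering while a device is still registered is refused. */
static int model_register(int votes)
{
    assert(votes >= 0);
    if (quorum_device != NULL)
        return -EBUSY;
    qdev_storage.state = NODESTATE_DEAD;
    qdev_storage.votes = votes;
    quorum_device = &qdev_storage;
    return 0;
}

/* Stand-in for polling the device: nonzero means "available". */
static int model_poll(int isavailable)
{
    if (quorum_device == NULL)
        return -EINVAL;
    quorum_device->state = isavailable ? NODESTATE_MEMBER : NODESTATE_DEAD;
    return 0;
}

/* The check quoted in the mail: a device that is a member cannot go away. */
static int model_unregister(void)
{
    if (quorum_device == NULL)
        return -EINVAL;
    if (quorum_device->state == NODESTATE_MEMBER)
        return -EBUSY;
    quorum_device = NULL;
    return 0;
}

/* Votes currently contributed by the quorum device (0 if none). */
static int model_votes(void)
{
    return quorum_device != NULL ? quorum_device->votes : 0;
}
```

In this model, model_register(2) followed by model_poll(1) leaves the
device as a member, so model_unregister() returns -EBUSY and a later
model_register(3) is refused too: the device keeps its old 2 votes.
Calling model_poll(0) first lets the unregister and the 3-vote register
both succeed.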

Basically, the use case is increasing the number of nodes in the cluster
while it stays online.

3 nodes + 2-vote quorum device ==> 4 nodes + 3-vote quorum device

In my mind, it'd work like:

* Ensure all current members are up and healthy
  * each old member sees: votes = 3 + 2
* Update cluster.conf w/ new member.
* Copy cluster.conf to new member
  * each old member sees: votes = 4 + 2
* Have new member start cluster stack 
  * each old member sees: votes = 4 + 2
  * the new member sees: votes = 4 + 3
* Stop qdiskd on the old nodes
  * each old member sees: votes = 4
  * the new member sees: votes = 4 + 3
* Restart qdiskd on the old nodes 
  * everyone is consistent w/ 4 + 3
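
The arithmetic in the steps above can be sketched in C. This is just an
illustration of the vote totals listed, assuming one vote per node;
vote_view, total_votes() and walk_upgrade() are made-up names, and
NOT_JOINED marks steps where the new member is not running yet.

```c
#include <assert.h>
#include <stddef.h>

enum { UPGRADE_STEPS = 5, NOT_JOINED = -1 };

/* What one member believes: node votes plus quorum-device votes. */
struct vote_view {
    int node_votes;  /* assumed: one vote per configured node */
    int qdisk_votes; /* 0 while qdiskd is stopped on that member */
};

static int total_votes(struct vote_view v)
{
    return v.node_votes + v.qdisk_votes;
}

/* Fill in the totals seen by old members and by the new member per step. */
static void walk_upgrade(int old_totals[UPGRADE_STEPS],
                         int new_totals[UPGRADE_STEPS])
{
    struct vote_view old_view = { 3, 2 }; /* 3 nodes + 2-vote qdisk */
    struct vote_view new_view = { 4, 3 }; /* 4 nodes + 3-vote qdisk */

    assert(old_totals != NULL && new_totals != NULL);

    /* Step 0: all current members up and healthy. */
    old_totals[0] = total_votes(old_view);
    new_totals[0] = NOT_JOINED;

    /* Step 1: cluster.conf updated with the new member. */
    old_view.node_votes = 4;
    old_totals[1] = total_votes(old_view);
    new_totals[1] = NOT_JOINED;

    /* Step 2: new member starts its cluster stack. */
    old_totals[2] = total_votes(old_view);
    new_totals[2] = total_votes(new_view);

    /* Step 3: qdiskd stopped on the old nodes. */
    old_view.qdisk_votes = 0;
    old_totals[3] = total_votes(old_view);
    new_totals[3] = total_votes(new_view);

    /* Step 4: qdiskd restarted on the old nodes with the new vote count. */
    old_view.qdisk_votes = new_view.qdisk_votes;
    old_totals[4] = total_votes(old_view);
    new_totals[4] = total_votes(new_view);
}
```

The old members go 5, 6, 6, 4, 7 and the new member sits at 7 from the
moment it joins, so the views only agree again after the last step.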

I don't think calling poll(0) will make a difference in the above case,
but I had gotten used to the fact that if you kill qdiskd you have a few
seconds to restart it before CMAN notices...

So, I think I can fix it with poll(0), but if an admin kills qdiskd with
SIGKILL (or any other fatal signal), the restarted qdiskd will not be
able to register its votes correctly (though as I have found out,
polling still works great).

-- Lon

