[Linux-cluster] qdiskd + cman: trying to fix the use of quorumdev_poll.
Patrick Caulfield
pcaulfie at redhat.com
Tue Jan 9 13:34:47 UTC 2007
Simone Gotti wrote:
> Hi all,
>
> I'm using the openais based cman-2.0.35.el5 and I'm trying to understand
> how the quorum disk concept is implemented in RHCS. After various
> experiments I think I have found at least 2 problems:
>
> Problem 1)
>
> Little bug in the quorum disk polling mechanism:
>
> looking at the code in cman/daemon/commands.c the variable
> quorumdev_poll = 10000 is expressed in milliseconds and used to call
> "quorum_device_timer_fn" every quorumdev_poll interval to check if
> qdiskd is informing cman that the node can use the quorum votes.
>
> The same variable is then used in quorum_device_timer_fn, but here it's
> used as seconds:
>
> if (quorum_device->last_hello.tv_sec + quorumdev_poll < now.tv_sec) {
>
> so, when qdiskd dies, or access to the quorum disk is lost, it
> will take more than 2 hours (10000 seconds) to notice this and
> recalculate the quorum.
>
> After changing the line:
> ========================================================================
> --- cman-2.0.35.orig/cman/daemon/commands.c 2007-01-07 21:01:30.000000000 +0100
> +++ cman-2.0.35.patched/cman/daemon/commands.c 2007-01-05 18:12:33.000000000 +0100
> @@ -1038,15 +1037,12 @@ static void ccsd_timer_fn(void *arg)
>
> static void quorum_device_timer_fn(void *arg)
> {
> struct timeval now;
> if (!quorum_device || quorum_device->state == NODESTATE_DEAD)
> return;
>
> gettimeofday(&now, NULL);
> - if (quorum_device->last_hello.tv_sec + quorumdev_poll < now.tv_sec) {
> + if (quorum_device->last_hello.tv_sec + quorumdev_poll/1000 < now.tv_sec) {
> quorum_device->state = NODESTATE_DEAD;
> log_msg(LOG_INFO, "lost contact with quorum device\n");
> recalculate_quorum(0);
> ========================================================================
>
Thanks. I've committed that version for now.
> it worked. A more precise fix would be to use tv_usec/1000 instead of
> tv_sec.
True, it needs to take both into account. For the sake of time I've left the
granularity at seconds.
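
For reference, a check with millisecond granularity that takes both tv_sec
and tv_usec into account could look roughly like the sketch below. This is
not what was committed. The helper names timeval_to_ms and
quorum_device_timed_out are hypothetical and do not exist in cman; the only
assumption carried over from the code above is that quorumdev_poll is
expressed in milliseconds.

```c
#include <sys/time.h>

/* Convert a struct timeval to whole milliseconds. */
static long long timeval_to_ms(const struct timeval *tv)
{
    return (long long)tv->tv_sec * 1000 + tv->tv_usec / 1000;
}

/*
 * Hypothetical helper (not part of cman): returns 1 if more than
 * poll_ms milliseconds have passed between last_hello and now,
 * 0 otherwise. Both timestamps are compared in milliseconds, so
 * tv_sec and tv_usec are both taken into account.
 */
static int quorum_device_timed_out(const struct timeval *last_hello,
                                   const struct timeval *now,
                                   long long poll_ms)
{
    return timeval_to_ms(last_hello) + poll_ms < timeval_to_ms(now);
}
```

In quorum_device_timer_fn the condition would then become something like
quorum_device_timed_out(&quorum_device->last_hello, &now, quorumdev_poll).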
--
patrick