[Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
Andrew Beekhof
andrew at beekhof.net
Fri May 14 10:15:12 UTC 2010
On Fri, May 14, 2010 at 5:04 AM, Tim Serong <tserong at novell.com> wrote:
> On 5/14/2010 at 06:19 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>> Does the behavior still occur with pacemaker 1.1.2?
>>
>
> Yes.
>
> For the record, the most minimal testcase I've managed for this
> so far is as follows (substitute "/etc/init.d/corosync start" or
> whatever for "rcopenais start" if you're not on something SUSE-based):
>
> 1) Configure corosync/openais on two nodes.
> Do not start the cluster yet.
>
> 2) On one node:
>
> # rm /var/lib/heartbeat/crm/*
> # rcopenais start
> # while ! crm_mon -1 | grep -qi online; do \
> echo -n "." ; sleep 5 ; done
>
> 3) Now we have one node online, configure Pacemaker:
>
> # cat <<CONF | crm configure
> primitive dlm ocf:pacemaker:controld
> primitive clvm ocf:lvm2:clvmd
> group g dlm clvm
> clone c g meta interleave="true"
> property stonith-enabled="false"
> property no-quorum-policy="ignore"
> commit
> CONF
>
> Watch "crm_mon -r" until that clone comes online.
> Should only take a few seconds.
>
> 4) On the other node:
>
> # rm /var/lib/heartbeat/crm/*
> # rcopenais start
>
> The first node will now wedge up spectacularly, and/or
> dlm_recoverd and clvmd will be stuck in D state on both nodes.

Presumably each thinks the other node isn't a member?
Perhaps something like this will help:

diff -r b59c27dc114a lib/ais/plugin.c
--- a/lib/ais/plugin.c Wed May 12 10:51:56 2010 +0200
+++ b/lib/ais/plugin.c Fri May 14 12:12:33 2010 +0200
@@ -498,9 +498,8 @@ static void *pcmk_wait_dispatch (void *a
                 ais_notice("Respawning failed child process: %s",
                            pcmk_children[lpc].name);
                 spawn_child(&(pcmk_children[lpc]));
-            } else {
-                send_cluster_id();
             }
+            send_cluster_id();
         }
     }
     sched_yield ();
@@ -661,6 +660,7 @@ int pcmk_startup(struct corosync_api_v1
             }
         }
     }
+    send_cluster_id();
     return 0;
 }
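
If it helps when retesting with the patch applied, the D-state symptom from
step 4 can be checked on each node with something like the sketch below.
It assumes a procps-style ps that accepts "-o pid=,stat=,comm=", and
filter_d_state is just a made-up helper name, not part of Pacemaker or
dlm_controld.

```shell
#!/bin/sh
# Sketch only: report dlm_recoverd/clvmd processes that are in
# uninterruptible sleep (STAT starting with "D").
# filter_d_state is a hypothetical helper; it reads "PID STAT COMMAND"
# lines on stdin so it can also be fed canned input.
filter_d_state() {
    awk '$2 ~ /^D/ && ($3 == "dlm_recoverd" || $3 == "clvmd") { print $1, $3 }'
}

# Live usage on each node (assumption: procps ps, headers suppressed with "="):
#   ps -eo pid=,stat=,comm= | filter_d_state
```

No output on either node after the second node joins would suggest the
daemons are not stuck; any lines printed give the PID and name of a stuck
process.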