
Re: [Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue



On Fri, May 14, 2010 at 5:04 AM, Tim Serong <tserong novell com> wrote:
> On 5/14/2010 at 06:19 AM, Andrew Beekhof <andrew beekhof net> wrote:
>>
>> Does the behavior still occur with pacemaker 1.1.2?
>>
>
> Yes.
>
> For the record, the most minimal testcase I've managed for this
> so far is as follows (substitute "/etc/init.d/corosync start" or
> whatever for "rcopenais start" if you're not on something SUSE-based):
>
> 1) Configure corosync/openais on two nodes.
>   Do not start the cluster yet.
>
> 2) On one node:
>
>     # rm /var/lib/heartbeat/crm/*
>     # rcopenais start
>     # while ! crm_mon -1 | grep -qi online; do \
>         echo -n "." ; sleep 5 ; done
>
> 3) Now that one node is online, configure Pacemaker:
>
>     # cat <<CONF | crm configure
>     primitive dlm ocf:pacemaker:controld
>     primitive clvm ocf:lvm2:clvmd
>     group g dlm clvm
>     clone c g meta interleave="true"
>     property stonith-enabled="false"
>     property no-quorum-policy="ignore"
>     commit
>     CONF
>
>   Watch "crm_mon -r" until that clone comes online.
>   Should only take a few seconds.
>
> 4) On the other node:
>
>     # rm /var/lib/heartbeat/crm/*
>     # rcopenais start
>
> The first node will now wedge up spectacularly, and/or
> dlm_recoverd and clvmd will be stuck in D state (uninterruptible
> sleep) on both nodes.
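
To confirm the "stuck in D state" symptom on each node, something like the
following might help. It is only a sketch: the helper name stuck_in_d_state
is made up for illustration, and it assumes a procps-style ps that accepts
"-eo pid=,stat=,comm=".

```shell
#!/bin/sh
# Hypothetical helper (not part of pacemaker or dlm): reads lines of the
# form "PID STAT COMMAND" on stdin and prints those where the process is
# dlm_recoverd or clvmd and its state starts with D (uninterruptible sleep).
stuck_in_d_state() {
    awk '$2 ~ /^D/ && ($3 == "dlm_recoverd" || $3 == "clvmd") {
        print $1, $2, $3
    }'
}
```

On a live node it would be fed from ps, for example
"ps -eo pid=,stat=,comm= | stuck_in_d_state"; empty output means neither
process is currently in D state.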

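Presumably each thinks the other node isn't a member?
Perhaps something like this will help. It calls send_cluster_id() in
pcmk_wait_dispatch() whether or not a failed child was respawned, and
once more at the end of pcmk_startup():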

diff -r b59c27dc114a lib/ais/plugin.c
--- a/lib/ais/plugin.c	Wed May 12 10:51:56 2010 +0200
+++ b/lib/ais/plugin.c	Fri May 14 12:12:33 2010 +0200
@@ -498,9 +498,8 @@ static void *pcmk_wait_dispatch (void *a
 		    ais_notice("Respawning failed child process: %s",
 			       pcmk_children[lpc].name);
 		    spawn_child(&(pcmk_children[lpc]));
-		} else {
-		    send_cluster_id();
 		}
+		send_cluster_id();
 	    }
 	}
 	sched_yield ();
@@ -661,6 +660,7 @@ int pcmk_startup(struct corosync_api_v1
 	    }
 	}
     }
+    send_cluster_id();

     return 0;
 }

