[Linux-cluster] service state unchanged when host crashes
Martin Waite
Martin.Waite at datacash.com
Tue Oct 27 09:57:50 UTC 2009
Hi Jakov,
I am running Debian Lenny 64-bit. Is that going to be a problem for me?
I think you have given me enough of a pointer (i.e. I haven't configured
fencing properly) to get me going again. Thanks.
regards,
Martin
====
Just out of interest, here are the logs:
Here is the syslog from clusternode28 when I suspended clusternode30:
Oct 26 18:29:51 clusternode28 clurgmgrd[3980]: <debug> Membership Change Event
Oct 26 18:29:51 clusternode28 clurgmgrd[3980]: <info> State change: clusternode30 DOWN
Oct 26 18:29:51 clusternode28 clurgmgrd[3980]: <debug> Membership Change Event
Oct 26 18:29:51 clusternode28 clurgmgrd[3980]: <debug> Membership Change Event
Oct 26 18:29:51 clusternode28 fenced[16118]: fencing deferred to clusternode27
Then, on clusternode27:
Oct 26 18:29:52 clusternode27 kernel: [438082.708458] dlm: closing connection to node 30
Oct 26 18:29:52 clusternode27 clurgmgrd[20955]: <debug> Membership Change Event
Oct 26 18:29:52 clusternode27 clurgmgrd[20955]: <info> State change: clusternode30 DOWN
Oct 26 18:29:52 clusternode27 clurgmgrd[20955]: <debug> Membership Change Event
Oct 26 18:29:52 clusternode27 clurgmgrd[20955]: <debug> Membership Change Event
Oct 26 18:29:52 clusternode27 fenced[12749]: clusternode30 not a cluster member after 0 sec post_fail_delay
Oct 26 18:29:52 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 26 18:29:52 clusternode27 fenced[12749]: fence "clusternode30" failed
Oct 26 18:29:57 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 26 18:29:57 clusternode27 fenced[12749]: fence "clusternode30" failed
Oct 26 18:30:02 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 26 18:30:02 clusternode27 fenced[12749]: fence "clusternode30" failed
... and so on ...
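The retry pattern in the excerpt above can be tallied with a short script. This is only a rough sketch: the sample lines are copied from the clusternode27 log with the wrapped lines rejoined, the function names (fence_failures, retry_gaps) are made up for illustration, and the year 2009 is prepended only because syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Sample lines copied from the clusternode27 log (wrapped lines rejoined).
LOG = """\
Oct 26 18:29:52 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 26 18:29:52 clusternode27 fenced[12749]: fence "clusternode30" failed
Oct 26 18:29:57 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 26 18:29:57 clusternode27 fenced[12749]: fence "clusternode30" failed
Oct 26 18:30:02 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 26 18:30:02 clusternode27 fenced[12749]: fence "clusternode30" failed
"""

# Matches only the 'fence "<node>" failed' lines written by fenced.
FAILED = re.compile(
    r'^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ fenced\[\d+\]: fence "([^"]+)" failed$'
)


def fence_failures(text):
    """Return {node: [timestamps]} for every failed fence attempt in text."""
    failures = {}
    for line in text.splitlines():
        match = FAILED.match(line)
        if match:
            # Assumption: a fixed year is prepended purely so the stamp parses.
            stamp = datetime.strptime("2009 " + match.group(1), "%Y %b %d %H:%M:%S")
            failures.setdefault(match.group(2), []).append(stamp)
    return failures


def retry_gaps(stamps):
    """Return the gaps in whole seconds between consecutive attempts."""
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


failures = fence_failures(LOG)
for node, stamps in failures.items():
    print(node, len(stamps), "failed attempts, gaps (s):", retry_gaps(stamps))
```

Run against the full syslog instead of the sample, this would show how long fenced kept retrying and at what interval.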
I haven't configured fencing properly, have I?
<clusternode name="clusternode30" nodeid="30">
  <multicast addr="224.0.0.1" interface="eth0:1"/>
  <fence>
    <!-- Handle fencing manually -->
    <method name="human">
      <device name="human" nodename="hostname1"/>
    </method>
  </fence>
</clusternode>
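One detail in the fragment above that may be worth checking: the device's nodename attribute is "hostname1", while the enclosing node is named "clusternode30". The sketch below flags that kind of mismatch. It is an illustration only: the helper name nodename_mismatches is made up, the rule that nodename should equal the clusternode name is an assumption, and since the fencedevices section of the config is not shown here, this may not be what makes fenced report failure.

```python
import xml.etree.ElementTree as ET

# The fragment quoted above, verbatim apart from indentation.
FRAGMENT = """\
<clusternode name="clusternode30" nodeid="30">
  <multicast addr="224.0.0.1" interface="eth0:1"/>
  <fence>
    <!-- Handle fencing manually -->
    <method name="human">
      <device name="human" nodename="hostname1"/>
    </method>
  </fence>
</clusternode>
"""


def nodename_mismatches(xml_text):
    """Return (node, method, device, nodename) tuples where a fence device's
    nodename attribute differs from the enclosing clusternode's name.

    Assumption: nodename is expected to match the clusternode name.
    """
    root = ET.fromstring(xml_text)
    mismatches = []
    # iter() includes the root itself, so a bare <clusternode> fragment and a
    # larger document containing several nodes are both handled.
    for node in root.iter("clusternode"):
        for method in node.iter("method"):
            for device in method.iter("device"):
                nodename = device.get("nodename")
                if nodename is not None and nodename != node.get("name"):
                    mismatches.append(
                        (node.get("name"), method.get("name"),
                         device.get("name"), nodename)
                    )
    return mismatches


for node, method, device, nodename in nodename_mismatches(FRAGMENT):
    print(f'{node}: method "{method}", device "{device}" has nodename="{nodename}"')
```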
When I un-suspended clusternode30 (15 hours later), cman on
clusternode27 threw an error and quit:
Oct 27 10:50:01 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 27 10:50:01 clusternode27 fenced[12749]: fence "clusternode30" failed
Oct 27 10:50:05 clusternode27 clurgmgrd[20955]: <debug> Membership Change Event
Oct 27 10:50:05 clusternode27 clurgmgrd[20955]: <debug> Membership Change Event
Oct 27 10:50:06 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 27 10:50:06 clusternode27 fenced[12749]: fence "clusternode30" failed
Oct 27 10:50:11 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 27 10:50:11 clusternode27 fenced[12749]: fence "clusternode30" failed
Oct 27 10:50:16 clusternode27 fenced[12749]: fencing node "clusternode30"
Oct 27 10:50:16 clusternode27 fenced[12749]: fence "clusternode30" failed
Oct 27 10:50:20 clusternode27 openais[12741]: CMAN: Joined a cluster with disallowed nodes. must die
Oct 27 10:50:20 clusternode27 kernel: [496910.220602] dlm: closing connection to node 28
Oct 27 10:50:20 clusternode27 kernel: [496910.220710] dlm: closing connection to node 27
Oct 27 10:50:20 clusternode27 dlm_controld[12751]: cluster is down, exiting
Oct 27 10:50:20 clusternode27 gfs_controld[12753]: groupd_dispatch error -1 errno 11
Oct 27 10:50:20 clusternode27 gfs_controld[12753]: groupd connection died
Oct 27 10:50:20 clusternode27 gfs_controld[12753]: cluster is down, exiting
Oct 27 10:50:47 clusternode27 ccsd[12736]: Unable to connect to cluster infrastructure after 30 seconds.
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jakov Sosic
Sent: 27 October 2009 09:38
To: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] service state unchanged when host crashes
On Mon, 26 Oct 2009 17:40:24 -0000
"Martin Waite" <Martin.Waite at datacash.com> wrote:
> Hi,
>
> I have 3 VMs running in a cluster. 4 services are defined, one of
> which ("SENTINEL") is running on clusternode30.
>
> I then suspended clusternode30 in the VM console. Cman notices the
> disappearance within a few seconds. However, the SENTINEL service
> that was running is still flagged as "started".
Could you please post your /var/log/messages when one node is fenced?
Also, are you using Debian/Ubuntu by any chance?
--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |
=================================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/ |