
Re: [Linux-cluster] Cluster node won't rejoin cluster after fencing, stops at cman



Hi,
I think this is common behavior in a two-node cluster setup: for some reason
the fence domain gets out of sync.
Try the following.

Verify that node01 is up and running correctly and that node02 has the same
version of cluster.conf as node01.
Reboot node01, and right after pressing Enter on that reboot command, reboot
node02 as well, so that the nodes come up only a few seconds apart. This
should sort out the fence domain so that the remaining services (cman, ccsd,
etc.) are able to start.
In my experience there is no way to do this manually (i.e. by starting the
services by hand).
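Before the double reboot, it is worth confirming that both nodes really carry the same cluster.conf. The following is a minimal sketch. It assumes Python 3 on an admin workstation (the stock RHEL4 Python is older) and that each node's /etc/cluster/cluster.conf has been copied there. The helper names `config_identity` and `same_config` are hypothetical and are not part of the cluster suite:

```python
import xml.etree.ElementTree as ET


def config_identity(conf_xml):
    """Return (cluster name, config_version) from the text of a cluster.conf."""
    root = ET.fromstring(conf_xml)
    return root.get("name"), int(root.get("config_version"))


def same_config(conf_a, conf_b):
    """True when both copies name the same cluster at the same config_version."""
    return config_identity(conf_a) == config_identity(conf_b)
```

The function compares only the root element's `name` and `config_version`, which are the values ccsd itself reports as "Local version #" and "Remote version #". It does not diff the rest of the file.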

Also verify that your fence devices are working properly!
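One check you can do at the config level is to make sure every fence device named under a node's `<method>` is actually declared in `<fencedevices>`. The sketch below uses hypothetical helper names and the Python 3 standard library. It only cross-checks cluster.conf. It cannot tell whether the APC switches actually answer, so that still has to be tested with the fence agent itself:

```python
import xml.etree.ElementTree as ET


def fence_references(conf_xml):
    """Map each clusternode name to the fence device names its methods use."""
    root = ET.fromstring(conf_xml)
    refs = {}
    for node in root.iter("clusternode"):
        # <device> elements only appear under <fence>/<method> in a clusternode
        refs[node.get("name")] = [dev.get("name") for dev in node.iter("device")]
    return refs


def undefined_fence_devices(conf_xml):
    """Return sorted device names referenced by a node but missing from <fencedevices>."""
    root = ET.fromstring(conf_xml)
    declared = {fd.get("name") for fd in root.iter("fencedevice")}
    used = {name for names in fence_references(conf_xml).values() for name in names}
    return sorted(used - declared)
```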

/jari




On 14/09/06, Bosse Klykken <bosse klykken com> wrote:
Hi.

I'm having some issues with a two-node failover cluster on RHEL4/U3 with
kernel 2.6.9-34.0.1.ELsmp, ccs-1.0.3-0, cman-1.0.4-0, fence-1.32.18-0
and rgmanager-1.9.46-0. After a mishap in which I accidentally caused a
failover of services by power-fencing server01, that node will not
rejoin the cluster after booting.

I have tried both the init.d scripts and starting the daemons manually
to troubleshoot this further, to no avail. I'm able to start ccsd
properly (although it logs the cluster as inquorate), but cman fails
completely, with ccsd reporting that the connection is refused.

If anyone could help me by giving me some tips, directing me to the
proper documentation addressing this issue or downright pointing out my
problem, I would be most grateful.

[server01] # service ccsd start
Starting ccsd:                                             [  OK  ]
---8<--- /var/log/messages
Sep 14 00:33:28 server01 ccsd[30227]: Starting ccsd 1.0.3:
Sep 14 00:33:28 server01 ccsd[30227]:  Built: Jan 25 2006 16:54:43
Sep 14 00:33:28 server01 ccsd[30227]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Sep 14 00:33:28 server01 ccsd[30227]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.5
Sep 14 00:33:28 server01 ccsd[30227]: Initial status:: Inquorate
Sep 14 00:33:29 server01 ccsd: startup succeeded
---8<---

[server01] # service cman start
Starting cman:                                             [FAILED]
---8<--- /var/log/messages
Sep 14 00:39:07 server01 ccsd[31417]: Cluster is not quorate.  Refusing connection.
Sep 14 00:39:07 server01 ccsd[31417]: Error while processing connect: Connection refused
Sep 14 00:39:07 server01 ccsd[31417]: cluster.conf (cluster name = something_cluster, version = 46) found.
Sep 14 00:39:07 server01 ccsd[31417]: Remote copy of cluster.conf is from quorate node.
Sep 14 00:39:07 server01 ccsd[31417]:  Local version # : 46
Sep 14 00:39:07 server01 ccsd[31417]:  Remote version #: 46
Sep 14 00:39:07 server01 cman: cman_tool: Node is already active failed
Sep 14 00:39:12 server01 kernel: CMAN: sending membership request
---8<---
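In the log above, ccsd reports matching local and remote config versions (46 and 46), so a stale cluster.conf is not what is blocking the join. The refusals come from ccsd seeing an inquorate cluster. To pull those numbers and the refusal count out of a longer /var/log/messages, here is a sketch with hypothetical helper names. The regular expressions are written against the exact lines shown above and are assumptions about the message format, not a documented interface:

```python
import re

# Matches ccsd lines such as
# "Sep 14 00:39:07 server01 ccsd[31417]:  Local version # : 46"
VERSION_RE = re.compile(r"ccsd\[\d+\]:\s+(Local|Remote) version #\s*:\s*(\d+)")
# Matches "Cluster is not quorate.  Refusing connection."
REFUSED_RE = re.compile(r"Cluster is not quorate\.\s+Refusing")


def ccsd_versions(log_text):
    """Return {'Local': n, 'Remote': m}; later lines overwrite earlier ones."""
    versions = {}
    for line in log_text.splitlines():
        match = VERSION_RE.search(line)
        if match:
            versions[match.group(1)] = int(match.group(2))
    return versions


def quorum_refusals(log_text):
    """Count ccsd lines refusing a connection because the cluster is inquorate."""
    return sum(1 for line in log_text.splitlines() if REFUSED_RE.search(line))
```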

[server01] # cat /proc/cluster/status
Protocol version: 5.0.1
Config version: 46
Cluster name: something_cluster
Cluster ID: 47540
Cluster Member: No
Membership state: Joining

[server01] # cat /proc/cluster/nodes
Node  Votes Exp Sts  Name

[server02] # cat /proc/cluster/status
Protocol version: 5.0.1
Config version: 46
Cluster name: something_cluster
Cluster ID: 47540
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 4
Node name: server02
Node addresses: xx.xx.xx.134
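The two status dumps differ in the fields that matter. server01 reports "Cluster Member: No" and "Membership state: Joining", while server02 runs as a one-node quorate cluster (Nodes: 1, Quorum: 1). Below is a sketch that turns /proc/cluster/status text into a dictionary so that the two outputs can be compared side by side. It uses Python 3, and the helper names are hypothetical:

```python
def parse_cluster_status(text):
    """Parse the 'Key: value' lines of /proc/cluster/status into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore blank or malformed lines without a colon
            status[key.strip()] = value.strip()
    return status


def is_member(status):
    """True when the node reports itself as a cluster member."""
    return status.get("Cluster Member") == "Yes"
```

For example, feeding server01's output from above into `parse_cluster_status` gives `"Membership state": "Joining"`, and `is_member` returns False.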

[server02] # cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    1   X   server01
   2    1    1   M   server02
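server02 lists server01 with state `X` and itself with `M`, while server01's own node table is empty. The following sketch parses that table. It assumes, based only on the two rows above, that `M` marks a member and any other letter a node that is not currently a member. The function names are hypothetical:

```python
def parse_cluster_nodes(text):
    """Parse /proc/cluster/nodes into {name: (node_id, votes, expected, state)}."""
    nodes = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 5:  # skip blank or unexpected lines
            continue
        node_id, votes, expected, state, name = fields
        nodes[name] = (int(node_id), int(votes), int(expected), state)
    return nodes


def non_members(nodes):
    """Names of nodes whose Sts column is anything other than 'M' (assumed: member)."""
    return sorted(name for name, (_, _, _, state) in nodes.items() if state != "M")
```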

[server01] # cat /etc/cluster/cluster.conf
---8<---
<?xml version="1.0"?>
<cluster config_version="46" name="something_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="30"/>
        <clusternodes>
                <clusternode name="server01" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="APC-LEFT" option="off" port="8" switch="0"/>
                                        <device name="APC-RIGHT" option="off" port="8" switch="0"/>
                                        <device name="APC-LEFT" option="on" port="8" switch="0"/>
                                        <device name="APC-RIGHT" option="on" port="8" switch="0"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="server02" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="APC-LEFT" option="off" port="4" switch="0"/>
                                        <device name="APC-RIGHT" option="off" port="4" switch="0"/>
                                        <device name="APC-LEFT" option="on" port="4" switch="0"/>
                                        <device name="APC-RIGHT" option="on" port="4" switch="0"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="xx.xx.xx.10" login="secret" name="APC-LEFT" passwd="secret"/>
                <fencedevice agent="fence_apc" ipaddr="xx.xx.xx.11" login="secret" name="APC-RIGHT" passwd="secret"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="OX" ordered="1" restricted="0">
                                <failoverdomainnode name="server01" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="IMAP" ordered="1" restricted="0">
                                <failoverdomainnode name="server01" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="NFS" ordered="1" restricted="0">
                                <failoverdomainnode name="server02" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="LDAP" ordered="1">
                                <failoverdomainnode name="server02" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="PGSQL" ordered="1" restricted="0">
                                <failoverdomainnode name="server02" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <service autostart="1" domain="PGSQL" name="OX-OX">
                        <script file="/etc/init.d/openexchange" name="OX"/>
                        <ip address="192.168.xx.xx" monitor_link="1"/>
                        <fs device="/dev/emcpowera9" force_fsck="0" force_unmount="1" fsid="39155" fstype="ext3" mountpoint="/var/opt/openexchange/filespool" name="OX" options="" self_fence="0"/>
                        <script file="/etc/init.d/openexchange-daemons" name="XMLRPC"/>
                        <script file="/etc/init.d/tomcat5" name="Tomcat"/>
                        <ip address="192.168.xx.xx" monitor_link="1"/>
                </service>
                <service autostart="1" domain="IMAP" name="OX-IMAP">
                        <ip address="192.168.xx.xx" monitor_link="1"/>
                        <fs device="/dev/emcpowera7" force_fsck="0" force_unmount="1" fsid="63880" fstype="ext3" mountpoint="/var/lib/imap" name="IMAP" options="" self_fence="0"/>
                        <fs device="/dev/emcpowera10" force_fsck="0" force_unmount="1" fsid="63324" fstype="ext3" mountpoint="/var/spool/imap1" name="IMAP1" options="" self_fence="0"/>
                        <script file="/etc/init.d/saslauthd" name="SASL"/>
                        <script file="/etc/init.d/cyrus-imapd" name="Cyrus"/>
                        <fs device="/dev/emcpowerb5" force_fsck="0" force_unmount="1" fsid="42726" fstype="ext3" mountpoint="/var/spool/imap2" name="IMAP2" options="" self_fence="0"/>
                        <fs device="/dev/emcpowerb6" force_fsck="0" force_unmount="1" fsid="38512" fstype="ext3" mountpoint="/var/spool/imap3" name="IMAP3" options="" self_fence="0"/>
                        <fs device="/dev/emcpowerc5" force_fsck="0" force_unmount="1" fsid="979" fstype="ext3" mountpoint="/var/spool/imap4" name="IMAP4" options="" self_fence="0"/>
                        <fs device="/dev/emcpowerc6" force_fsck="0" force_unmount="1" fsid="13125" fstype="ext3" mountpoint="/var/spool/imap5" name="IMAP5" options="" self_fence="0"/>
                </service>
                <service autostart="1" domain="NFS" name="OX-NFS">
                        <ip address="192.168.xx.xx" monitor_link="1"/>
                        <fs device="/dev/emcpowera8" force_fsck="0" force_unmount="1" fsid="37141" fstype="ext3" mountpoint="/var/lib/xxxxxxxx" name="NFS" options="" self_fence="0"/>
                        <script file="/etc/init.d/nfs" name="NFS"/>
                        <script file="/etc/init.d/nfslock" name="NFSLOCK"/>
                </service>
                <service autostart="1" domain="LDAP" name="OX-LDAP">
                        <ip address="192.168.xx.xx" monitor_link="1"/>
                        <fs device="/dev/emcpowerb8" force_fsck="0" force_unmount="1" fsid="12853" fstype="ext3" mountpoint="/var/symas/openldap-data" name="DATA" options="" self_fence="0"/>
                        <fs device="/dev/emcpowerb9" force_fsck="0" force_unmount="1" fsid="11240" fstype="ext3" mountpoint="/var/symas/openldap-logs" name="LOGS" options="" self_fence="0"/>
                        <fs device="/dev/emcpowerb10" force_fsck="0" force_unmount="1" fsid="10234" fstype="ext3" mountpoint="/var/symas/openldap-slurp" name="SLURP" options="" self_fence="0"/>
                        <script file="/etc/init.d/cdsserver" name="LDAP"/>
                </service>
                <service autostart="1" domain="PGSQL" name="OX-PGSQL">
                        <ip address="192.168.xx.xx" monitor_link="1"/>
                        <fs device="/dev/emcpowera5" force_fsck="0" force_unmount="1" fsid="43285" fstype="ext3" mountpoint="/var/lib/pgsql" name="PGSQL" options="" self_fence="0"/>
                        <script file="/etc/init.d/postgresql" name="PGSQL"/>
                </service>
        </rm>
</cluster>
---8<---
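A few rules in this file can be checked mechanically:

* cman's two-node mode (`two_node="1"`) is meant for exactly two nodes with `expected_votes="1"`.
* Every service's `domain` attribute should name a declared `<failoverdomain>`.

Here is a sketch of such a check, using the Python 3 standard library and a hypothetical function name. Run against the configuration above, the only thing it flags is that the OX failover domain is not referenced by any service (the OX-OX service is bound to the PGSQL domain). That may well be intentional, and rgmanager's failover domains play no part in cman membership.

```python
import xml.etree.ElementTree as ET


def config_warnings(conf_xml):
    """Return a list of warnings for a few cluster.conf consistency rules."""
    root = ET.fromstring(conf_xml)
    warnings = []

    # cman two-node mode: expected_votes="1" and exactly two clusternodes
    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1":
        if cman.get("expected_votes") != "1":
            warnings.append("two_node=1 requires expected_votes=1")
        if len(root.findall("clusternodes/clusternode")) != 2:
            warnings.append("two_node=1 but the cluster does not have exactly 2 nodes")

    # rgmanager: every service domain should be a declared failoverdomain
    declared = {d.get("name") for d in root.iter("failoverdomain")}
    used = set()
    for svc in root.iter("service"):
        domain = svc.get("domain")
        if domain is None:
            continue
        used.add(domain)
        if domain not in declared:
            warnings.append("service %s uses undeclared domain %s" % (svc.get("name"), domain))

    for name in sorted(declared - used):
        warnings.append("failover domain %s is not used by any service" % name)
    return warnings
```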

[server01] # cat /etc/hosts
---8<---
127.0.0.1       localhost.localdomain localhost
xx.xx.xx.133  server01.example.com     server01
xx.xx.xx.134  server02.example.com     server02
---8<---
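This hosts file looks correct: both node names map to their real addresses, not to 127.0.0.1. A commonly reported pitfall with cman is a node name listed on the loopback line. Below is a sketch of that check, with hypothetical helper names. It only parses /etc/hosts-style text; it does not consult DNS or the system resolver:

```python
def parse_hosts(text):
    """Map every hostname/alias in /etc/hosts-style text to its address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)  # first entry wins, as in /etc/hosts
    return table


def unresolved_nodes(node_names, hosts_text):
    """Return cluster node names with no hosts entry, or mapped to loopback."""
    table = parse_hosts(hosts_text)
    return sorted(n for n in node_names
                  if n not in table or table[n].startswith("127."))
```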

Thanks,
.../Bosse

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster


