
Re: [Linux-cluster] cluster latest cvs does not fence dead nodes automatically



David Teigland wrote:

> Above the names were "hosting-cl02-01" and "hosting-cl02-02".  Could you
> clear that up and if there are still problems send your cluster.conf file?
> Thanks



Here's how it is now, using new hostnames. The current cluster.conf follows (the BladeCenter's IP address and SNMP community string have been removed):
==================================
<?xml version="1.0"?>
<cluster name="cluster" config_version="3">

<cman two_node="1" expected_votes="1">
</cman>

<clusternodes>
        <clusternode name="cluster-node2" votes="1">
                <fence>
                        <method name="single">
                                <device name="ibmblade" port="7"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="cluster-node1" votes="1">
                <fence>
                        <method name="single">
                                <device name="ibmblade" port="6"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>

<fencedevices>
        <fencedevice name="ibmblade" agent="fence_ibmblade" ipaddr="IP_ADDRESS_HERE" community="COMMUNITY_HERE"/>
</fencedevices>

</cluster>
===========================================
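As a sanity check, the fence device itself can be exercised by invoking the agent by hand. A sketch, assuming the agent follows the standard fence-agent convention of reading name=value pairs on stdin (the keys mirror the cluster.conf attributes above; IP_ADDRESS_HERE and COMMUNITY_HERE are placeholders, and the action key may be `option` or `action` depending on the agent version):

```shell
# Ask the BladeCenter management module (via SNMP) to reboot blade 7.
# The keys mirror the <fencedevice>/<device> attributes in cluster.conf.
fence_ibmblade <<EOF
ipaddr=IP_ADDRESS_HERE
community=COMMUNITY_HERE
port=7
option=reboot
EOF
```

If this reboots the blade, the device side of fencing is known-good and any remaining problem is elsewhere.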

Commands and their output (console or syslog):

# modprobe gfs
# modprobe lock_dlm

Feb 15 15:10:04 cluster-node1 Lock_Harness <CVS> (built Feb 15 2005 12:00:38) installed
Feb 15 15:10:04 cluster-node1 GFS <CVS> (built Feb 15 2005 12:00:52) installed
Feb 15 15:10:08 cluster-node1 CMAN <CVS> (built Feb 15 2005 12:00:31) installed
Feb 15 15:10:08 cluster-node1 NET: Registered protocol family 30
Feb 15 15:10:08 cluster-node1 DLM <CVS> (built Feb 15 2005 12:00:34) installed
Feb 15 15:10:08 cluster-node1 Lock_DLM (built Feb 15 2005 12:00:39) installed


dm-mod is built into the kernel (not loaded as a module).

# ccsd -V
ccsd DEVEL.1108443619 (built Feb 15 2005 12:01:01)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.

# ccsd -4
Feb 15 15:10:58 cluster-node1 ccsd[8556]: Starting ccsd DEVEL.1108443619:
Feb 15 15:10:58 cluster-node1 ccsd[8556]: Built: Feb 15 2005 12:01:01
Feb 15 15:10:58 cluster-node1 ccsd[8556]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Feb 15 15:10:58 cluster-node1 ccsd[8556]: IP Protocol:: IPv4 only


# cman_tool join
Feb 15 15:12:27 cluster-node1 ccsd[8556]: cluster.conf (cluster name = cluster, version = 3) found.
Feb 15 15:12:28 cluster-node1 CMAN: Waiting to join or form a Linux-cluster
Feb 15 15:12:28 cluster-node1 ccsd[8558]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1
Feb 15 15:12:28 cluster-node1 ccsd[8558]: Initial status:: Inquorate
Feb 15 15:13:00 cluster-node1 CMAN: forming a new cluster
Feb 15 15:13:00 cluster-node1 CMAN: quorum regained, resuming activity
Feb 15 15:13:00 cluster-node1 ccsd[8558]: Cluster is quorate. Allowing connections.


# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 0
Node name: cluster-node1
Node addresses: 192.168.192.146

# cman_tool nodes
Node  Votes Exp Sts  Name
  1    1    1   M   cluster-node1

# fence_tool join
Feb 15 15:14:26 cluster-node1 fenced[8847]: cluster-node2 not a cluster member after 6 sec post_join_delay
Feb 15 15:14:26 cluster-node1 fenced[8847]: fencing node "cluster-node2"
Feb 15 15:14:32 cluster-node1 fenced[8847]: fence "cluster-node2" success


At this point "cluster-node2" was fenced and automatically rebooted, which is the correct behavior.

Now I join cluster-node2 to the cluster:
# modprobe gfs
# modprobe lock_dlm
# cman_tool join
# fence_tool join

Feb 15 15:18:30 cluster-node2 ccsd[8376]: Starting ccsd DEVEL.1108443619:
Feb 15 15:18:30 cluster-node2 ccsd[8376]: Built: Feb 15 2005 12:01:01
Feb 15 15:18:30 cluster-node2 ccsd[8376]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Feb 15 15:18:30 cluster-node2 ccsd[8376]: IP Protocol:: IPv4 only
Feb 15 15:18:34 cluster-node2 ccsd[8376]: cluster.conf (cluster name = cluster, version = 3) found.
Feb 15 15:18:34 cluster-node2 ccsd[8376]: Remote copy of cluster.conf is from quorate node.
Feb 15 15:18:34 cluster-node2 ccsd[8376]: Local version # : 3
Feb 15 15:18:34 cluster-node2 ccsd[8376]: Remote version #: 3
Feb 15 15:18:41 cluster-node2 Lock_Harness <CVS> (built Feb 15 2005 12:00:38) installed
Feb 15 15:18:41 cluster-node2 GFS <CVS> (built Feb 15 2005 12:00:52) installed
Feb 15 15:18:44 cluster-node2 CMAN <CVS> (built Feb 15 2005 12:00:31) installed
Feb 15 15:18:44 cluster-node2 NET: Registered protocol family 30
Feb 15 15:18:44 cluster-node2 DLM <CVS> (built Feb 15 2005 12:00:34) installed
Feb 15 15:18:44 cluster-node2 Lock_DLM (built Feb 15 2005 12:00:39) installed
Feb 15 15:18:47 cluster-node2 ccsd[8376]: Remote copy of cluster.conf is from quorate node.
Feb 15 15:18:47 cluster-node2 ccsd[8376]: Local version # : 3
Feb 15 15:18:47 cluster-node2 ccsd[8376]: Remote version #: 3
Feb 15 15:18:47 cluster-node2 CMAN: Waiting to join or form a Linux-cluster
Feb 15 15:18:48 cluster-node2 ccsd[8378]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1
Feb 15 15:18:48 cluster-node2 ccsd[8378]: Initial status:: Inquorate
Feb 15 15:18:50 cluster-node2 CMAN: sending membership request
Feb 15 15:18:50 cluster-node2 CMAN: got node cluster-node1
Feb 15 15:18:50 cluster-node2 CMAN: quorum regained, resuming activity
Feb 15 15:18:50 cluster-node2 ccsd[8378]: Cluster is quorate. Allowing connections.


On node 1:
# clvmd
Feb 15 15:24:56 cluster-node1 CMAN: WARNING no listener for port 11 on node cluster-node2


On node 2:
# clvmd
Feb 15 15:25:03 cluster-node2 clvmd: Cluster LVM daemon started - connected to CMAN


On node 1:
# cman_tool nodes
Node  Votes Exp Sts  Name
  1    1    1   M   cluster-node1
  2    1    1   M   cluster-node2

# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2]

DLM Lock Space:  "clvmd"                             3   3 run       -
[1 2]

# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146

Now I shut down node2's network interface.

On node 2:
# ifconfig eth0 down

On node 1:
Feb 15 15:29:50 cluster-node1 CMAN: removing node cluster-node2 from the cluster : Missed too many heartbeats


# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146

# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146

# cman_tool nodes
Node  Votes Exp Sts  Name
  1    1    1   M   cluster-node1
  2    1    1   X   cluster-node2

# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2]

DLM Lock Space:  "clvmd"                             3   3 run       -
[1 2]

There is no mention of fencing whatsoever, and node 2 is not automatically rebooted.
Shouldn't node 2 be fenced here?
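For comparison, fencing can be triggered by hand with fence_node, which looks up the node's fence method in cluster.conf via ccsd, so it exercises the same path fenced would take:

```shell
# Manually fence the dead node using the method defined for it in
# cluster.conf (here: fence_ibmblade against blade port 7).
fence_node cluster-node2
```

If that reboots the blade, the agent and configuration are fine and the problem is that fenced is never told to act on the node failure.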

Regards,

Fajar

