[Linux-cluster] Adding node to clvm cluster

Bjoern Teipel bjoern.teipel at internetbrands.com
Wed Jan 8 05:57:32 UTC 2014


I'm trying to join a new node to an existing 5-node CLVM cluster, but I
just can't get it to work.

Whenever I add a new node (I put it into cluster.conf and reloaded with
cman_tool version -r -S), I end up in situations where the new node
wants to gain quorum on its own, starts to fence the existing pool
master, and appears to create some sort of split cluster. Does this
work at all? corosync and dlm do not know about the recently added node.
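
For reference, the procedure is roughly the following sketch (run on
one existing member; the clusternode entry matches the cluster.conf
further down):

# add the new node to /etc/cluster/cluster.conf and bump
# config_version (here 31 -> 32), e.g.
#   <clusternode name="hv-b1dkcy1" votes="1" nodeid="6">...</clusternode>
vi /etc/cluster/cluster.conf

# sanity-check the edited configuration
ccs_config_validate

# activate the new config version; -S skips the ccs_sync step that
# would distribute cluster.conf to the other nodes, so the file has
# to be copied to them by hand
cman_tool version -r -S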

New Node
==========

Node  Sts   Inc   Joined               Name
   1   X      0                        hv-b1clcy1
   2   X      0                        hv-b1flcy1
   3   X      0                        hv-b1fmcy1
   4   X      0                        hv-b1dmcy1
   5   X      0                        hv-b1fkcy1
   6   M     80   2014-01-07 21:37:42  hv-b1dkcy1 <--- host added


Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] The network
interface [10.14.18.77] is now up.
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] Using quorum
provider quorum_cman
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: corosync cluster quorum service v0.1
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [CMAN  ] CMAN 3.0.12.1 (built
Sep  3 2013 09:17:34) started
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: corosync CMAN membership service 2.90
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: openais checkpoint service B.01.01
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: corosync extended virtual synchrony service
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: corosync configuration service
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: corosync cluster closed process group service v1.01
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: corosync cluster config database access v1.01
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: corosync profile loading service
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] Using quorum
provider quorum_cman
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [SERV  ] Service engine
loaded: corosync cluster quorum service v0.1
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [MAIN  ] Compatibility mode
set to whitetank.  Using V1 and V2 of the synchronization engine.
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU
member {10.14.18.65}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU
member {10.14.18.67}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU
member {10.14.18.68}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU
member {10.14.18.70}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU
member {10.14.18.66}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] adding new UDPU
member {10.14.18.77}
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [CMAN  ] quorum regained,
resuming activity
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] This node is within
the primary component and will provide service.
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] Members[1]: 6
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [QUORUM] Members[1]: 6
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [CPG   ] chosen downlist:
sender r(0) ip(10.14.18.77) ; members(old:0 left:0)
Jan  7 21:37:42 hv-b1dkcy1 corosync[12564]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Jan  7 21:37:46 hv-b1dkcy1 fenced[12620]: fenced 3.0.12.1 started
Jan  7 21:37:46 hv-b1dkcy1 dlm_controld[12643]: dlm_controld 3.0.12.1
started
Jan  7 21:37:47 hv-b1dkcy1 gfs_controld[12695]: gfs_controld 3.0.12.1
started
Jan  7 21:37:54 hv-b1dkcy1 fenced[12620]: fencing node hv-b1clcy1

sudo -i corosync-objctl | grep member

totem.interface.member.memberaddr=hv-b1clcy1
totem.interface.member.memberaddr=hv-b1fmcy1
totem.interface.member.memberaddr=hv-b1dmcy1
totem.interface.member.memberaddr=hv-b1fkcy1
totem.interface.member.memberaddr=hv-b1flcy1
totem.interface.member.memberaddr=hv-b1dkcy1
runtime.totem.pg.mrp.srp.members.6.ip=r(0) ip(10.14.18.77)
runtime.totem.pg.mrp.srp.members.6.join_count=1
runtime.totem.pg.mrp.srp.members.6.status=joined


Existing Node
=============

Member 6 has not been added to the quorum list:

Node  Sts   Inc   Joined               Name
   1   M   4468   2013-12-10 14:33:27  hv-b1clcy1
   2   M   4468   2013-12-10 14:33:27  hv-b1flcy1
   3   M   5036   2014-01-07 17:51:26  hv-b1fmcy1
   4   X   4468                        hv-b1dmcy1 (dead at the moment)
   5   M   4468   2013-12-10 14:33:27  hv-b1fkcy1
   6   X      0                        hv-b1dkcy1  <--- added


Jan  7 21:36:28 hv-b1clcy1 corosync[7769]:   [QUORUM] Members[4]: 1 2 3 5
Jan  7 21:37:54 hv-b1clcy1 corosync[7769]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jan  7 21:37:54 hv-b1clcy1 corosync[7769]:   [CPG   ] chosen downlist:
sender r(0) ip(10.14.18.65) ; members(old:4 left:0)
Jan  7 21:37:54 hv-b1clcy1 corosync[7769]:   [MAIN  ] Completed service
synchronization, ready to provide service.


Note that the totem member list on this node still lacks the new node
hv-b1dkcy1 (10.14.18.77), and there is no members.6 runtime entry:

totem.interface.member.memberaddr=hv-b1clcy1
totem.interface.member.memberaddr=hv-b1fmcy1
totem.interface.member.memberaddr=hv-b1dmcy1
totem.interface.member.memberaddr=hv-b1fkcy1
totem.interface.member.memberaddr=hv-b1flcy1
runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(10.14.18.65)
runtime.totem.pg.mrp.srp.members.1.join_count=1
runtime.totem.pg.mrp.srp.members.1.status=joined
runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(10.14.18.66)
runtime.totem.pg.mrp.srp.members.2.join_count=1
runtime.totem.pg.mrp.srp.members.2.status=joined
runtime.totem.pg.mrp.srp.members.4.ip=r(0) ip(10.14.18.68)
runtime.totem.pg.mrp.srp.members.4.join_count=1
runtime.totem.pg.mrp.srp.members.4.status=left
runtime.totem.pg.mrp.srp.members.5.ip=r(0) ip(10.14.18.70)
runtime.totem.pg.mrp.srp.members.5.join_count=1
runtime.totem.pg.mrp.srp.members.5.status=joined
runtime.totem.pg.mrp.srp.members.3.ip=r(0) ip(10.14.18.67)
runtime.totem.pg.mrp.srp.members.3.join_count=3
runtime.totem.pg.mrp.srp.members.3.status=joined


cluster.conf:

<?xml version="1.0"?>
<cluster config_version="32" name="hv-1618-110-1">
  <fence_daemon clean_start="0"/>
  <cman transport="udpu" expected_votes="1"/>
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="hv-b1clcy1" votes="1" nodeid="1"><fence><method
name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1fmcy1" votes="1" nodeid="3"><fence><method
name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1dmcy1" votes="1" nodeid="4"><fence><method
name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1fkcy1" votes="1" nodeid="5"><fence><method
name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1flcy1" votes="1" nodeid="2"><fence><method
name="single"><device name="human"/></method></fence></clusternode>
    <clusternode name="hv-b1dkcy1" votes="1" nodeid="6"><fence><method
name="single"><device name="human"/></method></fence></clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="manual"/>
  </fencedevices>
  <rm/>
</cluster>

(manual fencing just for testing)
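
To verify that the existing members actually picked up config_version
32, the running version can be checked on each of them with something
like:

for h in hv-b1clcy1 hv-b1flcy1 hv-b1fmcy1 hv-b1fkcy1; do
    echo -n "$h: "
    ssh "$h" cman_tool version
done

cman_tool version prints the running protocol and config version,
e.g. "6.2.0 config 32".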


corosync.conf:

compatibility: whitetank
totem {
  version: 2
  secauth: off
  threads: 0
  # fail_recv_const: 5000
  interface {
    ringnumber: 0
    bindnetaddr: 10.14.18.0
    mcastaddr: 239.0.0.4
    mcastport: 5405
  }
}
logging {
  fileline: off
  to_stderr: no
  to_logfile: yes
  to_syslog: yes
  # the pathname of the log file
  logfile: /var/log/cluster/corosync.log
  debug: off
  timestamp: on
  logger_subsys {
    subsys: AMF
    debug: off
  }
}

amf {
  mode: disabled
}
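
Side note: since corosync here is started through cman, my
understanding is that the totem configuration actually comes from
cluster.conf via the cman configuration module, not from this
corosync.conf. The values corosync is really running with can be
dumped from the object database:

corosync-objctl | grep '^totem'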


Many thanks,
Bjoern