[Linux-cluster] Adding node to clvm cluster
Bjoern Teipel
bjoern.teipel at internetbrands.com
Wed Jan 8 05:57:32 UTC 2014
I'm trying to join a new node into an existing 5-node CLVM cluster, but I
just can't get it to work.
Whenever I add a new node (I put it into cluster.conf and reloaded with
cman_tool version -r -S), I end up in a situation where the new node tries
to gain quorum on its own, starts fencing the existing pool master, and
appears to produce some sort of split cluster. Can this work at all while
corosync and dlm do not yet know about the recently added node?
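For reference, this is how I'm verifying that the new cluster.conf actually
reached a node's disk after the reload (just a sketch: the helper name and
the sed expression are mine, the path is the stock one; "cman_tool version"
on every node should then report the same number):

```shell
# Sketch: extract config_version from a cluster.conf file so it can be
# compared across all nodes after "cman_tool version -r -S".
conf_version() {
    sed -n 's/.*config_version="\([0-9]*\)".*/\1/p' "$1"
}
# e.g.: conf_version /etc/cluster/cluster.conf   (should print 32 everywhere)
```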
New Node
==========
Node Sts Inc Joined Name
1 X 0 hv-b1clcy1
2 X 0 hv-b1flcy1
3 X 0 hv-b1fmcy1
4 X 0 hv-b1dmcy1
5 X 0 hv-b1fkcy1
6 M 80 2014-01-07 21:37:42 hv-b1dkcy1 <--- host added
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [TOTEM ] The network
interface [10.14.18.77] is now up.
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [QUORUM] Using quorum
provider quorum_cman
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [CMAN ] CMAN 3.0.12.1 (built
Sep 3 2013 09:17:34) started
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: corosync CMAN membership service 2.90
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: openais checkpoint service B.01.01
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: corosync extended virtual synchrony service
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: corosync configuration service
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: corosync cluster closed process group service v1.01
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: corosync cluster config database access v1.01
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: corosync profile loading service
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [QUORUM] Using quorum
provider quorum_cman
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [MAIN ] Compatibility mode
set to whitetank. Using V1 and V2 of the synchronization engine.
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [TOTEM ] adding new UDPU
member {10.14.18.65}
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [TOTEM ] adding new UDPU
member {10.14.18.67}
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [TOTEM ] adding new UDPU
member {10.14.18.68}
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [TOTEM ] adding new UDPU
member {10.14.18.70}
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [TOTEM ] adding new UDPU
member {10.14.18.66}
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [TOTEM ] adding new UDPU
member {10.14.18.77}
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [CMAN ] quorum regained,
resuming activity
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [QUORUM] This node is within
the primary component and will provide service.
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [QUORUM] Members[1]: 6
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [QUORUM] Members[1]: 6
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [CPG ] chosen downlist:
sender r(0) ip(10.14.18.77) ; members(old:0 left:0)
Jan 7 21:37:42 hv-b1dkcy1 corosync[12564]: [MAIN ] Completed service
synchronization, ready to provide service.
Jan 7 21:37:46 hv-b1dkcy1 fenced[12620]: fenced 3.0.12.1 started
Jan 7 21:37:46 hv-b1dkcy1 dlm_controld[12643]: dlm_controld 3.0.12.1
started
Jan 7 21:37:47 hv-b1dkcy1 gfs_controld[12695]: gfs_controld 3.0.12.1
started
Jan 7 21:37:54 hv-b1dkcy1 fenced[12620]: fencing node hv-b1clcy1
sudo -i corosync-objctl | grep member
totem.interface.member.memberaddr=hv-b1clcy1
totem.interface.member.memberaddr=hv-b1fmcy1
totem.interface.member.memberaddr=hv-b1dmcy1
totem.interface.member.memberaddr=hv-b1fkcy1
totem.interface.member.memberaddr=hv-b1flcy1
totem.interface.member.memberaddr=hv-b1dkcy1
runtime.totem.pg.mrp.srp.members.6.ip=r(0) ip(10.14.18.77)
runtime.totem.pg.mrp.srp.members.6.join_count=1
runtime.totem.pg.mrp.srp.members.6.status=joined
Existing Node
=============
Member 6 has not been added to the quorum list:
Jan 7 21:36:28 hv-b1clcy1 corosync[7769]: [QUORUM] Members[4]: 1 2 3 5
Jan 7 21:37:54 hv-b1clcy1 corosync[7769]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jan 7 21:37:54 hv-b1clcy1 corosync[7769]: [CPG ] chosen downlist:
sender r(0) ip(10.14.18.65) ; members(old:4 left:0)
Node Sts Inc Joined Name
1 M 4468 2013-12-10 14:33:27 hv-b1clcy1
2 M 4468 2013-12-10 14:33:27 hv-b1flcy1
3 M 5036 2014-01-07 17:51:26 hv-b1fmcy1
4 X 4468 hv-b1dmcy1 (dead at the moment)
5 M 4468 2013-12-10 14:33:27 hv-b1fkcy1
6 X 0 hv-b1dkcy1 <--- added
Jan 7 21:37:54 hv-b1clcy1 corosync[7769]: [MAIN ] Completed service
synchronization, ready to provide service.
totem.interface.member.memberaddr=hv-b1clcy1
totem.interface.member.memberaddr=hv-b1fmcy1
totem.interface.member.memberaddr=hv-b1dmcy1
totem.interface.member.memberaddr=hv-b1fkcy1
totem.interface.member.memberaddr=hv-b1flcy1
runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(10.14.18.65)
runtime.totem.pg.mrp.srp.members.1.join_count=1
runtime.totem.pg.mrp.srp.members.1.status=joined
runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(10.14.18.66)
runtime.totem.pg.mrp.srp.members.2.join_count=1
runtime.totem.pg.mrp.srp.members.2.status=joined
runtime.totem.pg.mrp.srp.members.4.ip=r(0) ip(10.14.18.68)
runtime.totem.pg.mrp.srp.members.4.join_count=1
runtime.totem.pg.mrp.srp.members.4.status=left
runtime.totem.pg.mrp.srp.members.5.ip=r(0) ip(10.14.18.70)
runtime.totem.pg.mrp.srp.members.5.join_count=1
runtime.totem.pg.mrp.srp.members.5.status=joined
runtime.totem.pg.mrp.srp.members.3.ip=r(0) ip(10.14.18.67)
runtime.totem.pg.mrp.srp.members.3.join_count=3
runtime.totem.pg.mrp.srp.members.3.status=joined
cluster.conf:
<?xml version="1.0"?>
<cluster config_version="32" name="hv-1618-110-1">
  <fence_daemon clean_start="0"/>
  <cman transport="udpu" expected_votes="1"/>
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="hv-b1clcy1" votes="1" nodeid="1">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-b1fmcy1" votes="1" nodeid="3">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-b1dmcy1" votes="1" nodeid="4">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-b1fkcy1" votes="1" nodeid="5">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-b1flcy1" votes="1" nodeid="2">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-b1dkcy1" votes="1" nodeid="6">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="manual"/>
  </fencedevices>
  <rm/>
</cluster>
(manual fencing just for testing)
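To compare what cluster.conf declares against what corosync actually loaded
(the corosync-objctl memberaddr output above), I use a quick sketch like
this; the helper name and the grep/sed expressions are mine:

```shell
# Sketch: list the node names cluster.conf declares, to diff against
# "corosync-objctl | grep memberaddr" on each node -- on my existing
# nodes the new host hv-b1dkcy1 is missing from that list.
cluster_nodes() {
    grep -o 'clusternode name="[^"]*"' "$1" | sed 's/.*name="\(.*\)"/\1/'
}
# e.g.: cluster_nodes /etc/cluster/cluster.conf
```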
corosync.conf:
compatibility: whitetank

totem {
    version: 2
    secauth: off
    threads: 0
    # fail_recv_const: 5000
    interface {
        ringnumber: 0
        bindnetaddr: 10.14.18.0
        mcastaddr: 239.0.0.4
        mcastport: 5405
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    # the pathname of the log file
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

amf {
    mode: disabled
}
Many thanks,
Bjoern