[Linux-cluster] CLVM & CMAN live adding nodes

Bjoern Teipel bjoern.teipel at internetbrands.com
Sat Feb 22 19:05:42 UTC 2014


Thanks, Fabio, for replying to my request.

I'm using stock CentOS 6.4 versions and no <rm> (rgmanager), just clvmd and dlm.

Name        : cman                         Relocations: (not relocatable)
Version     : 3.0.12.1                          Vendor: CentOS
Release     : 49.el6_4.2                    Build Date: Tue 03 Sep 2013 02:18:10 AM PDT

Name        : lvm2-cluster                 Relocations: (not relocatable)
Version     : 2.02.98                           Vendor: CentOS
Release     : 9.el6_4.3                     Build Date: Tue 05 Nov 2013 07:36:18 AM PST

Name        : corosync                     Relocations: (not relocatable)
Version     : 1.4.1                             Vendor: CentOS
Release     : 15.el6_4.1                    Build Date: Tue 14 May 2013 02:09:27 PM PDT


My question is based on a problem I have been having since January:


Whenever I add a new node (I put it into cluster.conf and reload with
cman_tool version -r -S), I end up in a situation where the new node tries
to gain quorum on its own, starts to fence the existing pool master, and
appears to create some sort of split cluster. Does this even work, given
that corosync and dlm do not seem to know about the recently added node?
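
For completeness, the sequence I use looks roughly like this (just a sketch;
the validate/copy steps are my assumption of what is needed, since -S tells
cman_tool not to run ccs_sync, and hv-6 stands in for the new node):

# on an existing member, after adding the <clusternode/> entry and
# bumping config_version in /etc/cluster/cluster.conf
ccs_config_validate
scp /etc/cluster/cluster.conf hv-6:/etc/cluster/cluster.conf
cman_tool version -r -S

# on the new node
service cman start
service clvmd start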

New Node
==========

Node  Sts   Inc   Joined               Name
   1   X      0                        hv-1
   2   X      0                        hv-2
   3   X      0                        hv-3
   4   X      0                        hv-4
   5   X      0                        hv-5
   6   M     80   2014-01-07 21:37:42  hv-6  <--- host added


Jan  7 21:37:42 hv-1 corosync[12564]:   [TOTEM ] The network interface
[10.14.18.77] is now up.
Jan  7 21:37:42 hv-1 corosync[12564]:   [QUORUM] Using quorum provider
quorum_cman
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Jan  7 21:37:42 hv-1 corosync[12564]:   [CMAN  ] CMAN 3.0.12.1 (built Sep
 3 2013 09:17:34) started
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
corosync CMAN membership service 2.90
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
openais checkpoint service B.01.01
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
corosync extended virtual synchrony service
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
corosync configuration service
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
corosync cluster config database access v1.01
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
corosync profile loading service
Jan  7 21:37:42 hv-1 corosync[12564]:   [QUORUM] Using quorum provider
quorum_cman
Jan  7 21:37:42 hv-1 corosync[12564]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Jan  7 21:37:42 hv-1 corosync[12564]:   [MAIN  ] Compatibility mode set to
whitetank.  Using V1 and V2 of the synchronization engine.
Jan  7 21:37:42 hv-1 corosync[12564]:   [TOTEM ] adding new UDPU member
{10.14.18.65}
Jan  7 21:37:42 hv-1 corosync[12564]:   [TOTEM ] adding new UDPU member
{10.14.18.67}
Jan  7 21:37:42 hv-1 corosync[12564]:   [TOTEM ] adding new UDPU member
{10.14.18.68}
Jan  7 21:37:42 hv-1 corosync[12564]:   [TOTEM ] adding new UDPU member
{10.14.18.70}
Jan  7 21:37:42 hv-1 corosync[12564]:   [TOTEM ] adding new UDPU member
{10.14.18.66}
Jan  7 21:37:42 hv-1 corosync[12564]:   [TOTEM ] adding new UDPU member
{10.14.18.77}
Jan  7 21:37:42 hv-1  corosync[12564]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jan  7 21:37:42 hv-1 corosync[12564]:   [CMAN  ] quorum regained, resuming
activity
Jan  7 21:37:42 hv-1 corosync[12564]:   [QUORUM] This node is within the
primary component and will provide service.
Jan  7 21:37:42 hv-1 corosync[12564]:   [QUORUM] Members[1]: 6
Jan  7 21:37:42 hv-1 corosync[12564]:   [QUORUM] Members[1]: 6
Jan  7 21:37:42 hv-1 corosync[12564]:   [CPG   ] chosen downlist: sender
r(0) ip(10.14.18.77) ; members(old:0 left:0)
Jan  7 21:37:42 hv-1 corosync[12564]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Jan  7 21:37:46 hv-1 fenced[12620]: fenced 3.0.12.1 started
Jan  7 21:37:46 hv-1 dlm_controld[12643]: dlm_controld 3.0.12.1 started
Jan  7 21:37:47 hv-1 gfs_controld[12695]: gfs_controld 3.0.12.1 started
Jan  7 21:37:54 hv-1 fenced[12620]: fencing node hv-b1clcy1

sudo -i corosync-objctl | grep member

totem.interface.member.memberaddr=hv-1
totem.interface.member.memberaddr=hv-2
totem.interface.member.memberaddr=hv-3
totem.interface.member.memberaddr=hv-4
totem.interface.member.memberaddr=hv-5
totem.interface.member.memberaddr=hv-6
runtime.totem.pg.mrp.srp.members.6.ip=r(0) ip(10.14.18.77)
runtime.totem.pg.mrp.srp.members.6.join_count=1
runtime.totem.pg.mrp.srp.members.6.status=joined


Existing Node
=============

Member 6 has not been added to the quorum list:


Node  Sts   Inc   Joined               Name
   1   M   4468   2013-12-10 14:33:27  hv-1
   2   M   4468   2013-12-10 14:33:27  hv-2
   3   M   5036   2014-01-07 17:51:26  hv-3
   4   X   4468                        hv-4 (dead at the moment)
   5   M   4468   2013-12-10 14:33:27  hv-5
   6   X      0                        hv-6  <--- added


Jan  7 21:36:28 hv-1 corosync[7769]:   [QUORUM] Members[4]: 1 2 3 5
Jan  7 21:37:54 hv-1 corosync[7769]:   [TOTEM ] A processor joined or left
the membership and a new membership was formed.
Jan  7 21:37:54 hv-1 corosync[7769]:   [CPG   ] chosen downlist: sender
r(0) ip(10.14.18.65) ; members(old:4 left:0)
Jan  7 21:37:54 hv-1 corosync[7769]:   [MAIN  ] Completed service
synchronization, ready to provide service.


totem.interface.member.memberaddr=hv-1
totem.interface.member.memberaddr=hv-2
totem.interface.member.memberaddr=hv-3
totem.interface.member.memberaddr=hv-4
totem.interface.member.memberaddr=hv-5
runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(10.14.18.65)
runtime.totem.pg.mrp.srp.members.1.join_count=1
runtime.totem.pg.mrp.srp.members.1.status=joined
runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(10.14.18.66)
runtime.totem.pg.mrp.srp.members.2.join_count=1
runtime.totem.pg.mrp.srp.members.2.status=joined
runtime.totem.pg.mrp.srp.members.4.ip=r(0) ip(10.14.18.68)
runtime.totem.pg.mrp.srp.members.4.join_count=1
runtime.totem.pg.mrp.srp.members.4.status=left
runtime.totem.pg.mrp.srp.members.5.ip=r(0) ip(10.14.18.70)
runtime.totem.pg.mrp.srp.members.5.join_count=1
runtime.totem.pg.mrp.srp.members.5.status=joined
runtime.totem.pg.mrp.srp.members.3.ip=r(0) ip(10.14.18.67)
runtime.totem.pg.mrp.srp.members.3.join_count=3
runtime.totem.pg.mrp.srp.members.3.status=joined
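
For what it's worth, a quick way to compare what each node is actually
running (just the stock tools):

cman_tool version      # config version loaded, should show 32 everywhere after the reload
cman_tool nodes        # membership as cman sees it
corosync-objctl | grep totem.interface.member   # totem UDPU member list

As the member list above shows, the running corosync on the existing node
still stops at hv-5, so it apparently never picked up hv-6's address.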


cluster.conf:

<?xml version="1.0"?>
<cluster config_version="32" name="hv-1618-110-1">
  <fence_daemon clean_start="0"/>
  <cman transport="udpu" expected_votes="1"/>
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="hv-1" votes="1" nodeid="1">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-2" votes="1" nodeid="3">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-3" votes="1" nodeid="4">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-4" votes="1" nodeid="5">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-5" votes="1" nodeid="2">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
    <clusternode name="hv-6" votes="1" nodeid="6">
      <fence><method name="single"><device name="human"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="manual"/>
  </fencedevices>
  <rm/>
</cluster>

(manual fencing just for testing)
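
For reference, an IPMI-based setup with fence_ipmilan would presumably look
roughly like the following (untested sketch; device name, BMC address and
credentials are placeholders):

  <clusternode name="hv-1" votes="1" nodeid="1">
    <fence>
      <method name="ipmi">
        <device name="ipmi-hv-1"/>
      </method>
    </fence>
  </clusternode>
  <fencedevices>
    <!-- placeholder BMC address and credentials; lanplus="1" for IPMI v2.0 BMCs -->
    <fencedevice name="ipmi-hv-1" agent="fence_ipmilan"
                 ipaddr="10.14.19.65" login="admin" passwd="secret" lanplus="1"/>
  </fencedevices>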


corosync.conf:

compatibility: whitetank
totem {
  version: 2
  secauth: off
  threads: 0
  # fail_recv_const: 5000
  interface {
    ringnumber: 0
    bindnetaddr: 10.14.18.0
    mcastaddr: 239.0.0.4
    mcastport: 5405
  }
}
logging {
  fileline: off
  to_stderr: no
  to_logfile: yes
  to_syslog: yes
  # the pathname of the log file
  logfile: /var/log/cluster/corosync.log
  debug: off
  timestamp: on
  logger_subsys {
    subsys: AMF
    debug: off
  }
}

amf {
  mode: disabled
}



On Sat, Feb 22, 2014 at 5:54 AM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:

> On 02/22/2014 10:33 AM, emmanuel segura wrote:
> > I know that if you need to modify anything outside the <rm>...</rm> tag
> > (used by rgmanager) in the cluster.conf file, you need to restart the
> > whole cluster stack. With cman+rgmanager I have never seen how to add or
> > remove a node from the cluster without restarting cman.
>
> It depends on the version. On RHEL5 that's correct; on RHEL6 it also works
> outside of <rm>, but there are some limitations, as some parameters just
> can't be changed at runtime.
>
> Fabio
>
> >
> >
> >
> >
> > 2014-02-22 6:21 GMT+01:00 Bjoern Teipel
> > <bjoern.teipel at internetbrands.com>:
> >
> >     Hi all,
> >
> >     Who is using CLVM with CMAN in a cluster with more than 2 nodes in
> >     production?
> >     Did you manage to live-add a new node to the cluster while
> >     everything is running?
> >     I'm only able to add nodes while the cluster stack is shut down.
> >     That's certainly not a good idea when you have to run CLVM on
> >     hypervisors and would need to shut down all VMs to add a new box.
> >     It would also be good if you could paste some of your configs
> >     using IPMI fencing.
> >
> >     Thanks in advance,
> >     Bjoern
> >
> >     --
> >     Linux-cluster mailing list
> >     Linux-cluster at redhat.com
> >     https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> >
> >
> > --
> > this is my life and I live it as long as God wills
> >
> >
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

