[Linux-cluster] rhel6 node start causes power on of the other one

Gianluca Cecchi gianluca.cecchi at gmail.com
Tue Mar 22 10:12:11 UTC 2011


Hello,
I'm running the latest updates on a two-node RHEL 6 based cluster.
At the moment there is no quorum disk defined, so cluster.conf contains
this line:
<cman expected_votes="1" two_node="1"/>
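
For context, the relevant skeleton of my cluster.conf is roughly the
following (the second node name and all fence device details are
placeholders, not my real configuration):

<cluster name="mycluster" config_version="1">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="intrarhev1" nodeid="1">
      <fence>
        <method name="1">
          <device name="fencedev1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="intrarhev2" nodeid="2">
      <fence>
        <method name="1">
          <device name="fencedev2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="fencedev1" agent="..."/>
    <fencedevice name="fencedev2" agent="..."/>
  </fencedevices>
</cluster>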

# rpm -q cman rgmanager fence-agents ricci corosync
cman-3.0.12-23.el6_0.6.x86_64
rgmanager-3.0.12-10.el6.x86_64
fence-agents-3.0.12-8.el6_0.3.x86_64
ricci-0.16.2-13.el6.x86_64
corosync-1.2.3-21.el6_0.1.x86_64

# uname -r
2.6.32-71.18.2.el6.x86_64

If the initial situation is both nodes powered off and I start one of
them, it powers on the other one, which is not what I intend.
Is this the expected default behaviour in RHEL 6 with two nodes and no
quorum disk? Or does it happen in general, regardless of whether a
quorum disk is defined?
If so, how can I change it, if that is possible?
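
Reading fenced(8), the only knobs I have found that seem related to
this startup fencing are in the fence_daemon section of cluster.conf;
would something like this (the delay value is picked arbitrarily by me)
be the supported way to avoid it?

<fence_daemon clean_start="1" post_join_delay="60"/>

As far as I understand, post_join_delay only postpones fencing of the
missing node after fenced joins the fence domain, while clean_start="1"
should skip startup fencing entirely by assuming all nodes start in a
clean state. I am not sure whether the latter is safe with shared
storage, so corrections are welcome.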

Below is the output of this command on the first node as it boots and
then powers on the other one:
# egrep "ricci|rgmanager|dlm|gfs|cman|corosync|fence" /var/log/messages
Mar 22 10:56:53 rhev2 corosync[6747]:   [MAIN  ] Corosync Cluster
Engine ('1.2.3'): started and ready to provide service.
Mar 22 10:56:53 rhev2 corosync[6747]:   [MAIN  ] Corosync built-in
features: nss rdma
Mar 22 10:56:53 rhev2 corosync[6747]:   [MAIN  ] Successfully read
config from /etc/cluster/cluster.conf
Mar 22 10:56:53 rhev2 corosync[6747]:   [MAIN  ] Successfully parsed cman config
Mar 22 10:56:53 rhev2 corosync[6747]:   [TOTEM ] Initializing
transport (UDP/IP).
Mar 22 10:56:53 rhev2 corosync[6747]:   [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 22 10:56:53 rhev2 corosync[6747]:   [TOTEM ] The network interface
[192.168.16.32] is now up.
Mar 22 10:56:54 rhev2 corosync[6747]:   [QUORUM] Using quorum provider
quorum_cman
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: corosync cluster quorum service v0.1
Mar 22 10:56:54 rhev2 corosync[6747]:   [CMAN  ] CMAN 3.0.12 (built
Dec  1 2010 13:41:12) started
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: corosync CMAN membership service 2.90
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: openais checkpoint service B.01.01
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: corosync extended virtual synchrony service
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: corosync configuration service
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: corosync cluster closed process group service v1.01
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: corosync cluster config database access v1.01
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: corosync profile loading service
Mar 22 10:56:54 rhev2 corosync[6747]:   [QUORUM] Using quorum provider
quorum_cman
Mar 22 10:56:54 rhev2 corosync[6747]:   [SERV  ] Service engine
loaded: corosync cluster quorum service v0.1
Mar 22 10:56:54 rhev2 corosync[6747]:   [MAIN  ] Compatibility mode
set to whitetank.  Using V1 and V2 of the synchronization engine.
Mar 22 10:56:54 rhev2 corosync[6747]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Mar 22 10:56:54 rhev2 corosync[6747]:   [CMAN  ] quorum regained,
resuming activity
Mar 22 10:56:54 rhev2 corosync[6747]:   [QUORUM] This node is within
the primary component and will provide service.
Mar 22 10:56:54 rhev2 corosync[6747]:   [QUORUM] Members[1]: 2
Mar 22 10:56:54 rhev2 corosync[6747]:   [QUORUM] Members[1]: 2
Mar 22 10:56:54 rhev2 corosync[6747]:   [CPG   ] downlist received left_list: 0
Mar 22 10:56:54 rhev2 corosync[6747]:   [CPG   ] chosen downlist from
node r(0) ip(192.168.16.32)
Mar 22 10:56:54 rhev2 corosync[6747]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Mar 22 10:56:57 rhev2 fenced[6803]: fenced 3.0.12 started
Mar 22 10:56:58 rhev2 dlm_controld[6823]: dlm_controld 3.0.12 started
Mar 22 10:56:58 rhev2 gfs_controld[6848]: gfs_controld 3.0.12 started
Mar 22 10:57:44 rhev2 kernel: dlm: Using TCP for communications
Mar 22 10:57:49 rhev2 fenced[6803]: fencing node intrarhev1
Mar 22 10:57:53 rhev2 fenced[6803]: fence intrarhev1 success
Mar 22 10:58:00 rhev2 ricci: startup succeeded
Mar 22 10:58:01 rhev2 rgmanager[7460]: I am node #2
Mar 22 10:58:01 rhev2 rgmanager[7460]: Resource Group Manager Starting
Mar 22 10:58:01 rhev2 rgmanager[7460]: Loading Service Data
Mar 22 10:58:03 rhev2 rgmanager[7460]: Initializing Services
Mar 22 10:58:04 rhev2 rgmanager[7460]: Services Initialized
Mar 22 10:58:04 rhev2 rgmanager[7460]: State change: Local UP
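
The lines that puzzle me are the fenced ones: the node regains quorum
on its own at 10:56:54 (as expected with two_node="1"), and then at
10:57:49 fenced fences intrarhev1, which with my power-based fence
agent results in the other node being powered on. If it helps with
diagnosing, once the node is up I can also post the output of:

# fence_tool ls
# cman_tool status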

Thanks in advance,
Gianluca



