[Linux-cluster] two node cluster with IP tiebreaker failed.
Mockey Chen
mockey.chen at nsn.com
Wed Feb 25 07:21:55 UTC 2009
ext Mockey Chen wrote:
> ext Kein He wrote:
>
>> Hi Mockey,
>>
>> Could you please attach the output from " cman_tool status " and "
>> cman_tool nodes -f" ?
>>
>>
> Thanks for your response.
>
> I tried to run cman_tool status on as-2, but it hung without any output,
> and even Ctrl+C had no effect.
>
I manually rebooted as-1, and that resolved the problem.
Here is the output of cman_tool:
[root@as-1 ~]# cman_tool status
Version: 6.1.0
Config Version: 19
Cluster Name: azerothcluster
Cluster Id: 20148
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Quorum: 2
Active subsystems: 8
Flags: Dirty
Ports Bound: 0 177
Node name: as-1.localdomain
Node ID: 1
Multicast addresses: 239.192.78.3
Node addresses: 10.56.150.3
[root@as-1 ~]# cman_tool status -f
Version: 6.1.0
Config Version: 19
Cluster Name: azerothcluster
Cluster Id: 20148
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Quorum: 2
Active subsystems: 8
Flags: Dirty
Ports Bound: 0 177
Node name: as-1.localdomain
Node ID: 1
Multicast addresses: 239.192.78.3
Node addresses: 10.56.150.3
It seems the cluster cannot fence one of the nodes. How can I solve this?
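
A note on the vote counts in the status output above: with "Expected votes: 3" and "Quorum: 2", a lone node holding one vote cannot stay quorate unless the quorum-device vote is also counted. "Total votes: 2" with both nodes up may mean that vote is not being registered, but that is only a guess from the numbers. The sketch below reproduces the arithmetic, assuming the usual majority rule (quorum = expected_votes / 2 + 1, integer division). The function names are made up for illustration and are not part of any cluster tool.

```python
# Sketch of the quorum arithmetic, assuming a simple majority rule.
# quorum_threshold() and is_quorate() are hypothetical helper names.

def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Values taken from cluster.conf and the cman_tool output in this thread.
EXPECTED_VOTES = 3
NODE_VOTES = {"as-1.localdomain": 1, "as-2.localdomain": 1}
QDISK_VOTES = 1  # <quorumd votes="1">

threshold = quorum_threshold(EXPECTED_VOTES)
both_nodes = sum(NODE_VOTES.values())
one_node = NODE_VOTES["as-2.localdomain"]
one_node_plus_qdisk = one_node + QDISK_VOTES

print("quorum threshold:", threshold)
print("both nodes, no qdisk vote:", is_quorate(both_nodes, EXPECTED_VOTES))
print("one node, no qdisk vote:", is_quorate(one_node, EXPECTED_VOTES))
print("one node plus qdisk vote:", is_quorate(one_node_plus_qdisk, EXPECTED_VOTES))
```

Under these assumptions, the "one node, no qdisk vote" case matches the "quorum lost, blocking activity" message in the as-2 syslog, so checking whether the quorum-device vote is actually being contributed seems a reasonable first step.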
> I opened a new window and could ssh to as-2, but after logging in I could
> not do anything; even a simple 'ls' command hung.
>
> It seems the system stays alive but does not provide any service. Really bad.
>
> Is there any way to debug this issue?
>
>> Mockey Chen wrote:
>>
>>> Hi,
>>>
>>> I have a two-node cluster. To avoid split-brain, I use iLO as the fence
>>> device and an IP tiebreaker. Here is my /etc/cluster/cluster.conf:
>>> <?xml version="1.0"?>
>>> <cluster alias="azerothcluster" config_version="19"
>>> name="azerothcluster">
>>> <cman expected_votes="3" two_node="0"/>
>>> <clusternodes>
>>> <clusternode name="as-1.localdomain" nodeid="1" votes="1">
>>> <fence>
>>> <method name="1">
>>> <device name="ilo1"/>
>>> </method>
>>> </fence>
>>> </clusternode>
>>> <clusternode name="as-2.localdomain" nodeid="2" votes="1">
>>> <fence>
>>> <method name="1">
>>> <device name="ilo2"/>
>>> </method>
>>> </fence>
>>> </clusternode>
>>> </clusternodes>
>>> <quorumd interval="1" tko="10" votes="1" label="pingtest">
>>> <heuristic program="ping 10.56.150.1 -c1 -t1" score="1"
>>> interval="2" tko="3"/>
>>> </quorumd>
>>> <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>> <fencedevices>
>>> <fencedevice agent="fence_ilo" hostname="10.56.154.18"
>>> login="power" name="ilo1" passwd="pass"/>
>>> <fencedevice agent="fence_ilo" hostname="10.56.154.19"
>>> login="power" name="ilo2" passwd="pass"/>
>>> </fencedevices>
>>> ...
>>> ...
>>>
>>> To test the case where one node loses its heartbeat, I disabled the
>>> Ethernet card (eth0) on as-1. I expected as-2 to take over the services
>>> from as-1 and as-1 to reboot. What actually happened: as-1 lost its
>>> connection to as-2; as-2 detected this and tried to re-form the cluster,
>>> but failed. Here is the syslog from as-2:
>>>
>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] The token was lost in the
>>> OPERATIONAL state.
>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Receive multicast socket
>>> recv buffer size (288000 bytes).
>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] Transmit multicast socket
>>> send buffer size (262142 bytes).
>>> Feb 24 21:25:35 as-2 openais[4139]: [TOTEM] entering GATHER state
>>> from 2.
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering GATHER state
>>> from 0.
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Creating commit token
>>> because I am the rep.
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Saving state aru 1f4 high
>>> seq received 1f4
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Storing new sequence id for
>>> ring 2c
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering COMMIT state.
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering RECOVERY state.
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] position [0] member
>>> 10.56.150.4:
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] previous ring seq 40 rep
>>> 10.56.150.3
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] aru 1f4 high delivered 1f4
>>> received flag 1
>>>
>>> Message from syslogd@ at Tue Feb 24 21:25:40 2009 ...
>>> as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Did not need to originate
>>> any messages in recovery.
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] Sending initial ORF token
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] CLM CONFIGURATION CHANGE
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] New Configuration:
>>> Feb 24 21:25:40 as-2 clurgmgrd[4194]: <emerg> #1: Quorum Dissolved
>>> Feb 24 21:25:40 as-2 kernel: dlm: closing connection to node 1
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.4)
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Left:
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.3)
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Joined:
>>> Feb 24 21:25:40 as-2 openais[4139]: [CMAN ] quorum lost, blocking
>>> activity
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] CLM CONFIGURATION CHANGE
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] New Configuration:
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] r(0) ip(10.56.150.4)
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Left:
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] Members Joined:
>>> Feb 24 21:25:40 as-2 openais[4139]: [SYNC ] This node is within the
>>> primary component and will provide service.
>>> Feb 24 21:25:40 as-2 ccsd[4130]: Cluster is not quorate. Refusing
>>> connection.
>>> Feb 24 21:25:40 as-2 openais[4139]: [TOTEM] entering OPERATIONAL state.
>>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing connect:
>>> Connection refused
>>> Feb 24 21:25:40 as-2 openais[4139]: [CLM ] got nodejoin message
>>> 10.56.150.4
>>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
>>> Feb 24 21:25:40 as-2 openais[4139]: [CPG ] got joinlist message from
>>> node 2
>>> Feb 24 21:25:40 as-2 ccsd[4130]: Someone may be attempting something
>>> evil.
>>> Feb 24 21:25:40 as-2 ccsd[4130]: Error while processing get: Invalid
>>> request descriptor
>>> Feb 24 21:25:40 as-2 ccsd[4130]: Invalid descriptor specified (-111).
>>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something
>>> evil.
>>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing get: Invalid
>>> request descriptor
>>> Feb 24 21:25:41 as-2 ccsd[4130]: Invalid descriptor specified (-21).
>>> Feb 24 21:25:41 as-2 ccsd[4130]: Someone may be attempting something
>>> evil.
>>> Feb 24 21:25:41 as-2 ccsd[4130]: Error while processing disconnect:
>>> Invalid request descriptor
>>> Feb 24 21:25:41 as-2 avahi-daemon[3756]: Withdrawing address record for
>>> 10.56.150.144 on eth0.
>>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: setsockopt (IP_ADD_MEMBERSHIP):
>>> Address already in use
>>> Feb 24 21:25:41 as-2 in.rdiscd[8641]: Failed joining addresse
>>>
>>>
>>>
>>>
>>> I also found some errors in as-1's syslog:
>>> Feb 25 11:27:09 as-1 clurgmgrd[4332]: <err> #52: Failed changing RG
>>> status
>>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> Link for eth0: Not
>>> detected
>>> Feb 25 11:27:09 as-1 clurgmgrd: [4332]: <warning> No link on eth0...
>>> ...
>>> Feb 25 11:27:36 as-1 ccsd[4268]: Unable to connect to cluster
>>> infrastructure after 30 seconds.
>>> ...
>>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster
>>> infrastructure after 60 seconds.
>>> ...
>>> Feb 25 11:28:06 as-1 ccsd[4268]: Unable to connect to cluster
>>> infrastructure after 90 seconds.
>>>
>>>
>>> Any comments are appreciated!
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>>