[Linux-cluster] Re: Fencing test
Paras pradhan
pradhanparas at gmail.com
Thu Jan 8 18:39:10 UTC 2009
On Mon, Jan 5, 2009 at 12:11 PM, Paras pradhan <pradhanparas at gmail.com> wrote:
> hi,
>
> On Mon, Jan 5, 2009 at 8:23 AM, Rajagopal Swaminathan
> <raju.rajsand at gmail.com> wrote:
>> Greetings,
>>
>> On Sat, Jan 3, 2009 at 4:18 AM, Paras pradhan <pradhanparas at gmail.com> wrote:
>>>
>>> Here I am using 4 nodes.
>>>
>>> Node 1) That runs luci
>>> Node 2) This is my iSCSI shared storage, where my virtual machine(s) reside
>>> Node 3) First node in my two node cluster
>>> Node 4) Second node in my two node cluster
>>>
>>> All of them are connected simply to an unmanaged 16 port switch.
>>
>> Luci does not need a separate node to run. It can run on one of the
>> member nodes (node 3 | 4).
>
> OK.
>
>>
>> what does clustat say?
>
> Here is my clustat o/p:
>
> -----------
>
> [root@ha1lx ~]# clustat
> Cluster Status for ipmicluster @ Mon Jan 5 12:00:10 2009
> Member Status: Quorate
>
>  Member Name                  ID   Status
>  ------ ----                  ---- ------
>  10.42.21.29                     1 Online, rgmanager
>  10.42.21.27                     2 Online, Local, rgmanager
>
>  Service Name         Owner (Last)         State
>  ------- ----         ----- ------         -----
>  vm:linux64           10.42.21.27          started
> [root@ha1lx ~]#
> ------------------------
>
>
> 10.42.21.27 is node3 and 10.42.21.29 is node4
>
>
>
>>
>> Can you post your cluster.conf here?
>
> Here is my cluster.conf
>
> --
> [root@ha1lx cluster]# more cluster.conf
> <?xml version="1.0"?>
> <cluster alias="ipmicluster" config_version="8" name="ipmicluster">
> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> <clusternodes>
> <clusternode name="10.42.21.29" nodeid="1" votes="1">
> <fence>
> <method name="1">
> <device name="fence2"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="10.42.21.27" nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device name="fence1"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1"/>
> <fencedevices>
> <fencedevice agent="fence_ipmilan" ipaddr="10.42.21.28"
> login="admin" name="fence1" passwd="admin"/>
> <fencedevice agent="fence_ipmilan" ipaddr="10.42.21.30"
> login="admin" name="fence2" passwd="admin"/>
> </fencedevices>
> <rm>
> <failoverdomains>
> <failoverdomain name="myfd" nofailback="0" ordered="1" restricted="0">
> <failoverdomainnode name="10.42.21.29" priority="2"/>
> <failoverdomainnode name="10.42.21.27" priority="1"/>
> </failoverdomain>
> </failoverdomains>
> <resources/>
> <vm autostart="1" domain="myfd" exclusive="0" migrate="live"
> name="linux64" path="/guest_roots" recovery="restart"/>
> </rm>
> </cluster>
> ------
>
>
> Here:
>
> 10.42.21.28 is IPMI interface in node3
> 10.42.21.30 is IPMI interface in node4
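
The node-to-fence-device mapping in the cluster.conf above can be cross-checked mechanically. Below is a minimal Python sketch (not part of the cluster tools) that parses a trimmed copy of that configuration and lists, for each node, the fence device it uses and the IPMI address the device points to. The names fence_map and CLUSTER_CONF are illustrative assumptions; on a real node one would read /etc/cluster/cluster.conf instead of an inline string.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf quoted above (fencing-related parts only).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster alias="ipmicluster" config_version="8" name="ipmicluster">
  <clusternodes>
    <clusternode name="10.42.21.29" nodeid="1" votes="1">
      <fence><method name="1"><device name="fence2"/></method></fence>
    </clusternode>
    <clusternode name="10.42.21.27" nodeid="2" votes="1">
      <fence><method name="1"><device name="fence1"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.42.21.28"
                 login="admin" name="fence1" passwd="admin"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.42.21.30"
                 login="admin" name="fence2" passwd="admin"/>
  </fencedevices>
</cluster>
"""


def fence_map(conf_xml):
    """Return {node name: [(device name, agent, ipaddr), ...]}."""
    root = ET.fromstring(conf_xml)
    # Index the <fencedevice> definitions by their name attribute.
    devices = {d.get("name"): d for d in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        entries = []
        for ref in node.iter("device"):
            dev = devices.get(ref.get("name"))
            # A reference to an undefined device shows up as (name, None, None).
            agent = dev.get("agent") if dev is not None else None
            ipaddr = dev.get("ipaddr") if dev is not None else None
            entries.append((ref.get("name"), agent, ipaddr))
        result[node.get("name")] = entries
    return result


for node, entries in fence_map(CLUSTER_CONF).items():
    print(node, entries)
```

With the configuration quoted above, node 10.42.21.27 maps to fence1 at 10.42.21.28 and node 10.42.21.29 maps to fence2 at 10.42.21.30, which agrees with the interface list just given.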
>
>>
>> When you pull out the network cable *and* plug it back in on, say, node 3,
>> what messages (if any) appear in /var/log/messages on Node 4?
>> (Sorry for the repetition, but the messages are necessary to make any
>> sense of the situation.)
>>
>
> OK, here is the log on node 4 after I disconnect the network cable on node3.
>
> -----------
>
> Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] The token was lost in the
> OPERATIONAL state.
> Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Receive multicast socket
> recv buffer size (288000 bytes).
> Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Transmit multicast socket
> send buffer size (262142 bytes).
> Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] entering GATHER state from 2.
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering GATHER state from 0.
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Creating commit token
> because I am the rep.
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Saving state aru 76 high
> seq received 76
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Storing new sequence id
> for ring ac
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering COMMIT state.
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering RECOVERY state.
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.29:
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] previous ring seq 168 rep
> 10.42.21.27
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] aru 76 high delivered 76
> received flag 1
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Did not need to originate
> any messages in recovery.
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Sending initial ORF token
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration:
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29)
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left:
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27)
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined:
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE
> Jan 5 12:05:28 ha2lx kernel: dlm: closing connection to node 2
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration:
> Jan 5 12:05:28 ha2lx fenced[5004]: 10.42.21.27 not a cluster member
> after 0 sec post_fail_delay
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29)
> Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
> jid=1: Trying to acquire journal lock...
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left:
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined:
> Jan 5 12:05:28 ha2lx openais[4988]: [SYNC ] This node is within the
> primary component and will provide service.
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state.
> Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] got nodejoin message 10.42.21.29
> Jan 5 12:05:28 ha2lx openais[4988]: [CPG ] got joinlist message from node 1
> Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
> jid=1: Looking at journal...
> Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
> jid=1: Acquiring the transaction lock...
> Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
> jid=1: Replaying journal...
> Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
> jid=1: Replayed 0 of 0 blocks
> Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
> jid=1: Found 0 revoke tags
> Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0:
> jid=1: Journal replayed in 1s
> Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Done
> ------------------
>
> Now when I plug the cable back into node3, node 4 reboots. Here is the
> log I quickly grabbed on node4:
>
>
> --
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering GATHER state from 11.
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Saving state aru 1d high
> seq received 1d
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Storing new sequence id
> for ring b0
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering COMMIT state.
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering RECOVERY state.
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.27:
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep
> 10.42.21.27
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 16 high delivered 16
> received flag 1
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [1] member 10.42.21.29:
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep
> 10.42.21.29
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 1d high delivered 1d
> received flag 1
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Did not need to originate
> any messages in recovery.
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration:
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29)
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left:
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined:
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration:
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27)
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29)
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left:
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined:
> Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27)
> Jan 5 12:07:12 ha2lx openais[4988]: [SYNC ] This node is within the
> primary component and will provide service.
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state.
> Jan 5 12:07:12 ha2lx openais[4988]: [MAIN ] Killing node 10.42.21.27
> because it has rejoined the cluster with existing state
> Jan 5 12:07:12 ha2lx openais[4988]: [CMAN ] cman killed by node 2
> because we rejoined the cluster without a full restart
> Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd_dispatch error -1 errno 11
> Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd connection died
> Jan 5 12:07:12 ha2lx gfs_controld[5016]: cluster is down, exiting
> Jan 5 12:07:12 ha2lx dlm_controld[5010]: cluster is down, exiting
> Jan 5 12:07:12 ha2lx kernel: dlm: closing connection to node 1
> Jan 5 12:07:12 ha2lx fenced[5004]: cluster is down, exiting
> -------
>
>
> Also here is the log of node3:
>
> --
> [root@ha1lx ~]# tail -f /var/log/messages
> Jan 5 12:07:24 ha1lx openais[26029]: [TOTEM] entering OPERATIONAL state.
> Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27
> Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27
> Jan 5 12:07:24 ha1lx openais[26029]: [CPG ] got joinlist message from node 2
> Jan 5 12:07:27 ha1lx ccsd[26019]: Attempt to close an unopened CCS
> descriptor (4520670).
> Jan 5 12:07:27 ha1lx ccsd[26019]: Error while processing disconnect:
> Invalid request descriptor
> Jan 5 12:07:27 ha1lx fenced[26045]: fence "10.42.21.29" success
> Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1:
> jid=0: Trying to acquire journal lock...
> Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1:
> jid=0: Looking at journal...
> Jan 5 12:07:28 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Done
> ----------------
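
The handful of lines that matter in logs like these can be picked out with a small filter. This is a rough Python sketch, assuming the syslog lines have been unwrapped to one message per line (they are wrapped in the mail above). The labels and the key_events function are made up for illustration and are not part of openais, cman or fenced.

```python
import re

# Messages worth noticing in the logs above; the labels are illustrative.
PATTERNS = [
    ("token_lost", re.compile(r"\[TOTEM\] The token was lost")),
    ("kill_rejoined", re.compile(r"Killing node (\S+) because it has rejoined")),
    ("cman_killed", re.compile(r"cman killed by node (\d+)")),
    ("fence_result", re.compile(r'fenced\[\d+\]: fence "([^"]+)" (\w+)')),
]


def key_events(lines):
    """Return (label, captured groups) for every line that matches a pattern."""
    events = []
    for line in lines:
        for label, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                events.append((label, match.groups()))
    return events


# Four lines taken from the logs above, unwrapped.
SAMPLE = [
    "Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] The token was lost in the "
    "OPERATIONAL state.",
    "Jan 5 12:07:12 ha2lx openais[4988]: [MAIN ] Killing node 10.42.21.27 "
    "because it has rejoined the cluster with existing state",
    "Jan 5 12:07:12 ha2lx openais[4988]: [CMAN ] cman killed by node 2 "
    "because we rejoined the cluster without a full restart",
    'Jan 5 12:07:27 ha1lx fenced[26045]: fence "10.42.21.29" success',
]

for label, groups in key_events(SAMPLE):
    print(label, groups)
```

Run over the sample, the events come out in the same order as in the logs: the token loss on node 4 at 12:05:24, the kill/rejoin pair at 12:07:12, and fenced on node3 reporting success against 10.42.21.29 at 12:07:27.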
>
>> HTH
>>
>> With warm regards
>>
>> Rajagopal
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
> Thanks a lot
>
> Paras.
>
In an attempt to solve the fencing issue in my two-node cluster, I
ran fence_ipmilan by hand to check whether fencing works. It fails,
and I need to know what the problem is:
-
[root@ha1lx ~]# fence_ipmilan -a 10.42.21.28 -o off -l admin -p admin
Powering off machine @ IPMI:10.42.21.28...ipmilan: Failed to connect after 30 seconds
Failed
[root@ha1lx ~]#
---------------
Here 10.42.21.28 is the IP address assigned to the IPMI interface of
node3, and I am running this command on that same host (ha1lx).
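
A less drastic check than "-o off" is to only read the power state. The sketch below is a minimal Python wrapper, assuming the ipmitool utility is installed and that the BMC accepts the plain "lan" interface with the same admin/admin credentials. The function names status_command and power_status are hypothetical helpers, not part of any fence agent.

```python
import shutil
import subprocess


def status_command(host, user, password):
    """Build an ipmitool command line that only reads the chassis power state."""
    return ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-P", password,
            "chassis", "power", "status"]


def power_status(host, user, password, timeout=40):
    """Return ipmitool's output, or None if it is missing, times out or fails."""
    if shutil.which("ipmitool") is None:
        return None  # ipmitool is not installed on this host
    try:
        done = subprocess.run(status_command(host, user, password),
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None  # no answer from the BMC within the timeout
    return done.stdout.strip() if done.returncode == 0 else None
```

Calling power_status("10.42.21.28", "admin", "admin") on both nodes and comparing the results would show whether the connection failure is specific to querying a node's own IPMI interface or affects the other node as well.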
Thanks
Paras.