[Linux-cluster] Two-node cluster GFS2 confusing

Digimer lists at alteeve.ca
Thu Jun 19 02:01:35 UTC 2014


I don't use VMware myself, but I think fence_vmware will work for you. 
Please note that simply enabling stonith is not enough. As you realize, 
you need a configured and working fence method.

If you try it from the command line, you can play with the command's 
switches, asking for 'status'. When that returns properly, you will then 
just need to convert the switches into arguments for pacemaker.
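
As a rough sketch of that conversion (the parameter names below are my 
guesses from the usual fence agent conventions -- ipaddr, login, passwd, 
port -- so confirm them against 'man fence_vmware' before trusting them):

crm configure primitive fence_server2 stonith:fence_vmware \
        params ipaddr="vcenter.example.com" login="fence_user" \
        passwd="secret" port="server2" \
        op monitor interval="60s"

The -a/-l/-p/-n switches from the command line map to the ipaddr, login, 
passwd and port parameters, respectively.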

Read the man page for 'fence_vmware', and then try calling:

fence_vmware ... -o status

Fill in the switches and values you need based on the instructions in 
'man fence_vmware'.
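
For example, with made-up values (substitute your own vCenter/ESX address, 
credentials and the victim VM's name):

fence_vmware -a vcenter.example.com -l fence_user -p secret -n server2 -o status

If that prints the VM's power status, the agent can talk to VMware, and 
wiring it into pacemaker is the only step left.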

digimer

On 18/06/14 09:51 PM, Le Trung Kien wrote:
> Hi,
>
> As Digimer suggested, I changed the property
>
> stonith-enabled=true
>
> But now I don't know which fencing method I should use, because my two Red Hat nodes are running on VMware Workstation, with OpenFiler providing the shared SCSI LUN storage.
>
> I attempted to use "fence_scsi", but no luck; I got this error:
>
> Jun 19 08:35:58 server1 stonith_admin[3837]:   notice: crm_log_args: Invoked: stonith_admin --reboot server2 --tolerance 5s
> Jun 19 08:36:08 server1 root: fence_pcmk[3836]: Call to fence server2 (reset) failed with rc=255
>
> Here is my fencing configuration:
>
> <?xml version="1.0"?>
> <cluster config_version="1" name="mycluster">
>          <cman expected_votes="1" cluster_id="1"/>
>          <fence_daemon post_fail_delay="0" post_join_delay="30"/>
>          <clusternodes>
>                  <clusternode name="server1" votes="1" nodeid="1">
>                          <fence>
>                                  <method name="scsi">
>                                          <device name="scsi_dev" key="1"/>
>                                  </method>
>                          </fence>
>                  </clusternode>
>                  <clusternode name="server2" votes="1" nodeid="2">
>                          <fence>
>                                  <method name="scsi">
>                                          <device name="scsi_dev" key="2"/>
>                                  </method>
>                          </fence>
>                  </clusternode>
>          </clusternodes>
>          <fencedevices>
>                  <fencedevice agent="fence_scsi" name="scsi_dev" aptpl="1" logfile="/tmp/fence_scsi.log"/>
>          </fencedevices>
> </cluster>
>
> And the log /tmp/fence_scsi.log shows:
>
> Jun 18 19:49:40 fence_scsi: [error] no devices found
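>
> Side note: that "no devices found" makes sense if fence_scsi cannot find 
> the LUN on its own. By default it discovers devices from clustered LVM 
> volumes, so with a plain GFS2 LUN you likely need to name the device 
> explicitly via the 'devices' attribute on the fencedevice line (attribute 
> name per 'man fence_scsi'; the path is just an example):
>
> <fencedevice agent="fence_scsi" name="scsi_dev" devices="/dev/sdc" aptpl="1" logfile="/tmp/fence_scsi.log"/>
>
> It is also worth confirming that the OpenFiler LUN supports SCSI-3 
> persistent reservations at all, e.g. with sg_persist from sg3_utils:
>
> sg_persist -n -i -k /dev/sdc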
>
> I will try "fence_vmware_soap" to see if it works.
>
> Kien Le
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
> Sent: Wednesday, June 18, 2014 11:18 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] Two-node cluster GFS2 confusing
>
> On 16/06/14 07:43 AM, Le Trung Kien wrote:
>> Hello everyone,
>>
>> I'm new to Linux clustering. I have built a two-node cluster (without qdisk), consisting of:
>>
>> Redhat 6.4
>> cman
>> pacemaker
>> gfs2
>>
>> My cluster can fail over (back and forth) between the two nodes for
>> these 3 resources: ClusterIP, WebFS (a GFS2 Filesystem mounting /dev/sdc
>> on /mnt/gfs2_storage) and WebSite (the apache service).
>>
>> My problem occurs when I stop/start the nodes in the following order
>> (starting with both nodes up):
>>
>> 1. Stop node1 (shutdown) -> all resources fail over to node2 -> all
>>    resources keep working on node2
>> 2. Stop node2 (stop services: pacemaker, then cman) -> all resources
>>    stop (of course)
>> 3. Start node1 (start services: cman, then pacemaker) -> only ClusterIP
>>    started; WebFS failed, WebSite not started
>>
>> Status:
>>
>> Last updated: Mon Jun 16 18:34:56 2014
>> Last change: Mon Jun 16 14:24:54 2014 via cibadmin on server1
>> Stack: cman
>> Current DC: server1 - partition WITHOUT quorum
>> Version: 1.1.8-7.el6-394e906
>> 2 Nodes configured, 1 expected votes
>> 4 Resources configured.
>>
>> Online: [ server1 ]
>> OFFLINE: [ server2 ]
>>
>>    ClusterIP      (ocf::heartbeat:IPaddr2):       Started server1
>>    WebFS  (ocf::heartbeat:Filesystem):    Started server1 (unmanaged) FAILED
>>
>> Failed actions:
>>       WebFS_stop_0 (node=server1, call=32, rc=1, status=Timed Out):
>> unknown error
>>
>> Here is my /etc/cluster/cluster.conf
>> <?xml version="1.0"?>
>> <cluster config_version="1" name="mycluster">
>>           <logging debug="on"/>
>>           <clusternodes>
>>                   <clusternode name="server1" nodeid="1">
>>                           <fence>
>>                                   <method name="pcmk-redirect">
>>                                           <device name="pcmk" port="server1"/>
>>                                   </method>
>>                           </fence>
>>                   </clusternode>
>>                   <clusternode name="server2" nodeid="2">
>>                           <fence>
>>                                   <method name="pcmk-redirect">
>>                                           <device name="pcmk" port="server2"/>
>>                                   </method>
>>                           </fence>
>>                   </clusternode>
>>           </clusternodes>
>>           <fencedevices>
>>                   <fencedevice name="pcmk" agent="fence_pcmk"/>
>>           </fencedevices>
>> </cluster>
>>
>> Here is my 'crm configure show':
>>
>
> <snip>
>
>>           stonith-enabled=false \
>
> Well, this is a problem.
>
> When cman detects a failure (well, corosync does, but cman is told), it initiates a fence request. The fence daemon informs DLM, which blocks.
> Then fenced calls the configured 'fence_pcmk', which just passes the request up to pacemaker.
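>
> Under the hood, that hand-off is just a stonith_admin call, something 
> along the lines of:
>
> stonith_admin --reboot server2 --tolerance 5s
>
> which is why pacemaker must have a working stonith device behind it.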
>
> Without stonith configured in pacemaker, pacemaker will fail to fence, of course. Thus, DLM sits blocked, so GFS2 (and clustered LVM) hang, by design.
>
> If you configure proper fencing in pacemaker (and test it to make sure it works), then pacemaker *will* succeed in fencing and return success to fence_pcmk. Then fenced is told that the fence succeeded, and DLM cleans up the lost locks and returns to normal operation.
>
> So please configure and test real stonith in pacemaker and see if your problem is resolved.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without access to education?
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



