[Linux-cluster] Cluster node hangs

sachin sachinbhugra at hotmail.com
Wed Feb 23 20:27:08 UTC 2011


Hi Dominic,

 

Below is my cluster.conf:

===================================

<?xml version="1.0"?>
<cluster alias="rhel5_cluster" config_version="21" name="rhel5_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="rhel5cln1.home.com" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="manual_fence" nodename="rhel5cln1.home.com"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="rhel5cln2.home.com" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="manual_fence" nodename="rhel5cln2.home.com"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="manual_fence"/>
        </fencedevices>
        <rm log_level="7" log_facility="local3">
                <failoverdomains/>
                <resources>
                        <script file="/usr/local/httpd2.2.16/bin/apachectl" name="Apache_Script"/>
                        <ip address="192.168.30.137" monitor_link="1"/>
                        <clusterfs device="/dev/sdc" force_unmount="0" fsid="22440" fstype="gfs2" mountpoint="/usr/local/httpd2.2.16/htdocs/" name="gfs2share" options=""/>
                </resources>
                <service autostart="1" name="Apache_Service" recovery="restart">
                        <ip ref="192.168.30.137"/>
                        <script ref="Apache_Script"/>
                </service>
                <service autostart="1" name="gfs2share" recovery="relocate">
                        <clusterfs ref="gfs2share"/>
                </service>
        </rm>
        <logging to_syslog="yes" to_logfile="yes" syslog_facility="local3">
                <logging_daemon name="corosync" logfile="/var/log/cluster.log"/>
        </logging>
</cluster>

=================================

 

One thing I noticed is that when I move the service to the other node, it
generates the following log entries:

 

Feb 20 21:50:48 rhel5cln1 clurgmgrd[13764]: <notice> Stopping service service:gfs2share
Feb 20 21:50:48 rhel5cln1 clurgmgrd: [13764]: <debug> Not umounting /dev/sdc (clustered file system)
Feb 20 21:50:48 rhel5cln1 clurgmgrd[13764]: <notice> Service service:gfs2share is stopped

 

The cluster is configured so that only one node should mount the GFS2 file
system. When I start the cluster, only one node mounts GFS2. However, when
the service is moved, GFS2 ends up mounted on both nodes, although it is
still accessible. It hangs when the owner node goes down and the services
move to the other node automatically.
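
The "Not umounting /dev/sdc (clustered file system)" debug line appears to correspond to force_unmount="0" on the clusterfs resource in the configuration above. With that value rgmanager seems to leave a clustered file system mounted when the service stops, which would match the mount remaining on the old node after a relocate. A minimal sketch, assuming Python is available on a node, lists the clusterfs resources that would be left mounted. The function name and the trimmed SAMPLE string are illustrative and not part of the original setup.

```python
import xml.etree.ElementTree as ET


def clusterfs_left_mounted(conf_xml):
    """Return names of clusterfs resources whose force_unmount is not enabled."""
    root = ET.fromstring(conf_xml)
    names = []
    for fs in root.iter("clusterfs"):
        # Entries inside <service> only carry ref="..."; skip those.
        if "ref" in fs.attrib:
            continue
        if fs.get("force_unmount", "0") not in ("1", "yes"):
            names.append(fs.get("name"))
    return names


# Trimmed copy of the relevant parts of the cluster.conf posted above.
SAMPLE = """<cluster name="rhel5_cluster">
  <rm>
    <resources>
      <clusterfs device="/dev/sdc" force_unmount="0" fsid="22440" fstype="gfs2"
                 mountpoint="/usr/local/httpd2.2.16/htdocs/" name="gfs2share" options=""/>
    </resources>
    <service autostart="1" name="gfs2share" recovery="relocate">
      <clusterfs ref="gfs2share"/>
    </service>
  </rm>
</cluster>"""

print(clusterfs_left_mounted(SAMPLE))  # prints ['gfs2share']
```

To run it against a real configuration, pass the contents of the cluster.conf file (read as text) instead of SAMPLE.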

 

 

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of dOminic
Sent: Sunday, February 13, 2011 8:03 PM
To: linux clustering
Subject: Re: [Linux-cluster] Cluster node hangs

 

Hi,

 

What message are you getting in the logs? It would be great if you could
attach the log messages along with cluster.conf.

 

-dominic 

 

On Sun, Feb 13, 2011 at 3:49 PM, Sachin Bhugra <sachinbhugra at hotmail.com>
wrote:

Thanks for the reply and the link. However, GFS2 is not listed in fstab; it
is handled only by the cluster configuration.

  _____  

Date: Sun, 13 Feb 2011 10:52:51 +0100
From: ekuric at redhat.com
To: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] Cluster node hangs



On 02/13/2011 10:41 AM, Elvir Kuric wrote: 

On 02/13/2011 10:14 AM, Sachin Bhugra wrote: 

Hi ,

I have set up a two-node cluster in a lab with VMware Server, and hence used
manual fencing. It includes an iSCSI GFS2 partition and serves Apache in
Active/Passive mode.

The cluster works and I am able to relocate the service between nodes with no
issues. However, the problem comes when, for testing, I shut down the node
that currently holds the service. When the node becomes unavailable, the
service gets relocated and the GFS partition gets mounted on the other node,
but it is not accessible. If I try to run "ls" or "du" on the GFS partition,
the command hangs. Meanwhile, the node that was shut down gets stuck at
"unmounting file system".

I tried using fence_manual -n nodename and then fence_ack_manual -n
nodename; however, the behavior remains the same.
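
GFS2 recovery on the surviving node generally waits until the failed node has been fenced, and with fence_manual that means waiting for an operator acknowledgement, so it can help to confirm which nodes depend on that agent. The sketch below is only an illustration, assuming Python is available. The function name is hypothetical and SAMPLE is a trimmed stand-in using the node names from this thread.

```python
import xml.etree.ElementTree as ET


def nodes_using_manual_fencing(conf_xml):
    """Return cluster node names whose fence device uses the fence_manual agent."""
    root = ET.fromstring(conf_xml)
    # Map each fence device name to the agent that implements it.
    agents = {fd.get("name"): fd.get("agent") for fd in root.iter("fencedevice")}
    manual = []
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if agents.get(dev.get("name")) == "fence_manual":
                manual.append(node.get("name"))
                break
    return manual


SAMPLE = """<cluster name="rhel5_cluster">
  <clusternodes>
    <clusternode name="rhel5cln1.home.com" nodeid="1" votes="1">
      <fence><method name="1">
        <device name="manual_fence" nodename="rhel5cln1.home.com"/>
      </method></fence>
    </clusternode>
    <clusternode name="rhel5cln2.home.com" nodeid="2" votes="1">
      <fence><method name="1">
        <device name="manual_fence" nodename="rhel5cln2.home.com"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="manual_fence"/>
  </fencedevices>
</cluster>"""

print(nodes_using_manual_fencing(SAMPLE))
# prints ['rhel5cln1.home.com', 'rhel5cln2.home.com']
```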

Can someone please help me find what I am doing wrong?

Thanks, 




--


Linux-cluster mailing list


Linux-cluster at redhat.com


https://www.redhat.com/mailman/listinfo/linux-cluster

It would be good to see the /etc/fstab configuration used on the cluster
nodes. If the GFS partition is mounted manually, it will not be unmounted
correctly when you restart the node without running umount first, and the
node will hang during the shutdown/reboot process.
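
One way to check this is to scan fstab-style text for gfs or gfs2 entries. The sketch below is an assumption-laden illustration in Python rather than anything prescribed by the GFS2 documentation. The function name is hypothetical, and the root file system line in SAMPLE is made up for the example.

```python
def gfs_fstab_entries(fstab_text):
    """Return (device, mountpoint, fstype) for gfs/gfs2 lines in fstab-style text."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("gfs", "gfs2"):
            entries.append((fields[0], fields[1], fields[2]))
    return entries


# Hypothetical fstab content; only the GFS2 device and mount point come from the thread.
SAMPLE = """# /etc/fstab
/dev/sda1   /                                ext3   defaults   1 1
/dev/sdc    /usr/local/httpd2.2.16/htdocs/   gfs2   defaults   0 0
"""

print(gfs_fstab_entries(SAMPLE))
# prints [('/dev/sdc', '/usr/local/httpd2.2.16/htdocs/', 'gfs2')]
```

Because /proc/mounts uses the same first three fields, the same function can be fed its contents to see which GFS2 file systems are currently mounted on a node.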

More at:
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Global_File_System_2/index.html


Edit: in the above link, see section 3.4, "Special Considerations when
Mounting GFS2 File Systems".



Regards, 

Elvir 

 

 





 
