
Re: [Linux-cluster] newbie questions

I see you figured out your multiple ports fencing issue. Good, that saves me a rant about system-config-cluster ... ;-)

First thing to test is that you can configure the IP address manually, mount the filesystem, and start apache "the old-fashioned way", using the /etc/init.d/httpd script on either machine.
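A rough sketch of that manual check, for what it's worth. The device and mount point are taken from the cluster.conf quoted below; the service address is a made-up placeholder (the real one isn't shown in the post), and `eth0` and the `gfs` filesystem type are assumptions, so substitute your own values. Setting `DRY_RUN=1` only prints the commands instead of running them.

```shell
# Hypothetical values: replace with your own before running as root.
SERVICE_IP="192.0.2.10/24"           # placeholder address, NOT from the original post
IFACE="eth0"                         # assumed interface
DEVICE="/dev/mapper/diskarray-lv1"   # from the quoted cluster.conf
MOUNTPOINT="/mnt/gfs/htdocs"         # from the quoted cluster.conf

# Echo each command; actually run it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# Bring the service up by hand, in the order apache needs:
# address first, then the content filesystem, then httpd itself.
manual_start() {
    run ip addr add "$SERVICE_IP" dev "$IFACE" &&
    run mount -t gfs "$DEVICE" "$MOUNTPOINT" &&
    run /etc/init.d/httpd start
}
```

Run `DRY_RUN=1 manual_start` first to see what it would do, then `manual_start` as root on one node at a time. Remember to undo it (stop httpd, unmount, `ip addr del`) before handing the service back to the cluster.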

If that works, then I'd guess your problem with the cluster service is that the <ip> resource needs to be listed before the <script> resource inside the <service> block, since apache will bomb if the IP address you told it to bind to isn't present (and I assume apache is configured to bind to that address). If that's the case, then you should see an error about it in the apache error log.

As far as nothing being logged about the cluster service trying to start, it SHOULD be logging in /var/log/messages, but I've seen some weirdness with this in the past. A healthy cluster node should show something like this when the service starts:

Jun 22 09:36:51 knob clurgmgrd[3652]: <notice> Starting stopped service maps_ip
Jun 22 09:36:51 knob clurgmgrd: [3652]: <info> Adding IPv4 address x.y.8.60 to eth0
Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Service maps_ip started
Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Starting stopped service httpd
Jun 22 09:36:52 knob clurgmgrd: [3652]: <info> Executing /etc/init.d/httpd start
Jun 22 09:36:54 knob httpd: httpd startup succeeded
Jun 22 09:36:54 knob clurgmgrd[3652]: <notice> Service httpd started

(I always find the concept of "starting" an IP address faintly hilarious), and then you should see something like:

Jun 22 09:37:33 knob clurgmgrd: [3652]: <info> Executing /etc/init.d/httpd status

every 30 seconds or so.
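If you want to pull just those lines out of the log, something along these lines should do. It's only a sketch: the function names are mine, and it assumes the daemon tags its lines with `clurgmgrd` as in the samples above.

```shell
# Print only the resource group manager's lines from a syslog-style file.
# Defaults to /var/log/messages when no file is given.
rgmanager_log() {
    grep 'clurgmgrd' "${1:-/var/log/messages}"
}

# Count how many periodic httpd status checks have been logged.
status_checks() {
    rgmanager_log "$1" | grep -c 'Executing /etc/init.d/httpd status'
}
```

If `rgmanager_log` prints nothing at all on either node, the problem is upstream of apache: the cluster manager isn't even attempting the service.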

That brings me to an important point - the apache init script doesn't follow whatever standard RedHat init scripts are supposed to follow (there's a thread about this that I was involved in 6-9 months back), with respect to the status command. At least, it didn't at the time; maybe they've fixed it (I hope, by now). The stop action return(s/ed) non-zero (failure) if apache wasn't running. If the cluster manager thinks the service has failed, it will first try to stop it before starting it. If the apache script returns failure on that stop attempt because apache was already stopped, the cluster manager will think something's wrong and never try to start it. The upshot is that you have to hack the init script to make it return 0 in this situation. I took the cop-out approach of just forcing it to always return 0:

 stop() {
         echo -n $"Stopping $prog: "
         killproc $httpd
-        RETVAL=$?
+        RETVAL=0 # makes cluster admin less crazy
         [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}

which should be safe enough (if killproc fails to kill it you've probably got bigger problems on your hands), but could be better. Someone else may have pasted a better patch on this list, check the archives.
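For the "could be better" part, here's one possible shape, strictly as a sketch and not a tested patch against the shipped script. It assumes the variables the stock script already sets ($prog, $httpd, $lockfile, $pidfile) and killproc from /etc/init.d/functions, and it uses "no pidfile" as a stand-in for "not running", which is a heuristic of mine, not something the script guarantees.

```shell
# Sketch: report success when httpd is already stopped, but still
# report a real failure if killproc can't stop a running httpd.
# Relies on $prog, $httpd, $lockfile, $pidfile and killproc being
# defined by the surrounding init script (assumed, not shown here).
stop() {
    printf 'Stopping %s: ' "$prog"
    if [ ! -f "${pidfile}" ]; then
        # No pidfile: treat as "already stopped" and call that success.
        RETVAL=0
    else
        killproc "$httpd"
        RETVAL=$?
    fi
    echo
    [ $RETVAL = 0 ] && rm -f "${lockfile}" "${pidfile}"
    return $RETVAL
}
```

The difference from my hack is that a failed kill of a live apache still comes back non-zero, so the cluster manager gets to hear about the case that actually matters.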

I just checked a fresh install of httpd on an AS 4 latest box, and the script is still the same. Convenient, since httpd is the specific example service used for setting up a cluster service in the Cluster Suite docs. ;-)

I hope this helps - I'll stop rambling now.

Oh, one other thing - if the filesystem is GFS, why bother mounting/unmounting at all? Just have it mounted in fstab, or make it a separate cluster service if you want the extra assurance that it'll stay mounted.


Jason wrote:
OK, one last question, I hope... I'm following the directions at
to set up apache as a test... and I cannot see that apache gets started on either of my cluster nodes (only 2). The IP address I've configured it with is an unused IP address in the subnet that both boxes are on. How/where can I troubleshoot this? I don't see anything in the logs about the service trying to start. Here is my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="22" name="progressive">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="tf1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apc_power_switch" option="off" port="1" switch="1"/>
                                        <device name="apc_power_switch" option="off" port="2" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="1" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="2" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="tf2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apc_power_switch" option="off" port="3" switch="1"/>
                                        <device name="apc_power_switch" option="off" port="4" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="3" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="4" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="" login="apc" name="apc_power_switch" passwd="apc"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="httpd" ordered="1" restricted="1">
                                <failoverdomainnode name="tf1" priority="1"/>
                                <failoverdomainnode name="tf2" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <script file="/etc/init.d/httpd" name="cluster_apache"/>
                        <fs device="/dev/mapper/diskarray-lv1" fstype="ext3" mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
                        <ip address="" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="httpd" name="Apache Service">
                        <script ref="cluster_apache"/>
                        <fs ref="apache_content"/>
                        <ip ref=""/>
                </service>
        </rm>
</cluster>

ooh the other thing is that I had to lie about the filesystem in which it lives, it only gave me the ext2/ext3 options, (i chose ext3) but its on a gfs partition.


Linux-cluster mailing list
Linux-cluster redhat com
