[Linux-cluster] newbie questions
Jason
jason at monsterjam.org
Sat Jul 1 12:42:24 UTC 2006
> First thing to test is that you can configure the IP address manually,
> mount the filesystem, and start apache "the old-fashioned way", using
> the /etc/init.d/httpd script on either machine.
[root at tf1 log]# /etc/init.d/httpd start
Starting httpd: (99)Cannot assign requested address: make_sock: could not bind to address 192.168.1.7:80
no listening sockets available, shutting down
>
> If that works, then I'd guess your problem with the cluster service is
> that the <ip > resource needs to be listed before the <script >
> resource, inside the <service/> block, since apache will bomb if the IP
> address you told it to bind to isn't present (and I assume apache is
> configured to bind to that address). If that's the case, then you
> should see an error concerning it in the apache error.log.
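that matches the bind failure above, so presumably I should reorder my <service> block so the <ip> resource comes first. Something like this (untested guess, based on my cluster.conf quoted further down):

```xml
<service autostart="1" domain="httpd" name="Apache Service">
        <ip ref="192.168.1.7"/>
        <fs ref="apache_content"/>
        <script ref="cluster_apache"/>
</service>
```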
>
> As far as nothing being logged about the cluster service trying to
> start, it SHOULD be logging in /var/log/messages, but I've seen some
> weirdness with this in the past. A healthy cluster node should show
> something like this when the service starts:
>
> Jun 22 09:36:51 knob clurgmgrd[3652]: <notice> Starting stopped service
> maps_ip
> Jun 22 09:36:51 knob clurgmgrd: [3652]: <info> Adding IPv4 address
> x.y.8.60 to eth0
> Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Service maps_ip started
> Jun 22 09:36:52 knob clurgmgrd[3652]: <notice> Starting stopped service
> httpd
> Jun 22 09:36:52 knob clurgmgrd: [3652]: <info> Executing
> /etc/init.d/httpd start
> Jun 22 09:36:54 knob httpd: httpd startup succeeded
> Jun 22 09:36:54 knob clurgmgrd[3652]: <notice> Service httpd started
well, I see messages, but never any from clurgmgrd:
Jul 1 08:27:10 tf1 network: Setting network parameters: succeeded
Jul 1 08:27:10 tf1 network: Bringing up loopback interface: succeeded
Jul 1 08:27:14 tf1 network: Bringing up interface eth0: succeeded
Jul 1 08:27:19 tf1 network: Bringing up interface eth2: succeeded
Jul 1 08:27:19 tf1 procfgd: Starting procfgd: succeeded
Jul 1 08:27:24 tf1 kernel: CMAN: Waiting to join or form a Linux-cluster
Jul 1 08:27:24 tf1 ccsd[3928]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.5
Jul 1 08:27:24 tf1 ccsd[3928]: Initial status:: Inquorate
Jul 1 08:27:56 tf1 kernel: CMAN: forming a new cluster
Jul 1 08:27:56 tf1 kernel: CMAN: quorum regained, resuming activity
Jul 1 08:27:56 tf1 ccsd[3928]: Cluster is quorate. Allowing connections.
Jul 1 08:27:56 tf1 kernel: DLM 2.6.9-41.7 (built May 22 2006 17:34:37) installed
Jul 1 08:27:56 tf1 cman: startup succeeded
Jul 1 08:27:56 tf1 lock_gulmd: no <gulm> section detected in /etc/cluster/cluster.conf succeeded
Jul 1 08:27:57 tf1 fenced: startup succeeded
Jul 1 08:27:59 tf1 clvmd: Cluster LVM daemon started - connected to CMAN
Jul 1 08:27:59 tf1 clvmd: clvmd startup succeeded
Jul 1 08:27:59 tf1 kernel: cdrom: open failed.
Jul 1 08:28:00 tf1 kernel: cdrom: open failed.
Jul 1 08:28:00 tf1 vgchange: 1 logical volume(s) in volume group "diskarray" now active
Jul 1 08:28:00 tf1 clvmd: Activating VGs: succeeded
Jul 1 08:28:00 tf1 netfs: Mounting other filesystems: succeeded
Jul 1 08:28:00 tf1 kernel: Lock_Harness 2.6.9-49.1 (built May 22 2006 17:38:48) installed
Jul 1 08:28:00 tf1 kernel: GFS 2.6.9-49.1 (built May 22 2006 17:39:06) installed
Jul 1 08:28:00 tf1 kernel: GFS: Trying to join cluster "lock_dlm", "progressive:lv1"
Jul 1 08:28:00 tf1 kernel: Lock_DLM (built May 22 2006 17:38:50) installed
Jul 1 08:28:02 tf1 kernel: GFS: fsid=progressive:lv1.0: Joined cluster. Now mounting FS...
Jul 1 08:28:02 tf1 kernel: GFS: fsid=progressive:lv1.0: jid=0: Trying to acquire journal lock...
Jul 1 08:28:02 tf1 kernel: GFS: fsid=progressive:lv1.0: jid=0: Looking at journal...
Jul 1 08:28:03 tf1 kernel: GFS: fsid=progressive:lv1.0: jid=0: Done
I compiled/installed all this from source... I'm guessing I missed the
clurgmgrd part. I'll go back and look.
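in case anyone else hits this, here's the quick sanity check I'll use first (hypothetical commands; clurgmgrd is built as part of rgmanager, so if rgmanager was never compiled and started, no service-related messages will ever appear):

```shell
# Check whether the resource group manager daemon is present and running.
# clurgmgrd ships in the rgmanager part of the cluster source tree; without
# it, nothing will ever try to start the <service> blocks in cluster.conf.
if pgrep -x clurgmgrd >/dev/null 2>&1; then
    echo "clurgmgrd is running"
else
    echo "clurgmgrd not running; build/start rgmanager"
fi
```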
> (I always find the concept of "starting" an IP address faintly
> hilarious), and then you should see something like:
>
> Jun 22 09:37:33 knob clurgmgrd: [3652]: <info> Executing
> /etc/init.d/httpd status
>
> every 30 seconds or so.
yeah, I never see this.
>
> That brings me to an important point - the apache init script doesn't
> follow whatever standard RedHat init scripts are supposed to follow
> (there's a thread about this that I was involved in 6-9 months back),
> with respect to the status command. At least, it didn't at the time,
> maybe they've fixed it (I hope, by now). The stop action return(s/ed)
> non-zero (failure) if apache wasn't running. If the cluster manager
> thinks the service has failed, it will first try to stop it before
> starting it. If the apache script returns failure on the attempt to
> stop it because it was stopped already, then the cluster manager will
> think something's wrong and never try to start it. The upshot of which
> is, you have to hack the init script to make it return 0 in this
> situation. I took the cop-out approach of just forcing it to always
> return 0:
>
> stop() {
> echo -n $"Stopping $prog: "
> killproc $httpd
> - RETVAL=$?
> + RETVAL=0 # makes cluster admin less crazy
> echo
> [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
> }
>
> which should be safe enough (if killproc fails to kill it you've
> probably got bigger problems on your hands), but could be better.
> Someone else may have pasted a better patch on this list, check the
> archives.
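for anyone wary of unconditionally returning 0, a middle-ground version I'm considering: only mask the failure when the pidfile is already gone, i.e. apache genuinely wasn't running. Untested sketch; killproc, $prog, $httpd, $pidfile and $lockfile are the names the stock Red Hat httpd init script already uses (killproc is sourced from /etc/init.d/functions):

```shell
stop() {
        echo -n $"Stopping $prog: "
        killproc $httpd
        RETVAL=$?
        # killproc returns non-zero when it found nothing to kill; if the
        # pidfile is gone too, treat that as "already stopped" and report
        # success so clurgmgrd will go on to start the service, while real
        # failures (pidfile still present) stay visible.
        if [ $RETVAL -ne 0 ] && [ ! -f "$pidfile" ]; then
                RETVAL=0
        fi
        echo
        [ $RETVAL -eq 0 ] && rm -f "$lockfile" "$pidfile"
        return $RETVAL
}
```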
>
> I just checked a fresh install of httpd on an AS 4 latest box, and the
> script is still the same. Convenient, since httpd is the specific
> example service used for setting up a cluster service in the Cluster
> Suite docs. ;-)
>
> I hope this helps - I'll stop rambling now.
>
> Oh, one other thing - if the filesystem is GFS, why bother
> mounting/unmounting at all? Just have it mounted in fstab, or make it a
> separate cluster service if you want the extra assurance that it'll stay
> mounted.
ooh, I do have it in the fstab... that's just me not fully understanding how all this is supposed
to work.
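for the record, the fstab entry is along these lines (device and mountpoint from my setup, so treat it as illustrative):

```
/dev/mapper/diskarray-lv1  /mnt/gfs/htdocs  gfs  defaults  0 0
```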
Jason
> >
> >
> ><?xml version="1.0"?>
> ><cluster config_version="22" name="progressive">
> >  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> >  <clusternodes>
> >    <clusternode name="tf1" votes="1">
> >      <fence>
> >        <method name="1">
> >          <device name="apc_power_switch" option="off" port="1" switch="1"/>
> >          <device name="apc_power_switch" option="off" port="2" switch="1"/>
> >          <device name="apc_power_switch" option="on" port="1" switch="1"/>
> >          <device name="apc_power_switch" option="on" port="2" switch="1"/>
> >        </method>
> >      </fence>
> >    </clusternode>
> >    <clusternode name="tf2" votes="1">
> >      <fence>
> >        <method name="1">
> >          <device name="apc_power_switch" option="off" port="3" switch="1"/>
> >          <device name="apc_power_switch" option="off" port="4" switch="1"/>
> >          <device name="apc_power_switch" option="on" port="3" switch="1"/>
> >          <device name="apc_power_switch" option="on" port="4" switch="1"/>
> >        </method>
> >      </fence>
> >    </clusternode>
> >  </clusternodes>
> >  <cman expected_votes="1" two_node="1"/>
> >  <fencedevices>
> >    <fencedevice agent="fence_apc" ipaddr="192.168.1.8" login="apc" name="apc_power_switch" passwd="apc"/>
> >  </fencedevices>
> >  <rm>
> >    <failoverdomains>
> >      <failoverdomain name="httpd" ordered="1" restricted="1">
> >        <failoverdomainnode name="tf1" priority="1"/>
> >        <failoverdomainnode name="tf2" priority="2"/>
> >      </failoverdomain>
> >    </failoverdomains>
> >    <resources>
> >      <script file="/etc/init.d/httpd" name="cluster_apache"/>
> >      <fs device="/dev/mapper/diskarray-lv1" fstype="ext3" mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
> >      <ip address="192.168.1.7" monitor_link="1"/>
> >    </resources>
> >    <service autostart="1" domain="httpd" name="Apache Service">
> >      <script ref="cluster_apache"/>
> >      <fs ref="apache_content"/>
> >      <ip ref="192.168.1.7"/>
> >    </service>
> >  </rm>
> ></cluster>
> >
> >
> >ooh, the other thing is that I had to lie about the filesystem it
> >lives on; the tool only gave me the ext2/ext3 options (I chose ext3),
> >but it's on a GFS partition.
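(as an aside: some rgmanager versions ship a dedicated clusterfs resource agent for GFS; if yours has it, the resource could be declared roughly like this instead of lying about the fstype. Hypothetical and untested:)

```xml
<clusterfs device="/dev/mapper/diskarray-lv1" fstype="gfs"
           mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
```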
> >
> >Jason
> >
> >--
> >Linux-cluster mailing list
> >Linux-cluster at redhat.com
> >https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
--
================================================
| Jason Welsh jason at monsterjam.org |
| http://monsterjam.org DSS PGP: 0x5E30CC98 |
| gpg key: http://monsterjam.org/gpg/ |
================================================