[Linux-cluster] virtual address went down? (with panic link)

Fri Oct 20 00:40:30 UTC 2006

I think the mailing list doesn't like attachments, so here's a link to the panic that was supposed
to go along with this post:
http://monsterjam.org/crash/panic.jpg

I tried stopping the services on the first box of my 2-node cluster:
service rgmanager stop
service gfs stop
service clvmd stop
service fenced stop
service cman stop
service ccsd stop

Everything came down fine.
Then I started them back up:
service ccsd start
This seemed to hang for about 2 minutes, then I got a panic,
as shown in the graphic linked above.
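
For reference, the stop sequence above can be wrapped in a small script so the order is always the same. This is only a sketch: the stop order is the one listed in this post, the start order is assumed to be its exact reverse, and SERVICE_CMD, run_in_order, stop_cluster and start_cluster are made-up names for illustration (SERVICE_CMD can be set to "echo" for a dry run).

```shell
#!/bin/sh
# Sketch only. Stop order is copied from the post; the start order is
# ASSUMED to be the exact reverse. SERVICE_CMD is a hypothetical override
# so the sequence can be dry-run with SERVICE_CMD=echo.
SERVICE_CMD="${SERVICE_CMD:-service}"

STOP_ORDER="rgmanager gfs clvmd fenced cman ccsd"

# Print the given words in reverse order on one line.
reverse_order() {
    rev=""
    for svc in "$@"; do
        rev="$svc $rev"
    done
    echo $rev
}

# Run "$SERVICE_CMD <name> <action>" for each name, in the order given,
# and stop at the first failure instead of carrying on.
run_in_order() {
    action="$1"; shift
    for svc in "$@"; do
        echo "== $svc $action"
        $SERVICE_CMD "$svc" "$action" || {
            echo "$svc $action failed, not continuing" >&2
            return 1
        }
    done
}

stop_cluster()  { run_in_order stop $STOP_ORDER; }
start_cluster() { run_in_order start $(reverse_order $STOP_ORDER); }
```

Sourcing the file and running "SERVICE_CMD=echo; start_cluster" prints the commands without touching any daemon. Stopping at the first failure is deliberate: a later service started on top of one that did not come up cleanly just makes the logs harder to read.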

This is on Red Hat Enterprise Linux AS release 4 (Nahant Update 4), kernel 2.6.9-34.ELsmp,
running:
ccs-1.0.3-0
cman-kernel-hugemem-2.6.9-43.8
cman-kernel-2.6.9-43.8
cman-1.0.4-0
cman-kernel-smp-2.6.9-43.8
cman-kernheaders-2.6.9-43.8

built from source.

Here's my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="22" name="progressive">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="tf1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apc_power_switch" option="off" port="1" switch="1"/>
                                        <device name="apc_power_switch" option="off" port="2" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="1" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="2" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="tf2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apc_power_switch" option="off" port="3" switch="1"/>
                                        <device name="apc_power_switch" option="off" port="4" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="3" switch="1"/>
                                        <device name="apc_power_switch" option="on" port="4" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.1.8" login="xxx" name="apc_power_switch" passwd="xxx"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="httpd" ordered="1" restricted="1">
                                <failoverdomainnode name="tf1" priority="1"/>
                                <failoverdomainnode name="tf2" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <script file="/etc/init.d/httpd" name="cluster_apache"/>
                        <fs device="/dev/mapper/diskarray-lv1" fstype="ext3" mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
                        <ip address="192.168.1.7" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="httpd" name="Apache Service">
                        <ip ref="192.168.1.7"/>
                        <script ref="cluster_apache"/>
                        <fs ref="apache_content"/>
                </service>
        </rm>
</cluster>
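
Since cluster.conf gets hand-edited and pasted around, one cheap sanity check before restarting anything is to look for attribute values that begin or end with whitespace, or that are not closed on the same line, which a wrapped paste can introduce. A minimal sketch, assuming the file lives at /etc/cluster/cluster.conf; check_padded_attrs is a made-up helper name, and this is a line-based grep, not a real XML validation:

```shell
#!/bin/sh
# Sketch: flag cluster.conf lines where an attribute value starts or ends
# with whitespace, or where a quoted value is left open at end of line.
# The default path is an assumption; pass another file as the first argument.
check_padded_attrs() {
    conf="${1:-/etc/cluster/cluster.conf}"
    # Alternatives, in order: leading whitespace inside the quotes,
    # trailing whitespace inside the quotes, value not closed on this line.
    if grep -nE '="[[:space:]]+[^"]*"|="[^"]*[[:space:]]+"|="[^"]*$' "$conf"; then
        echo "suspicious attribute values found in $conf" >&2
        return 1
    fi
    return 0
}
```

It prints the offending lines with their line numbers and returns non-zero, so it can gate a restart, e.g. "check_padded_attrs && service ccsd start".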

Oh, and shortly after the first box came back up, the second one got
rebooted automagically (power fenced from the first one, I'm guessing) for
good measure.

Any help appreciated.

Jason

On Tue, Oct 17, 2006 at 09:37:15PM -0400, jason at monsterjam.org wrote:
> So I've had a test cluster running for quite a while now. Both nodes of the 2-node cluster are up,
> but the virtual address seems to have disappeared: it's not pingable, and neither server has it
> configured anymore. The only application I had using the virtual address was Apache (just for
> testing it). What logs/information should I be looking at to see what happened and why?
> 
> regards,
> Jason