[Linux-cluster] migrating to better node

Tue Feb 26 12:48:34 UTC 2008

Hi,
I am currently testing the following cluster of virtual machines:
Two nodes, Shared storage, APC fencing device.
cluster.conf follows.

I am encountering this situation:
1) VM is running on clu2
2) clu2 gets fenced and VM is migrated to clu1
3) after clu2 is started again, the VM is automatically migrated back to
clu2 according to the logs, and this fails.

Does anybody know why this fails?

Feb 26 13:25:27 clu1 fenced[2973]: fence "clu2.test-cluster.cz" success
Feb 26 13:25:32 clu1 kernel: GFS: fsid=adler:virtdata.1: jid=0: Trying
to acquire journal lock...
...
Feb 26 13:25:32 clu1 kernel: GFS: fsid=adler:virtdata.1: jid=0: Done
...
Feb 26 13:25:32 clu1 clurgmgrd[3885]: <notice> Taking over service
vm:win2003 from down member clu2.test-cluster.cz

...clu2 rejoins...
Feb 26 13:27:32 clu1 clurgmgrd[3885]: <notice> Migrating vm:win2003 to
better node clu2.test-cluster.cz
Feb 26 13:27:35 clu1 kernel: peth0: received packet with  own address as
source address
Feb 26 13:27:37 clu1 kernel: dlm: connecting to 1 
-----> Feb 26 13:27:47 clu1 clurgmgrd[3885]: <err> #75: Failed changing service
status 

cluster.conf:
<?xml version="1.0"?>
<cluster alias="adler" config_version="13" name="adler">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="clu2.test-cluster.cz" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apc" port="3"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="clu1.test-cluster.cz" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="apc" port="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.0.54" login="apc" name="apc" passwd="apc"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="clu1" ordered="0" restricted="0">
                                <failoverdomainnode name="clu1.test-cluster.cz" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="clu2" ordered="0" restricted="0">
                                <failoverdomainnode name="clu2.test-cluster.cz" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="clu" ordered="0" restricted="1">
                                <failoverdomainnode name="clu2.test-cluster.cz" priority="1"/>
                                <failoverdomainnode name="clu1.test-cluster.cz" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <fs device="/dev/sdb8" force_fsck="0" force_unmount="1" fsid="62307" fstype="ext3" mountpoint="/mnt/data" name="data" self_fence="0"/>
                        <clusterfs device="/dev/mapper/gfs1-gfsdata" force_unmount="0" fsid="59408" fstype="gfs" mountpoint="/mnt/gfs" name="gfs"/>
                </resources>
                <vm autostart="1" domain="clu" exclusive="0" name="sybase" path="/mnt/gfs/" recovery="restart"/>
                <vm autostart="1" domain="clu2" exclusive="0" name="win2003" path="/mnt/gfs" recovery="relocate"/>
        </rm>
</cluster>
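
To make the failover-domain setup in the config above easier to read, here is a minimal sketch (Python, standard library only) that lists which failover domain each VM is bound to and which nodes that domain contains. It is only an illustration: the trimmed XML copy and the helper name vm_domains are assumptions made for this example, not part of the cluster tooling. The output shows win2003 tied to the unrestricted domain "clu2", whose only member is clu2.test-cluster.cz, which appears consistent with the "Migrating vm:win2003 to better node" message once clu2 rejoins.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <rm> section from the cluster.conf above
# (only the attributes relevant to placement are kept).
RM_XML = """
<rm>
  <failoverdomains>
    <failoverdomain name="clu2" ordered="0" restricted="0">
      <failoverdomainnode name="clu2.test-cluster.cz" priority="1"/>
    </failoverdomain>
    <failoverdomain name="clu" ordered="0" restricted="1">
      <failoverdomainnode name="clu2.test-cluster.cz" priority="1"/>
      <failoverdomainnode name="clu1.test-cluster.cz" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <vm name="sybase" domain="clu" recovery="restart"/>
  <vm name="win2003" domain="clu2" recovery="relocate"/>
</rm>
"""


def vm_domains(rm_xml):
    """Hypothetical helper: map each vm name to its failover domain details."""
    root = ET.fromstring(rm_xml)

    # Collect every failover domain with its member nodes and restricted flag.
    domains = {}
    for dom in root.iter("failoverdomain"):
        domains[dom.get("name")] = {
            "nodes": [n.get("name") for n in dom.findall("failoverdomainnode")],
            "restricted": dom.get("restricted") == "1",
        }

    # Attach the domain details to each vm that references the domain.
    result = {}
    for vm in root.findall("vm"):
        dom = domains.get(vm.get("domain"), {"nodes": [], "restricted": False})
        result[vm.get("name")] = {
            "domain": vm.get("domain"),
            "nodes": dom["nodes"],
            "restricted": dom["restricted"],
            "recovery": vm.get("recovery"),
        }
    return result


if __name__ == "__main__":
    for name, info in vm_domains(RM_XML).items():
        print(name, info)
```

The same function can be pointed at the full file by passing the text of its rm element instead of the trimmed string.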

Thanks,
Jakub Suchy
-- 
Jakub Suchý <jakub.suchy at enlogit.cz>
GSM: +420 - 777 817 949

Enlogit s.r.o, U Cukrovaru 509/4, 400 07 Ústí nad Labem
tel.: +420 - 474 745 159, fax: +420 - 474 745 160
e-mail: info at enlogit.cz, web: http://www.enlogit.cz