[Linux-cluster] Cluster of XEN guests unstable when rebooting a node under CS5.1
Paolo Marini
paolom at prisma-eng.it
Thu Dec 13 18:43:23 UTC 2007
Good! It seems to be the right solution. My answers and comments are below.
Thanks, Paolo
> On Wed, 2007-12-12 at 19:23 +0100, Paolo Marini wrote:
>
>> I reiterate the request for help hoping someone has undergone (and
>> hopefully solved) the same issues.
>>
>> I am building up a cluster of XEN guests with the root file system residing
>> on a file on a GFS filesystem (on iscsi, actually).
>>
>> Each cluster node mounts a GFS file system residing on an iscsi device.
>>
>> For performance reasons, both the iscsi device and the physical nodes
>> (which are also part of a cluster) use two gigabit Ethernet interfaces
>> with bonding and LACP. For the physical machines, I had to insert a
>> sleep 30 in the /etc/init.d/iscsi script before the iscsi login, in order
>> to wait for the bond interface to come up; otherwise the iscsi devices
>> are not seen and no GFS mount is possible.
>>
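A fixed sleep 30 can be either too short or needlessly long. One alternative is to poll the bond's link state and continue as soon as it is up. The following is only a sketch, not the change actually made to /etc/init.d/iscsi: it assumes the bond is named bond0 and that the bonding driver reports its state in /proc/net/bonding/bond0. The function name wait_for_bond and the 60 second limit are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: wait until the bond reports "MII Status: up".
# $1 = bonding status file (assumed: /proc/net/bonding/bond0)
# $2 = maximum number of seconds to wait (assumed default: 60)
wait_for_bond() {
    status_file=$1
    timeout=${2:-60}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # The first "MII Status:" line describes the bond itself;
        # later ones describe the individual slave interfaces.
        state=$(awk '/^MII Status:/ { print $3; exit }' "$status_file" 2>/dev/null)
        [ "$state" = "up" ] && return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Possible use in the init script, in place of the fixed sleep:
# wait_for_bond /proc/net/bonding/bond0 60 || echo "bond0 still down"
```

This only checks the link state of the bond; it does not prove that the iscsi target is already reachable, so a short retry around the iscsi login may still be useful.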
>> As for the cluster of XEN guests, they work fine: I am able to migrate
>> each one to a different physical node without problems on the guest.
>>
>> When I reboot or fence one of the guests, the guest cluster breaks, i.e.
>> quorum is lost and I have to fence ALL the nodes and reboot
>> them in order for the cluster to restart.
>>
>
> How many guests - and what are you using for fencing ?
>
>
I am using 5 guests: 4 form the cluster and the remaining one is a
management node (nagios etc.). Fencing is done with fence_xvm, which is
correctly configured and working. Each physical node is a DELL PE860
with 4 GB of RAM, one quad-core Xeon and 3 network interfaces: two are
used for bonding and the third is reserved for IPMI (which I use for
fencing the physical nodes).
The guests configure two network interfaces (eth0 and eth0:0): one is
for private communication between the nodes and with the iscsi device,
the other for public access to the nodes. I am not using VLANs.
>> Does it have to do with the xen bridge going up and down for a time
>> longer than the heartbeat timeout ?
>>
>
> Not sure - it shouldn't be that big of a deal. If you think that's the
> problem try adding:
>
> <totem token="30000"/>
>
> to the vm cluster's cluster.conf
>
It seems much more stable, though more tests are needed to confirm this.
So far, after an xm destroy on one guest, the rest of the guest cluster
stays up, detects the missing guest and fences it successfully. The
machine then restarts and rejoins the cluster.
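For reference, the totem element is a direct child of the top-level cluster element in cluster.conf, and the token value is in milliseconds, so 30000 means 30 seconds. A quick way to check which value a node's file carries is sketched below. The function name totem_token is hypothetical, the path /etc/cluster/cluster.conf is the usual location but still an assumption about this setup, and the sed pattern assumes the totem element is written on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print the totem token timeout (milliseconds)
# found in a cluster.conf file; prints nothing if none is set.
# Assumes the <totem .../> element is written on a single line.
totem_token() {
    sed -n 's/.*<totem[^>]*[[:space:]]token="\([0-9][0-9]*\)".*/\1/p' "$1"
}

# Example (path is an assumption):
# totem_token /etc/cluster/cluster.conf
```

Running this on each guest is a simple way to confirm that all nodes carry the same token value. Remember that config_version is normally increased whenever cluster.conf is changed.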
>
> -- Lon
>
>