[Linux-cluster] virtualization on top of Red Hat cluster

Geoffrey lists at serioustechnology.com
Fri Jan 23 20:43:02 UTC 2009


Gunther Schlegel wrote:
> 
> 
> Geoffrey wrote:
> 
>> Before going into the actual issues we are running into, I thought we 
>> should first find out if there are others who are doing this, or 
>> attempting to do this.
>>
>> This includes a separate virtual server for each of the following 
>> services: squid, ldap, mail, dns, samba, and a handful of 
>> Xservers/application servers.
>>
>> The specific hardware includes:
>>
>> 8 Dell 19150 nodes w/ rh5.2 xen
>> Each node has 32GB of ram and 8 cores (xeon 2.66)
> 
> you need a 64-bit distro to have xen support more than 16GB RAM.

We do.

> 
>> EMC San CX-310
>> 2 Brocade 5000 fibre switches
>>
>> Any feedback from anyone attempting or currently running a similar 
>> solution (virtualization on top of cluster) would be greatly appreciated.
> 
> We are running something like that for more than a year now (tests on 
> rhel5.0, live on rhel5.1, now running rhel5.2).
> 
> 45+ paravirtualized VMs on 2 clusters.
> VMs are located on clustered logical volumes, not gfs.
> gfs is used for /etc/xen, though.

Yes, we are using lvm as well.
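
For the archives, carving out one clustered LV per VM on RHEL5 looks 
roughly like this (the device path, VG name, LV name, and size are all 
examples, not what either of us necessarily uses):

```shell
# Switch lvm.conf to cluster-wide locking (locking_type 3, via clvmd)
lvmconf --enable-cluster
service clvmd start

# Create a clustered volume group on the shared SAN LUN
# (the multipath device path below is an example)
vgcreate -c y vg_xen /dev/mapper/mpath0

# One logical volume per VM disk; name and size are illustrative
lvcreate -L 20G -n vm_squid vg_xen
```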

> performance is good, even with io-intensive apps inside the VMs.
> stability is fair (but improving over time).

This is good to hear.

> The main issue has always been the cluster losing quorum without 
> apparent reason. Improved after we added a Quorum Disk.

We are using Quorum disk as well.
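
For anyone setting this up, a typical qdiskd configuration is sketched 
below (the device path, label, heuristic address, and tuning values are 
examples only):

```shell
# Label a small shared LUN as the quorum disk
# (device path and label are examples)
mkqdisk -c /dev/mapper/mpath1 -l xen_qdisk

# Then reference it from /etc/cluster/cluster.conf, roughly:
#   <quorumd interval="1" tko="10" votes="3" label="xen_qdisk">
#     <heuristic program="ping -c 1 -w 1 192.168.0.1" score="1" interval="2"/>
#   </quorumd>

service qdiskd start
chkconfig qdiskd on
```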

> Improved further 
> after RH Support recommended a couple of not-so-well-documented ;) 
> configuration parameters, which also cannot be maintained with 
> conga/luci/ricci or system-config-cluster. Improved even further after 
> we complained again and RH came up with even more undocumented settings 
> to solve the race conditions we experienced when a node left the cluster 
> (even intentionally).

Any possible way you can share these undocumented settings with us?

> Another big issue was live migration: it turned out that the bridge in 
> Dom0 has a default forward delay of 15 seconds. We may have hit that 
> because we use a different xen network-script than the one delivered by 
> RH; the original one could deal with neither bonding nor VLANs, so we 
> had to replace it. This has changed according to the rhel5.3 changelog, 
> but we have not tested it yet.
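
For the archives, the forward-delay workaround boils down to the 
following (the bridge name xenbr0 is an example; it depends on your 
network-script):

```shell
# Check the current STP forward delay on the Xen bridge
brctl showstp xenbr0 | grep 'forward delay'

# Drop it to 0 so traffic flows immediately after a live migration
# (add this to the network-script so it persists across restarts)
brctl setfd xenbr0 0
```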
> 
> We also had issues with the DomU time jumping around after live 
> migration, but RH fixed that recently; as a workaround, one can run 
> ntpd inside the DomU.
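
For anyone hitting this before the fix, the usual DomU-side workaround 
on RHEL5 PV guests is (assuming the xen.independent_wallclock sysctl is 
present in your guest kernel):

```shell
# Inside each PV DomU: decouple the guest clock from Dom0
echo "xen.independent_wallclock = 1" >> /etc/sysctl.conf
sysctl -w xen.independent_wallclock=1

# ...and keep it accurate with ntpd
service ntpd start
chkconfig ntpd on
```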
> 
> 
> best regards, Gunther
> 
> 
> ------------------------------------------------------------------------
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


-- 
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

