[Linux-cluster] EL5.6/RHCS2; rgmanager doesn't migrate VM after too many crashes
Joe DiTommasso
jdito at dca.net
Wed Apr 20 16:58:12 UTC 2011
On Wed, Apr 20, 2011 at 11:55:03AM -0400, Digimer wrote:
> Hi all,
>
> I've got a RHCS2 cluster on el5.6 using rgmanager to manage Xen domUs.
> I've got the <vm ... /> service set to restart on failure, with a
> maximum of 2 restarts and a restart expiry time of 600 seconds.
>
> <vm name="vm0001_lz1" domain="cc48_primary"
> path="/xen_shared/definitions/" autostart="0" exclusive="0"
> recovery="restart" max_restarts="2" restart_expire_time="600"/>
>
> I test by crashing the VM with 'echo c > /proc/sysrq-trigger' several
> times well within 10 minutes. The cluster recovers the VM every time,
> but always on the node it was previously running on. The failover
> domain is:
>
> <failoverdomain name="cc48_primary" nofailback="0" ordered="1"
> restricted="1">
> <failoverdomainnode name="cc0048.iplink.net" priority="1"/>
> <failoverdomainnode name="cc0049.iplink.net" priority="2"/>
> </failoverdomain>
>
> Any idea what I am doing wrong?
>
> Thanks!
>
> --
> Digimer
> E-Mail: digimer at alteeve.com
> AN!Whitepapers: http://alteeve.com
> Node Assassin: http://nodeassassin.org
Try changing from recovery="restart" to recovery="relocate". Here's how we have it set up for one of our clusters:
<service autostart="1" domain="mysqld" name="MYSQLD" recovery="relocate">
I believe that, by design, restart brings the service back up on the same node it was running on before the failure.
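Applied to the vm resource from the original post, the suggested change might look like the sketch below. It assumes only the recovery attribute is changed and every other attribute stays exactly as posted; it has not been tested against this cluster.

```xml
<!-- Sketch only: the original vm resource with recovery switched from
     "restart" to "relocate". All other attributes are copied unchanged. -->
<vm name="vm0001_lz1" domain="cc48_primary"
    path="/xen_shared/definitions/" autostart="0" exclusive="0"
    recovery="relocate" max_restarts="2" restart_expire_time="600"/>
```

With relocate, a failed VM should be started on another member of the cc48_primary failover domain rather than on the node where it crashed. The max_restarts and restart_expire_time attributes are carried over as posted; they may have no effect once the policy is relocate.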
Joe