[Linux-cluster] Adding a stop timeout to a VM service using 'ccs'
Digimer
lists at alteeve.ca
Thu Mar 20 01:26:56 UTC 2014
On 19/03/14 07:45 PM, Digimer wrote:
> On 19/03/14 06:31 PM, Chris Feist wrote:
>> On 03/18/2014 08:27 PM, Digimer wrote:
>>> Hi all,
>>>
>>> I would like to tell rgmanager to give more time for VMs to stop. I
>>> want this:
>>>
>>> <vm name="vm01-win2008" domain="primary_n01" autostart="0"
>>> path="/shared/definitions/" exclusive="0" recovery="restart"
>>> max_restarts="2"
>>> restart_expire_time="600">
>>> <action name="stop" timeout="10m" />
>>> </vm>
>>>
>>> I already use ccs to create the entry:
>>>
>>> <vm name="vm01-win2008" domain="primary_n01" autostart="0"
>>> path="/shared/definitions/" exclusive="0" recovery="restart"
>>> max_restarts="2"
>>> restart_expire_time="600"/>
>>>
>>> via:
>>>
>>> ccs -h localhost --activate --sync --password "secret" \
>>> --addvm vm01-win2008 \
>>> --domain="primary_n01" \
>>> path="/shared/definitions/" \
>>> autostart="0" \
>>> exclusive="0" \
>>> recovery="restart" \
>>> max_restarts="2" \
>>> restart_expire_time="600"
>>>
>>> I'm hoping it's a simple additional switch. :)
>>
>> Unfortunately currently ccs doesn't support setting resource actions.
>> However it's my understanding that rgmanager doesn't check timeouts
>> unless __enforce_timeouts is set to "1". So you shouldn't be seeing a
>> vm resource go to failed if it takes a long time to stop. Are you
>> trying to make the vm resource fail if it takes longer than 10 minutes
>> to stop?
>
> I was afraid you were going to say that. :(
>
> The problem is that after calling 'disable' against the VM service,
> rgmanager waits two minutes. If the service isn't closed in that time,
> the server is forced off (at least, this was the behaviour when I last
> tested this).
>
> The concern is that, by default, Windows queues updates to install
> when the system shuts down. During this time, Windows makes it
> very clear that you should not power off the system during the updates.
> So if this timer is hit, and the VM is forced off, the guest OS can be
> damaged.
>
> Of course, we can debate the (lack of) wisdom of this behaviour. I
> already document this concern (and even warn people to check for updates
> before stopping the server), but it's not sufficient. If a user doesn't
> read the warning, or simply forgets to check, the consequences can be
> non-trivial.
>
> If ccs can't be made to add this attribute, and if the behaviour
> persists (I will test shortly after sending this reply), then I will
> have to edit the cluster.conf directly, something I am loath to do if at
> all avoidable.
>
> Cheers
Confirmed.
I called disable on a VM with GNOME running, so that I could abort the
VM's shutdown.
an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date
Wed Mar 19 21:06:29 EDT 2014
Local machine disabling vm:vm01-rhel6...Success
Wed Mar 19 21:08:36 EDT 2014
After 2 minutes and 7 seconds, rgmanager forced off the VM. Had this been
a Windows guest in the middle of installing updates, it would very
likely be damaged now.
To confirm, I changed the config to:
<vm autostart="0" domain="primary_n01" exclusive="0" max_restarts="2"
name="vm01-rhel6" path="/shared/definitions/" recovery="restart"
restart_expire_time="600">
<action name="stop" timeout="10m"/>
</vm>
Then I repeated the test:
an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date
Wed Mar 19 21:13:18 EDT 2014
Local machine disabling vm:vm01-rhel6...Success
Wed Mar 19 21:23:31 EDT 2014
It took 10 minutes and 13 seconds before the cluster killed the server,
which is much less likely to interrupt an in-progress OS update (truth be
told, I plan to set 30 minutes).
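As a quick check of the arithmetic on the two runs above, here is a small Python sketch that computes the elapsed time from the 'date' stamps printed before and after each 'clusvcadm -d' call. The helper name 'elapsed' is made up for illustration, and it assumes both stamps fall on the same day.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def elapsed(start, end):
    """Return (minutes, seconds) between two same-day HH:MM:SS stamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return divmod(int(delta.total_seconds()), 60)


# First run: no <action name="stop"> child on the <vm> element.
print(elapsed("21:06:29", "21:08:36"))  # (2, 7)

# Second run: <action name="stop" timeout="10m"/> added.
print(elapsed("21:13:18", "21:23:31"))  # (10, 13)
```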
I understand that this blocks other processes, but in an HA environment,
I'd strongly argue that safe > speed.
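Since ccs can't add the <action> child, one way to make the hand edit less error-prone is to script it. What follows is only a rough sketch in Python, not a tested tool. The function name 'add_stop_timeout' and the sample cluster document are hypothetical, and it assumes the usual cluster.conf layout with a 'config_version' attribute on the root <cluster> element. It works on a string rather than on /etc/cluster/cluster.conf itself; validating and pushing the new config out to the other nodes is not shown.

```python
import xml.etree.ElementTree as ET


def add_stop_timeout(conf_xml, vm_name, timeout):
    """Return conf_xml with a stop-action timeout set on the named <vm>."""
    root = ET.fromstring(conf_xml)

    # Find the <vm> element by its name attribute, wherever it is nested.
    vm = next((v for v in root.iter("vm") if v.get("name") == vm_name), None)
    if vm is None:
        raise ValueError("no <vm> named %r" % vm_name)

    # Reuse an existing stop action if there is one, so reruns don't
    # pile up duplicate <action> children.
    action = next(
        (a for a in vm.findall("action") if a.get("name") == "stop"), None
    )
    if action is None:
        action = ET.SubElement(vm, "action", {"name": "stop"})
    action.set("timeout", timeout)

    # Bump config_version so the change is seen as a newer config.
    version = int(root.get("config_version", "0"))
    root.set("config_version", str(version + 1))

    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed cluster.conf used only for illustration.
SAMPLE = """<cluster name="example" config_version="5">
  <rm>
    <vm name="vm01-rhel6" domain="primary_n01" autostart="0"
        path="/shared/definitions/" exclusive="0" recovery="restart"
        max_restarts="2" restart_expire_time="600"/>
  </rm>
</cluster>"""

if __name__ == "__main__":
    print(add_stop_timeout(SAMPLE, "vm01-rhel6", "10m"))
```

Running it a second time with a different value (say "30m") updates the existing <action> rather than adding another one, which keeps the edit repeatable.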
digimer
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?