[Linux-cluster] suggestion on freeze-on-node1 and unfreeze-on-node2 approach?

Fri Jan 8 17:05:24 UTC 2010

On Fri, 2010-01-08 at 16:12 +0100, Gianluca Cecchi wrote:

> If I get a network problem and my vip goes down for more than 30
> seconds (that should be default interval between checks), it will
> cause a relocation of the whole service and not a try-restart of only
> the vip, correct?

Yes.

> Without the recovery="relocate" option would this imply again that the
> whole service would be restarted in the running node in this network
> problem scenario (so shutdown abort of DB, umount of the FS and
> restart of all of them)?

Yes.

> If this is true, could the modification below prevent this (I don't
> care about fs, because if one of them goes down, probably I have
> problems impacting DB itself, so it is safer to stop it....) for the
> general case
> 
>                  <service autostart="1" name="ACSSRV" >
>                         <ip ref="10.4.5.123" __independent_subtree="1" />
>                         <fs ref="oradata"/>
>                         <fs ref="orasave"/>
>                         <fs ref="rdoffline"/>
>                         <fs ref="appl"/>
>                         <script ref="ACS"/>
>                 </service>

Exactly.

> And what about this one, does it make sense at all if I add the
> recovery=relocate policy?
> 
>                  <service autostart="1" name="ACSSRV" recovery="relocate">
>                         <ip ref="10.4.5.123" __independent_subtree="1" />
>                         <fs ref="oradata"/>
>                         <fs ref="orasave"/>
>                         <fs ref="rdoffline"/>
>                         <fs ref="appl"/>
>                         <script ref="ACS"/>
>                 </service>

Independent subtree means that the IP address failure is not considered
a service failure unless restarting it fails.  (There's an RFE open to
effectively extend 'max_restarts' and 'restart_expire_time' to
individual resources, but it has not been implemented.)

So, if ip fails, just ip is restarted.

If oradata, orasave, ACS, appl, or rdoffline fails, the service is
relocated.
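
As a rough illustration of the decision described above, here is a
minimal Python sketch. It is not rgmanager code: the names Resource and
recovery_action are hypothetical, the returned strings are made up for
the example, and it assumes that a service with no recovery attribute
is restarted in place on the same node.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical model of one child resource of a service.
    name: str
    independent_subtree: bool = False  # mirrors __independent_subtree="1"


def recovery_action(resource, service_recovery="restart", restart_succeeded=True):
    """Return what happens when `resource` fails its status check.

    service_recovery models the service's recovery attribute; "restart"
    is assumed here when the attribute is absent.
    """
    # An independent subtree is restarted on its own first. Its failure
    # only counts as a service failure if that restart does not succeed.
    if resource.independent_subtree and restart_succeeded:
        return "restart " + resource.name + " only"

    # Otherwise the failure is a service failure and the service-level
    # recovery policy applies.
    if service_recovery == "relocate":
        return "relocate service"
    return "restart service"


ip = Resource("ip", independent_subtree=True)
db = Resource("ACS")

print(recovery_action(ip, "relocate"))
print(recovery_action(ip, "relocate", restart_succeeded=False))
print(recovery_action(db, "relocate"))
print(recovery_action(db))
```

With recovery="relocate", the first call models the ip being restarted
alone, the second models a failed ip restart escalating to a relocation,
and the third models a failure of the ACS script relocating the whole
service. The last call models the configuration without the recovery
attribute, where the service is restarted on the running node.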

-- Lon