[Linux-cluster] Not restarting "max_restart" times before relocating failed service

Tue Oct 30 15:55:55 UTC 2012

On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
> Hi experts,
> 
> I have defined a service as follows in cluster.conf -
> 
>                 <service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
>                         <script ref="myHaAgent"/>
>                         <ip ref="192.168.51.51"/>
>                 </service>
> 
> I set max_restarts="5" expecting that if the cluster fails to start the
> service 5 times, it will then relocate the service to another node in
> the failover domain.
> 
> To check this, I took down the NIC hosting the service's floating IP and
> got the following logs:
> 
> Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not
> detected
> Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip
> "192.168.51.51" returned 1 (generic error)
> Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service
> service:mgmt
> *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is
> recovering*
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed
> service service:mgmt
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip
> "192.168.51.51" returned 1 (generic error)
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start
> service:mgmt; return value: 1
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service
> service:mgmt
> *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is
> recovering
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed
> service service:mgmt*
> Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is
> stopped
> Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is
> stopped
> 
> But from the log, it appears that the cluster tried to restart the
> service only ONCE before relocating it.
> 
> I was expecting the cluster to retry starting this service five times on
> the same node before relocating.
> 
> Can anybody correct my understanding?
> 
> Thanks,
> Parvez

What version are you running? Please paste your full cluster.conf.
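
In the meantime, it can help to count how many recovery and relocation
messages the daemon actually logged for the service. The following is only
a minimal sketch, assuming the log wording shown in the quoted excerpt; the
function name count_recovery_events and the sample text are hypothetical
and are not part of rgmanager.

```python
import re

# Message fragments copied from the quoted clurgmgrd log excerpt.
# Assumption: other rgmanager versions may word these messages differently.
RECOVER = re.compile(r"Recovering failed service (service:[\w.-]+)")
RELOCATE = re.compile(r"Relocating failed service (service:[\w.-]+)")


def count_recovery_events(log_text):
    """Return {service: {"recover": n, "relocate": m}} for a log excerpt."""
    # Mail clients wrap long lines and prefix quoted text with "> ".
    # Drop the quote markers, then collapse all whitespace so that a
    # message split across two lines matches as one string.
    unquoted = re.sub(r"^> ?", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())

    counts = {}
    for kind, pattern in (("recover", RECOVER), ("relocate", RELOCATE)):
        for match in pattern.finditer(flat):
            entry = counts.setdefault(
                match.group(1), {"recover": 0, "relocate": 0}
            )
            entry[kind] += 1
    return counts


# Hypothetical sample, shortened from the quoted log above.
sample = """\
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed
> service service:mgmt
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed
> service service:mgmt*
"""

print(count_recovery_events(sample))
# prints {'service:mgmt': {'recover': 1, 'relocate': 1}}
```

Run against the full log, this shows at a glance whether the service was
recovered once or several times on the node before it was relocated.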

-- 
Digimer