[Linux-cluster] killproc annoyance

Fri Apr 20 13:59:10 UTC 2007

--- Tarun Reddy <treddy at rallydev.com> wrote:

> 
> On Apr 19, 2007, at 6:31 AM, Scott McClanahan wrote:
> 
> > According to the link at the bottom, a stop argument to an already
> > stopped service should return a success just like a start on an
> > already started service should be interpreted as a success.  So I
> > have rewritten most of my init scripts which are managed by clurgd
> > to be spec compliant.  Unfortunately, how do you really know if the
> > service is stopped?  How can you be certain the pid file got written
> > or is even correct (race conditions)?  Anyway, I have written init
> > scripts for apache, tomcat, and IBM MQ to name a few and have tested
> > the hell out of them so let me know if you want a working example.
> >
> > http://refspecs.freestandards.org/LSB_2.0.1/LSB-Core/LSB-Core/iniscrptact.html
> >
> > On Wed, 2007-04-18 at 19:41 -0400, Tarun Reddy wrote:
> >> So just started working with RH4's clustering services and have run
> >> into a bit of a "deadlock" problem that I'm trying to see if anyone
> >> else has seen/fixed.
> >>
> >> 1) Start off with working config, add httpd as a clustered service,
> >> and every thing is great. Fails over to other machines great.
> >>
> >> 2) Mess up the apache config (like adding a virtual IP that doesn't
> >> exist on the system). Even though configtest works, we have a broken
> >> config.
> >>
> >> 3) So you restart apache without knowing the config is bad, while the
> >> clustering service is running. Apache doesn't come back up. Okay,
> >> cool, well go fix the problem and try to tell clustering to restart
> >> the service.
> >>
> >> Here is where things get annoying.
> >> 4) Now clustering says the service is failed. So it attempts to
> >> "service httpd stop" which killproc in /etc/init.d/functions returns
> >> a 1 since it wasn't running before. This causes the clustering
> >> software to fail the stop, and hence leave the service in a failed
> >> state. I can't get httpd up without the virtual IPs that are
> >> associated to the service, so I can't get killproc to ever return a 0
> >> when stopping the service. Shouldn't killproc return a 0 if none of
> >> the httpd daemons are still running?
> >>
> >> I guess for now, I'll try and force some aliases for the IPs, get
> >> httpd up and running, disable the service, remove the aliases, and
> >> then enable the service. Lots of stuff to do if I was in a crisis
> >> mode in production.
> >>
> >> Anyone have an opinion on killproc return codes?
> 
> Scott,
> 
> Thank you very much for the link! I thought I wasn't crazy. So some
> more testing shows that on RHEL5, /etc/init.d/httpd stop when apache
> is stopped, does the right thing and has RETVAL of 0, while RHEL4 is
> "broken" in this respect.
>
> I think I'll look at where the differences are and possibly integrate
> the change back.

Someone already sent you the link to the cluster FAQ, which explains
the problem and how to fix it (in the killproc function of the
/etc/init.d/functions file).

It is a very simple fix, and it works for every init script that uses
the killproc function. Scripts that do not use it (like the mysql init
script) will still give you problems.
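
For scripts that do not go through killproc, the same idea can be
sketched as a small helper. This is only an illustration of the LSB
rule discussed above (a stop on an already stopped service is a
success), not the actual change from the FAQ. The function name
lsb_stop, its pid file argument and the 10 second default timeout are
all assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical helper, NOT the real patch to /etc/init.d/functions.
# Stops the process recorded in a pid file and follows the LSB rule
# that "stop" on an already stopped service returns 0.
lsb_stop() {
    pidfile=$1
    timeout=${2:-10}    # seconds to wait after SIGTERM (assumed default)

    # No pid file: treat the service as already stopped.
    [ -f "$pidfile" ] || return 0

    pid=$(cat "$pidfile" 2>/dev/null)

    # Empty pid file or stale pid: nothing to kill, clean up and succeed.
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile"
        return 0
    fi

    # Process is alive: ask it to terminate and poll until it is gone.
    kill -TERM "$pid" 2>/dev/null
    while [ "$timeout" -gt 0 ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$pidfile"
            return 0
        fi
        sleep 1
        timeout=$((timeout - 1))
    done

    # Still alive after the timeout: this is a real failure.
    return 1
}
```

In the stop) branch of an init script this could be called with the
daemon's pid file and its exit status stored in RETVAL. It assumes the
script runs as root, because kill -0 also fails for a process owned by
another user, and it does not solve the pid file race conditions
mentioned earlier in the thread.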

cu
roger

__________________________________________
RedHat Certified Engineer ( RHCE )
Cisco Certified Network Associate ( CCNA )
