
Re: [Linux-cluster] killproc annoyance




On Apr 19, 2007, at 11:37 AM, Tarun Reddy wrote:


On Apr 19, 2007, at 6:31 AM, Scott McClanahan wrote:

According to the link at the bottom, a stop argument to an already
stopped service should return success, just as a start on an already
started service should be interpreted as success. So I have rewritten
most of my init scripts which are managed by clurgmgrd to be spec
compliant.  Unfortunately, how do you really know if the service is
stopped?  How can you be certain the pid file got written or is even
correct (race conditions)?  Anyway, I have written init scripts for
apache, tomcat, and IBM MQ to name a few and have tested the hell out of
them so let me know if you want a working example.

http://refspecs.freestandards.org/LSB_2.0.1/LSB-Core/LSB-Core/iniscrptact.html
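
As a rough illustration of the "stop on an already stopped service is a success" rule, here is a minimal sketch of an idempotent stop. The function names (is_running, stop_service) and the pid-file handling are hypothetical, not taken from any Red Hat script. It assumes the script runs as root (so kill -0 is allowed to probe the process) and it does not solve the pid-reuse race mentioned above; it only avoids failing when there is nothing left to stop.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of /etc/init.d/functions.

# is_running PIDFILE: succeed only if the pid file names a live process.
is_running() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1          # missing or empty pid file
    pid=$(cat "$pidfile" 2>/dev/null) || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;           # non-numeric content: treat as stale
    esac
    kill -0 "$pid" 2>/dev/null             # signal 0 only probes for existence
}

# stop_service PIDFILE: return 0 if the service ends up stopped,
# including the case where it was not running to begin with.
stop_service() {
    pidfile=$1
    if ! is_running "$pidfile"; then
        rm -f "$pidfile"                   # clean up a stale pid file, if any
        return 0                           # already stopped counts as success
    fi
    kill "$(cat "$pidfile")" 2>/dev/null
    # Give the process up to ten seconds to exit before judging the result.
    i=0
    while [ "$i" -lt 10 ] && is_running "$pidfile"; do
        sleep 1
        i=$((i + 1))
    done
    if is_running "$pidfile"; then
        return 1                           # still alive: a genuine stop failure
    fi
    rm -f "$pidfile"
    return 0
}
```

An init script's stop branch could then call something like stop_service /var/run/myservice.pid (path hypothetical) and pass the result straight back as its exit status.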

On Wed, 2007-04-18 at 19:41 -0400, Tarun Reddy wrote:
So just started working with RH4's clustering services and have run
into a bit of a "deadlock" problem that I'm trying to see if anyone
else has seen/fixed.

1) Start off with working config, add httpd as a clustered service,
and everything is great. Fails over to other machines fine.

2) Mess up the apache config (like adding a virtual IP that doesn't
exist on the system). Even though configtest works, we have a broken
config.

3) So you restart apache without knowing the config is bad, while the
clustering service is running. Apache doesn't come back up. Okay,
cool, well go fix the problem and try to tell clustering to restart
the service.

Here is where things get annoying.
4) Now clustering says the service is failed. So it attempts to
"service httpd stop", but killproc in /etc/init.d/functions returns
a 1 since httpd wasn't running to begin with. This causes the clustering
software to fail the stop, and hence leave the service in a failed
state. I can't get httpd up without the virtual IPs that are
associated to the service, so I can't get killproc to ever return a 0
when stopping the service. Shouldn't killproc return a 0 if none of
the httpd daemons are still running?

I guess for now, I'll try and force some aliases for the IPs, get
httpd up and running, disable the service, remove the aliases, and
then enable the service. Lots of stuff to do if I was in a crisis
mode in production.

Anyone have an opinion on killproc return codes?

Scott,

Thank you very much for the link! I thought I wasn't crazy. So some more
testing shows that on RHEL5, "/etc/init.d/httpd stop" when apache is
already stopped does the right thing and has a RETVAL of 0, while RHEL4
is "broken" in this respect.

I think I'll look at where the differences are and possibly integrate the change back.

Thanks,
Tarun


For future reference, RH clearly saw that the old behavior is a violation of LSB and changed the following in killproc between RHEL4 and RHEL5:

        else
            failure $"$base shutdown"
            RC=1
        fi

changed to:

        else
                if [ -n "${LSB:-}" -a -n "$killlevel" ]; then
                        RC=7 # Program is not running
                else
                        failure $"$base shutdown"
                        RC=0
                fi
        fi

I think I will change the RHEL4 version to return RC=0 as well and hope nothing else breaks :-)
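
To make the new branch easier to read in isolation, here is a small sketch of just the exit-status decision. The function name not_running_rc is hypothetical; its two arguments stand in for the LSB and killlevel variables in the snippet above. It leaves out the failure message that the real code still prints and only mirrors which status comes back when no process is found.

```shell
#!/bin/sh
# Sketch only: mirrors the RC choice in the RHEL5 snippet above.

# not_running_rc LSB KILLLEVEL: print the status used when no process is found.
not_running_rc() {
    lsb=$1
    killlevel=$2
    if [ -n "$lsb" ] && [ -n "$killlevel" ]; then
        echo 7    # both set: report "program is not running"
    else
        echo 0    # otherwise: stopping a stopped service is a success
    fi
}

not_running_rc "" ""        # prints 0
not_running_rc 1 "-HUP"     # prints 7
```

With neither variable set, which is the case the plain "service httpd stop" path appears to hit, the result is 0, and that is what the cluster software needs to see.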

Tarun

