Reboots

Mon Oct 20 01:33:40 UTC 2003

On Sun, 2003-10-19 at 17:25, Ola Thoresen wrote:
> This might not be the right list, but hopefully some of the right people will
> read it anyway.
> 
> I am currently working for a company with several Linux servers, most of them
> running Red Hat.  From time to time these will need a reboot. We have had
> nfsd processes hanging around that never die and eventually cause other
> processes to hang. We have had httpd processes refusing to die, and so on.
> This is not our main concern, as this is not very common - maybe once a year
> for each server - but with 50-60 servers this might turn out to be a reboot a 
> week.  The reality is not that bad, but from time to time a reboot must be
> issued.
> 
> The main issue with a reboot is not the 5 or 10 minutes of downtime while the
> server boots, but the fact that quite often the reboot will not complete.
> When we reboot, it is because every other way of resolving an issue has been
> tried and has failed, and then, quite often, init 6 will not work either.
> Killall fails, so it hangs forever. The server does not manage to turn off swap,
> some disk cannot be unmounted, or other such issues.
> 
> There should be some sort of a timer that will _really_ kill the system if
> init 6 has not completed in N minutes.
> We can live with 10 minutes wait before the system reboots, but some of our
> servers are located in rooms and buildings where it is more or less
> impossible to get access outside business hours, and we cannot always live
> with a server being offline for a whole weekend.
> 
> This might be part of the discussion about faster shutdowns, but just
> please remember that certain reboots might be as important.
> 
> 
> Rgds.
> 
> Ola Thoresen
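
The timer asked for above could be roughed out as below. This is only a
sketch, not something I have run in production: the function name and its
parameters are made up for illustration, and it assumes the stock
/sbin/init and /sbin/reboot binaries. It asks for the normal init 6, sleeps
for N minutes, and if it is still alive at that point it calls reboot -f,
which skips the shutdown scripts.

```python
import subprocess
import time


def reboot_with_deadline(minutes=10, run=subprocess.call, sleep=time.sleep):
    """Request a clean reboot; force one if we are still alive after the deadline.

    `run` and `sleep` are injectable only so the logic can be tried out
    without touching a real machine.
    """
    commands = []

    # Normal path: hand the reboot over to init, as usual.
    clean = ["/sbin/init", "6"]
    run(clean)
    commands.append(clean)

    # If the shutdown completes, this process dies during the sleep
    # and nothing below is ever reached.
    sleep(minutes * 60)

    # Still here, so the clean shutdown is stuck. -f reboots without
    # calling shutdown; it still tries to sync unless -n is added too.
    forced = ["/sbin/reboot", "-f"]
    run(forced)
    commands.append(forced)
    return commands
```

The weak spot is that the shutdown scripts kill ordinary processes fairly
early, so a userspace timer like this only helps when the hang happens before
that point, or when that kill step is itself what is stuck. A hardware or
kernel watchdog would not have that limitation.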

Are you running Red Hat?  What version?  We have a bunch of Red Hat AS
2.1 servers and they never go down.  Do you have a competent admin for
your Linux servers?  You shouldn't be having these kinds of problems at
all.  You could have a few cron jobs that run at say 2:00 AM that stop
nfs and apache and use killall -9 to make sure that the processes are
dead and then start them back up.  This may keep such issues from ever
showing up.  However, I think you should still look at how the servers are
set up, since you should not be having these types of problems in the
first place, at least in my experience.
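
Something along these lines is what I have in mind for the cron job. Treat
it as a sketch: the init script names (nfs, httpd) and process names (nfsd,
httpd) are what I would expect on a Red Hat box but are assumptions here,
and the function names are just for illustration.

```python
import subprocess

# Assumed Red Hat names: (init script, process name). Adjust per host.
SERVICES = [("nfs", "nfsd"), ("httpd", "httpd")]


def restart_plan(services=SERVICES):
    """Return the commands for a stop / killall -9 / start cycle, in order."""
    plan = []
    for script, process in services:
        plan.append(["/sbin/service", script, "stop"])
        # Make sure nothing survived the normal stop.
        plan.append(["killall", "-9", process])
        plan.append(["/sbin/service", script, "start"])
    return plan


def run_plan(plan, run=subprocess.call):
    """Run each command in order; return the ones that exited non-zero."""
    failed = []
    for cmd in plan:
        if run(cmd) != 0:
            failed.append(cmd)
    return failed
```

A small script that calls run_plan(restart_plan()) could then go into root's
crontab with a schedule of "0 2 * * *" for the 2:00 AM run. Note that killall
exits non-zero when it finds no matching process, so seeing it in the failed
list after a clean stop is the good case, not an error.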

Jim Drabb
-- 
James Drabb
Senior Programmer Analyst
Davenport, FL USA