
Re: hwclock can cause system lockup



2008/10/17 Todd Denniston <Todd Denniston ssa crane navy mil>
Mikkel L. Ellertson wrote, On 10/16/2008 05:23 PM:
What are you trying to do with this cron job? You are updating the
system clock from the hardware clock, and not the other way around,
as you say you are trying to do. The system does synchronize the
hardware clock to the system clock on shutdown.

Not if you are sane enough to disable that in the halt script.
(search this or the fedora-test list for ntp and me to see why I say this)

I would suggest two things:

Strangely, I can't find anything. I googled "ntp denniston site:www.redhat.com/archives/fedora-list/" and did the same for fedora-test-list, but neither search returned any results.

1) see if punching the calls up to .5Hz or 1Hz instead of .3Hz gets it

Pretty much the same behaviour: it runs for a period of time before the system locks up. The idea of running it every 3 seconds was simply to accelerate my investigations. Normally it would run once per hour, and the lockup would occur anywhere from once a day to once every couple of weeks. Fortunately I have quite a few test machines I can use. They all do it, some more than others. There are maybe 3 or 4 different motherboards in use from different manufacturers, so I'd be surprised if the problem were specific to a certain clock chip (though by coincidence they may all have the same one).
 
2) booting in runlevel 3 and running the script again and see if it gets you the error in a few hours, hopefully this time with an OOPS or Panic message.
 
I did this but I didn't see any kernel messages, just the locked up screen. It only took 30-45 minutes when running at 1 second intervals.

If any of these ends up being again 'just over an hour before it locked up', it might be some interaction with another cron job... did you disable the hourly cron job first? If not, I would set your 3 second script and a 2 minute cron and see if it may be a '2 accesses at the same time' problem.

In all these tests I did disable the cron job. I can't see anything else that could be interacting; only the normal system services are running. A few of the standard daily cron jobs are enabled, but the system fails at times when none of them are running, so I'm confident they can be ruled out.
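
For anyone who wants to rule out the '2 accesses at the same time' theory more directly, one option is to make every clock access take the same lock, so two invocations can never overlap. Below is a minimal sketch of that idea, not the script used in these tests. The lock file path, the function names, and the choice of `hwclock --hctosys` as the command are all assumptions made for illustration.

```python
import fcntl
import subprocess
import time

# Hypothetical lock file; any path that all callers agree on would do.
LOCK_PATH = "/tmp/hwclock-test.lock"


def run_serialized(cmd, lock_path=LOCK_PATH):
    """Run cmd while holding an exclusive lock and return its exit status."""
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def run_repeatedly(cmd, interval_s, count, lock_path=LOCK_PATH):
    """Run cmd `count` times, sleeping interval_s seconds after each run."""
    statuses = []
    for _ in range(count):
        statuses.append(run_serialized(cmd, lock_path))
        time.sleep(interval_s)
    return statuses
```

As root, something like `run_repeatedly(["hwclock", "--hctosys"], 3, 1000)` would reproduce the 3 second test, assuming that is the hwclock call in question. Any other job touching the clock would have to take the same lock (for example by wrapping it with flock(1) on the same file) for the test to mean anything. If the lockups continue with all access serialized this way, overlapping calls are probably not the cause.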
 
race conditions in time, oh what fun.

Oh yes, I'm having the time of my life. :) Thanks for the suggestions.
