Re: Need help with Reboot cause

Peter J. Stieber wrote:

> I noticed it a few times before the latest kernel, but now it is more 
> frequent.

'more frequent' may well be an indication of a system component breaking down.
that is to say, if it is a 'bad cap', the capacitor is starting to have more internal

other things that can cause 'more frequent' reboots are cooling problems: dirty
cooling fan blades, a fan slowing down as its lubricant dries out, or a dirty heat
sink on the cpu or another high-heat vlsi chip, including some gpus [graphics
processing units]. also, some heat transfer paste will dry out within a couple of
years and cause a loss of cooling.

i have never checked, but if there is a command-line tool for checking system
temperature, fan speed, and voltages, running it from cron at 10 to 15 minute
intervals will give a good idea of what is happening.
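
a minimal sketch of that idea, assuming the lm_sensors 'sensors' command is
installed and configured on the box. the log path and the function names are
made up for illustration; the script just appends whatever 'sensors' prints,
with a timestamp, each time it runs.

```python
#!/usr/bin/env python3
"""append a timestamped 'sensors' reading to a log file (sketch only)."""
import subprocess
from datetime import datetime

# hypothetical log location; pick any path the cron user can write to
LOG_PATH = "/tmp/sensors.log"


def read_sensors():
    """return the text printed by 'sensors', or a note if it cannot be run."""
    try:
        result = subprocess.run(
            ["sensors"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "sensors unavailable: %s" % exc
    # fall back to stderr so a misconfigured setup still leaves a trace
    return result.stdout.strip() or result.stderr.strip()


def format_entry(when, reading):
    """build one log entry: a timestamp header followed by the reading."""
    return "=== %s ===\n%s\n" % (when.strftime("%Y-%m-%d %H:%M:%S"), reading)


def log_once(path=LOG_PATH):
    """append a single reading to the log."""
    with open(path, "a") as log:
        log.write(format_entry(datetime.now(), read_sensors()))


if __name__ == "__main__":
    log_once()
```

a crontab line such as '*/10 * * * * /usr/bin/python3 /path/to/log_sensors.py'
(the script path is hypothetical) would run it every 10 minutes. after a reboot,
the last few entries in the log show what the temperatures and fans were doing
just before the box went down.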

> I'm starting to notice a pattern that makes me think I should look at 
> cron entries.  Here is the frequency of reboot from a previous post...

if it is a cron job running, it could be raising cpu usage and, with it, heat.
so knowing what is happening temperature-wise and fan-wise would help.

> The only recent hardware change was the addition of a Belkin OmniView 
> PRO2 4-Port KVM switch (F1DA104T).

i have the same kvm and have not had any problems with it. also, something very
weird would have to be going on with it to cause a problem, such as a short that
causes a drop in voltage.

> The top command indicates ld was running.  This was the case for 3 other 
> reboots (see my prior posts)...

'ld' could be a cause, as it would put a load on the cpu. you would therefore need
to look for other system loads as well. and again, watching the readings from
'sensors' will show just how much heat that load is producing.
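
to put a number on the load itself, here is a small sketch that prints the
system load averages with a timestamp. it relies on python's os.getloadavg(),
which exists on unix-like systems only; the function name and output format
are just illustrative choices.

```python
#!/usr/bin/env python3
"""print a timestamped load-average line (sketch only)."""
import os
from datetime import datetime


def format_load(when, load):
    """format the 1, 5 and 15 minute load averages on one line."""
    one, five, fifteen = load
    return "%s load1=%.2f load5=%.2f load15=%.2f" % (
        when.strftime("%Y-%m-%d %H:%M:%S"), one, five, fifteen
    )


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute averages as a tuple
    print(format_load(datetime.now(), os.getloadavg()))
```

run from cron with the output appended to a file, this gives a load history that
can be lined up against the temperature readings and the reboot times.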

> Result of last | grep crash
> pstieber pts/2                     Tue Apr  7 19:27 - crash  (06:15)
> pstieber tty1                      Tue Apr  7 13:15 - crash  (12:27)
> pstieber pts/0                     Tue Apr  7 06:55 - crash  (00:13)
> root     pts/0    mrburns.toyon.co Tue Mar 24 08:38 - crash  (00:04)
> root     pts/0                     Mon Mar 23 06:33 - crash  (00:02)

knowing what is going on just before these periods (cron jobs, etc.) would help
you find a system load.
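
a rough sketch for turning those 'last' lines into approximate reboot times, by
adding each session's length to its login time. 'last' does not print the year,
so it has to be supplied; 2009 below is an assumption. the function name and the
regular expression are my own, and the result is only as precise as the one
minute resolution 'last' prints.

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# matches e.g. "Tue Apr  7 19:27 - crash  (06:15)"; a "(1+02:03)" style
# duration (days+hours:minutes) is accepted as well
CRASH_RE = re.compile(
    r"[A-Z][a-z]{2}\s+([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2})"
    r"\s+-\s+crash\s+\((?:(\d+)\+)?(\d{2}):(\d{2})\)"
)


def estimate_crash_time(line, year):
    """return login time + session length for a 'crash' line, else None."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    month, day, hour, minute, days, dur_h, dur_m = match.groups()
    if month not in MONTHS:
        return None
    login = datetime(year, MONTHS[month], int(day), int(hour), int(minute))
    length = timedelta(days=int(days or 0), hours=int(dur_h),
                       minutes=int(dur_m))
    return login + length


if __name__ == "__main__":
    sample = [
        "pstieber pts/2      Tue Apr  7 19:27 - crash  (06:15)",
        "pstieber tty1                        Tue Apr  7 13:15 - crash  (12:27)",
        "pstieber pts/0   Tue Apr  7 06:55 - crash  (00:13)",
    ]
    for line in sample:
        # the year is an assumption; 'last' output does not include it
        print(estimate_crash_time(line, 2009), "<-", line)
```

for the first two sample lines the estimate comes out the same, about 01:42 on
apr 8, which suggests both sessions were cut off by the same reboot. that is the
time window to check against the crontab entries.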


"bad cap" AND "antec" [with "" and 'AND'] results in 308 hits on google


> MB: Thunder K8W (S2885ANRF)

"bad cap" AND "Thunder K8W" results in 6 hits.


not a very good combination. :(

> Thanks for the ideas.

you are welcome.

> This machine and the attached cluster is used by 
> a group of 10 or so at my company.  It's difficult to do a lot of 
> tinkering, but I can use the argument that if it reboots, what good is it.

that should be justification for a completely new system. the case rests on
which is more costly: the system crashing, the chance of data loss, and your
time, or the cost of a new box.

if they go for a new box, be sure that you have a good safety margin on the
power supply rating. a max load of about 70% of the supply's max output would
be nice.
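
the 70% rule of thumb as arithmetic, in a tiny sketch. the function name is made
up, and the 350 W peak load is an invented number for illustration, not a
measurement of any real box.

```python
def min_psu_rating(max_load_watts, target_percent=70):
    """smallest supply rating (watts) that keeps max load at target_percent."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    return max_load_watts * 100 / target_percent


if __name__ == "__main__":
    # 350 W is a made-up peak load used only to show the calculation
    print(min_psu_rating(350))
```

with the made-up 350 W peak load this works out to a 500 W supply.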


peace out.



in a free world without fences, who needs gates.
help microsoft stamp out piracy - give linux to a friend today
to mess up a linux box, you need to work at it;
to mess up an ms windows box, you just need to *look at* it.
learn linux:
'Rute User's Tutorial and Exposition' http://rute.2038bug.com/index.html
'The Linux Documentation Project' http://www.tldp.org/
'LDP HOWTO-index' http://www.tldp.org/HOWTO/HOWTO-INDEX/index.html
'HowtoForge' http://howtoforge.com/
'fedora faqs' http://www.fedorafaq.org/
