Re: Need help with Reboot cause
- From: "Peter J. Stieber" <developer toyon com>
- To: "Community assistance, encouragement, and advice for using Fedora." <fedora-list redhat com>
- Subject: Re: Need help with Reboot cause
- Date: Wed, 08 Apr 2009 07:08:47 -0700
PS = Pete Stieber
PS>> running the latest x86_64 Fedora 10 kernel that I
PS>> recently loaded (April 2). The machine reboots
PS>> without warning.
g> were you previously running 32 bit?
No. I've been running x86_64 since I first assembled the machine back
in September 2004. I started with Fedora Core 2.
g> how soon after loading f10 64 bit did problem start?
I noticed it a few times before the latest kernel, but now it is more frequent.
g> is there any consistency in reboot, that is, how often?
I'm starting to notice a pattern that makes me think I should look at
cron entries. Here is the frequency of reboot from a previous post...
Reboots indicated by information in /var/log/messages...
Sunday March 29 4:08
Tuesday March 31 7:02
Thursday April 2 18:27 Intentional reboot due to new kernel
Friday April 3 1:36
Sunday April 5 1:37
Sunday April 5 2:48
Sunday April 5 9:43
Sunday April 5 13:20 as I was typing this email
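To check the cron theory, it may help to bucket the reboot times by hour of day and compare the busy hours with the schedule in /etc/crontab and the output of crontab -l. The following is only a minimal sketch: the function name hour_histogram is made up for illustration, the year 2009 is taken from the date of this message, and the intentional April 2 reboot is left out.

```python
from collections import Counter
from datetime import datetime

# Reboot times copied from the list above. The April 2 entry is omitted
# because that reboot was intentional (new kernel).
REBOOTS = [
    "March 29 4:08",
    "March 31 7:02",
    "April 3 1:36",
    "April 5 1:37",
    "April 5 2:48",
    "April 5 9:43",
    "April 5 13:20",
]


def hour_histogram(stamps, year=2009):
    """Count how many reboots fall in each hour of the day."""
    hours = Counter()
    for stamp in stamps:
        # Month names are parsed with %B, which assumes an English locale.
        when = datetime.strptime(f"{year} {stamp}", "%Y %B %d %H:%M")
        hours[when.hour] += 1
    return hours


if __name__ == "__main__":
    for hour, count in sorted(hour_histogram(REBOOTS).items()):
        print(f"{hour:02d}:00  {'#' * count}")
```

An hour that collects several reboots is a candidate to compare against scheduled jobs. Reboots spread evenly across the day would argue against cron as the trigger.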
The only recent hardware change was the addition of a Belkin OmniView
PRO2 4-Port KVM switch (F1DA104T). I removed this device and performed
the action (my nightly builds) that seems to cause the reboot with
distcc turned off, a samba share I normally have setup disabled, and all
of the cluster nodes turned off.
Next I cleaned everything, enabled distcc, turned on the cluster node,
and reran the build.
Next I cleaned everything, enabled the samba share, and reran the build.
I cleaned everything and went to bed. I left top running on a remote
terminal so I could tell what process was running during the reboot.
The machine rebooted :-( The time was 1:43 (very similar to other
reboot times in my list above).
So it wasn't the Belkin KVM switch.
The top command indicates ld was running. This was the case for 3 other
reboots (see my prior posts)...
top - 01:42:20 up 15:10, 3 users, load average: 1.94, 2.73, 2.53
Tasks: 130 total, 2 running, 128 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.0%us, 3.5%sy, 0.0%ni, 90.9%id, 0.5%wa, 0.0%hi, 0.2%si,
Mem: 2060232k total, 1594892k used, 465340k free, 47772k buffers
Swap: 2031608k total, 31256k used, 2000352k free, 1289104k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27913 pstieber 20 0 90404 80m 952 R 15.3 4.0 0:00.46 ld
6616 pstieber 20 0 14880 1204 872 R 0.3 0.1 0:49.58 top
27801 pstieber 20 0 83076 1140 740 S 0.3 0.1 0:00.01 make
1 root 20 0 4096 492 368 S 0.0 0.0 0:00.61 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0.0 0.0 0:00.58 migration/0
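An alternative to watching top on a remote terminal is to append a process snapshot to a file every few seconds and force each write to disk, so the last entry before the reboot has a better chance of surviving. This is a rough sketch, not a tested diagnostic tool: the names append_durably, snapshot and watch are hypothetical, the log path is up to you, and it assumes the procps version of ps. Even with fsync, the final write can still be lost if the machine resets mid-write.

```python
import os
import subprocess
import time


def append_durably(path, text):
    """Append text and ask the kernel to flush it to disk right away."""
    with open(path, "a") as log:
        log.write(text)
        log.flush()
        os.fsync(log.fileno())


def snapshot(limit=5):
    """Return a timestamped listing of the busiest processes by CPU use."""
    lines = subprocess.run(
        ["ps", "-eo", "pid,user,pcpu,pmem,args", "--sort=-pcpu"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    # Keep the ps header line plus the first `limit` processes.
    return "\n".join([f"--- {stamp}"] + lines[: limit + 1]) + "\n"


def watch(path, interval=5, rounds=None):
    """Log a snapshot every `interval` seconds; rounds=None means forever."""
    done = 0
    while rounds is None or done < rounds:
        append_durably(path, snapshot())
        time.sleep(interval)
        done += 1
```

Calling watch with a log path of your choice before starting the nightly build leaves a record to read after the next reboot. The last snapshot in the file then shows what was busy shortly before the machine went down.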
Result of last | grep reboot
reboot system boot 22.214.171.124-170.2. Wed Apr 8 01:43 (05:08)
reboot system boot 126.96.36.199-170.2. Tue Apr 7 10:32 (20:19)
reboot system boot 188.8.131.52-170.2. Tue Apr 7 07:09 (02:03)
reboot system boot 184.108.40.206-170.2. Mon Apr 6 06:56 (1+02:16)
reboot system boot 220.127.116.11-170.2. Sun Apr 5 13:20 (1+19:52)
reboot system boot 18.104.22.168-170.2. Sun Apr 5 09:43 (1+23:29)
reboot system boot 22.214.171.124-170.2. Sun Apr 5 01:36 (2+07:36)
reboot system boot 126.96.36.199-170.2. Fri Apr 3 01:36 (4+07:36)
reboot system boot 188.8.131.52-170.2. Thu Apr 2 18:52 (4+14:20)
reboot system boot 184.108.40.206-170.2. Tue Mar 31 07:02 (2+11:48)
reboot system boot 220.127.116.11-170.2. Tue Mar 24 08:42 (9+10:07)
reboot system boot 18.104.22.168-170.2. Mon Mar 23 06:35 (10+12:14)
Result of last | grep crash
pstieber pts/2 172.16.1.16 Tue Apr 7 19:27 - crash (06:15)
pstieber tty1 Tue Apr 7 13:15 - crash (12:27)
pstieber pts/0 192.168.120.51 Tue Apr 7 06:55 - crash (00:13)
pstieber pts/0 192.168.120.51 Mon Apr 6 06:36 - crash (00:19)
pstieber pts/2 172.16.1.16 Sun Apr 5 13:46 - crash (17:09)
root pts/5 192.168.120.51 Sun Apr 5 13:00 - crash (00:20)
pstieber pts/3 192.168.120.51 Sun Apr 5 12:58 - crash (00:22)
pstieber pts/4 172.16.1.16 Sun Apr 5 12:56 - crash (00:24)
nalshura pts/2 172.21.0.9 Sun Apr 5 12:49 - crash (00:31)
nalshura pts/1 172.21.0.9 Sun Apr 5 11:29 - crash (01:51)
nalshura pts/0 172.21.0.9 Sun Apr 5 10:39 - crash (02:41)
pstieber pts/0 192.168.120.51 Sun Apr 5 09:21 - crash (00:21)
ctrott pts/3 172.16.1.141 Fri Apr 3 12:12 - crash (1+13:24)
ctrott pts/0 172.16.1.141 Fri Apr 3 10:06 - crash (1+15:30)
root pts/0 172.16.1.16 Tue Mar 31 06:59 - crash (00:03)
root pts/0 mrburns.toyon.co Tue Mar 24 08:38 - crash (00:04)
root pts/0 192.168.120.51 Mon Mar 23 06:33 - crash (00:02)
g> if reboot time is short, have you tried booting just
g> to bios or boot prompt?
No. The machine will run for a long period of time and it seems to be
related to my build process and the ld command in particular.
g> in and along lines of psu and mainboard, have you added any hardware?
The Belkin KVM is all I can think of.
g> what is make and model of psu and mainboard?
PSU: ANTEC TRU550EPS12V ATX
MB: Thunder K8W (S2885ANRF)
g> check web for load ratings of mainboard, drives and any added
g> cards to see how close you are to rating of psu.
g> in direction of psu and mainboard, 'bad cap syndrome', do you
g> have hardware that you can remove to see if you are having psu
g> load problem?
g> like drives that can be removed, cd/dvd drive, nic, audio,
g> change to a light weight video?
I removed a DVD drive and a floppy recently because I never used them.
I did add a new SATA drive a few months ago...
g> pull all removable hardware and boot to bios to see if will stay
g> up. if ok, replace cards one by one. then add drives one by one.
g> if you pull hardware and still have reboots, then you may well
g> have 'bad caps'.
g> a quick test to save pulling hardware and you are a tinkerer,
g> get an automobile tail/brake light and connect to psu +12v and
g> +5v to test loading.
Thanks for the ideas. This machine and the attached cluster are used by
a group of 10 or so at my company. It's difficult to do a lot of
tinkering, but I can use the argument that if it reboots, what good is it?
Thanks for the suggestions,