
Re: Need help with Reboot cause



PS = Pete Stieber
PS>> running the latest x86_64 Fedora 10 kernel that I
PS>> recently loaded (April 2).  The machine reboots
PS>> without warning.

g> were you previously running 32 bit?

No. I've been running x86_64 since I first assembled the machine back in September 2004. I started with Fedora Core 2.

g> how soon after loading f10 64 bit did problem start?

I noticed it a few times before the latest kernel, but now it is more frequent.

g> is there any consistency in reboot, that is, how often?

I'm starting to notice a pattern that makes me think I should look at cron entries. Here is the frequency of reboot from a previous post...

Reboots indicated by information in /var/log/messages...

Sunday    March 29   4:08
Tuesday   March 31   7:02
Thursday  April  2  18:27 Intentional reboot due to new kernel
Friday    April  3   1:36
Sunday    April  5   1:37
Sunday    April  5   2:48
Sunday    April  5   9:43
Sunday    April  5  13:20 as I was typing this email

The only recent hardware change was the addition of a Belkin OmniView PRO2 4-Port KVM switch (F1DA104T). I removed this device and ran the job that seems to cause the reboot (my nightly builds) with distcc turned off, a Samba share I normally have set up disabled, and all of the cluster nodes turned off.

No reboot.

Next I cleaned everything, enabled distcc, turned on the cluster node, and reran the build.

No reboot.

Next I cleaned everything, enabled the Samba share, and reran the build.

No reboot.

I cleaned everything and went to bed. I left top running on a remote terminal so I could tell what process was running during the reboot.

The machine rebooted :-( The time was 1:43 (very similar to other reboot times in my list above).

So it wasn't the Belkin KVM switch.

The top command indicates ld was running. This was the case for 3 other reboots (see my prior posts)...

top - 01:42:20 up 15:10,  3 users,  load average: 1.94, 2.73, 2.53
Tasks: 130 total,   2 running, 128 sleeping,   0 stopped,   0 zombie
Cpu(s): 5.0%us, 3.5%sy, 0.0%ni, 90.9%id, 0.5%wa, 0.0%hi, 0.2%si, 0.0%st
Mem:   2060232k total,  1594892k used,   465340k free,    47772k buffers
Swap:  2031608k total,    31256k used,  2000352k free,  1289104k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27913 pstieber  20   0 90404  80m  952 R 15.3  4.0   0:00.46 ld
 6616 pstieber  20   0 14880 1204  872 R  0.3  0.1   0:49.58 top
27801 pstieber  20   0 83076 1140  740 S  0.3  0.1   0:00.01 make
    1 root      20   0  4096  492  368 S  0.0  0.0   0:00.61 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S  0.0  0.0   0:00.58 migration/0
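
Watching top on a remote terminal works, but the last screen is easy to lose. A rough alternative, sketched below, is to append a batch-mode top snapshot to a log file and fsync it so the final entry has a better chance of surviving the reboot. The function names and the log path in the usage comment are made up for illustration; `top -b -n 1` is top's batch mode with a single iteration.

```python
import os
import subprocess
import time


def take_snapshot(lines=15):
    """Return the first `lines` lines of one batch-mode top iteration."""
    result = subprocess.run(
        ["top", "-b", "-n", "1"],
        capture_output=True, text=True, check=True,
    )
    return "\n".join(result.stdout.splitlines()[:lines]) + "\n"


def append_durably(path, text):
    """Append text to path and force it to disk before returning."""
    with open(path, "a") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())


def log_forever(path, interval=5):
    """Append a timestamped snapshot every `interval` seconds until killed."""
    while True:
        stamp = time.strftime("--- %Y-%m-%d %H:%M:%S ---\n")
        append_durably(path, stamp + take_snapshot())
        time.sleep(interval)


# Usage (hypothetical path): log_forever("/var/tmp/top-snapshots.log")
```

After a reboot, the tail of the log shows what was running in the last few seconds, without depending on a remote session staying connected.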


Result of last | grep reboot
reboot   system boot  2.6.27.21-170.2. Wed Apr  8 01:43         (05:08)
reboot   system boot  2.6.27.21-170.2. Tue Apr  7 10:32         (20:19)
reboot   system boot  2.6.27.21-170.2. Tue Apr  7 07:09         (02:03)
reboot   system boot  2.6.27.21-170.2. Mon Apr  6 06:56       (1+02:16)
reboot   system boot  2.6.27.21-170.2. Sun Apr  5 13:20       (1+19:52)
reboot   system boot  2.6.27.21-170.2. Sun Apr  5 09:43       (1+23:29)
reboot   system boot  2.6.27.21-170.2. Sun Apr  5 01:36       (2+07:36)
reboot   system boot  2.6.27.21-170.2. Fri Apr  3 01:36       (4+07:36)
reboot   system boot  2.6.27.21-170.2. Thu Apr  2 18:52       (4+14:20)
reboot   system boot  2.6.27.19-170.2. Tue Mar 31 07:02       (2+11:48)
reboot   system boot  2.6.27.19-170.2. Tue Mar 24 08:42       (9+10:07)
reboot   system boot  2.6.27.19-170.2. Mon Mar 23 06:35      (10+12:14)
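
To make any time-of-day pattern easier to see, the boot times above can be bucketed by hour. This is a minimal sketch that assumes the line layout shown in the listing; the regular expression and function name are my own. Note that `last` records boot times, and at least one of these boots was intentional.

```python
import re
from collections import Counter

# Lines copied from the `last | grep reboot` listing above.
SAMPLE = """\
reboot   system boot  2.6.27.21-170.2. Wed Apr  8 01:43         (05:08)
reboot   system boot  2.6.27.21-170.2. Tue Apr  7 10:32         (20:19)
reboot   system boot  2.6.27.21-170.2. Tue Apr  7 07:09         (02:03)
reboot   system boot  2.6.27.21-170.2. Mon Apr  6 06:56       (1+02:16)
reboot   system boot  2.6.27.21-170.2. Sun Apr  5 13:20       (1+19:52)
reboot   system boot  2.6.27.21-170.2. Sun Apr  5 09:43       (1+23:29)
reboot   system boot  2.6.27.21-170.2. Sun Apr  5 01:36       (2+07:36)
reboot   system boot  2.6.27.21-170.2. Fri Apr  3 01:36       (4+07:36)
reboot   system boot  2.6.27.21-170.2. Thu Apr  2 18:52       (4+14:20)
reboot   system boot  2.6.27.19-170.2. Tue Mar 31 07:02       (2+11:48)
reboot   system boot  2.6.27.19-170.2. Tue Mar 24 08:42       (9+10:07)
reboot   system boot  2.6.27.19-170.2. Mon Mar 23 06:35      (10+12:14)
"""

# kernel field, weekday, month, day of month, then HH:MM
LINE_RE = re.compile(
    r"^reboot\s+system boot\s+\S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+(\d{2}):(\d{2})"
)


def boots_per_hour(text):
    """Count boot records per hour of day (0-23)."""
    hours = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            hours[int(match.group(1))] += 1
    return hours


for hour, count in sorted(boots_per_hour(SAMPLE).items()):
    print(f"{hour:02d}:00  {'#' * count}")
```

With the listing above, three of the twelve boots fall in the 01:00 hour, which fits the suspicion that something scheduled overnight is involved.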

Result of last | grep crash
pstieber pts/2      172.16.1.16      Tue Apr  7 19:27 - crash  (06:15)
pstieber tty1                        Tue Apr  7 13:15 - crash  (12:27)
pstieber pts/0      192.168.120.51   Tue Apr  7 06:55 - crash  (00:13)
pstieber pts/0      192.168.120.51   Mon Apr  6 06:36 - crash  (00:19)
pstieber pts/2      172.16.1.16      Sun Apr  5 13:46 - crash  (17:09)
root     pts/5      192.168.120.51   Sun Apr  5 13:00 - crash  (00:20)
pstieber pts/3      192.168.120.51   Sun Apr  5 12:58 - crash  (00:22)
pstieber pts/4      172.16.1.16      Sun Apr  5 12:56 - crash  (00:24)
nalshura pts/2      172.21.0.9       Sun Apr  5 12:49 - crash  (00:31)
nalshura pts/1      172.21.0.9       Sun Apr  5 11:29 - crash  (01:51)
nalshura pts/0      172.21.0.9       Sun Apr  5 10:39 - crash  (02:41)
pstieber pts/0      192.168.120.51   Sun Apr  5 09:21 - crash  (00:21)
ctrott   pts/3      172.16.1.141     Fri Apr  3 12:12 - crash (1+13:24)
ctrott   pts/0      172.16.1.141     Fri Apr  3 10:06 - crash (1+15:30)
root     pts/0      172.16.1.16      Tue Mar 31 06:59 - crash  (00:03)
root     pts/0      mrburns.toyon.co Tue Mar 24 08:38 - crash  (00:04)
root     pts/0      192.168.120.51   Mon Mar 23 06:33 - crash  (00:02)


g> if reboot time is short, have you tried booting just
g> to bios or boot prompt?

No. The machine will run for a long period of time, and the reboots seem to be related to my build process and the ld command in particular.

g> in and along lines of psu and mainboard, have you added any hardware?

The Belkin KVM is all I can think of.

g> what is make and model of psu and mainboard?

PSU: ANTEC TRU550EPS12V ATX
MB: Thunder K8W (S2885ANRF)

g> check web for load ratings of mainboard, drives and any added
g> cards to see how close you are to rating of psu.
g>
g> in direction of psu and mainboard, 'bad cap syndrome', do you
g> have hardware that you can remove to see if you are having psu
g> load problem?
g>
g> like drives that can be removed, cd/dvd drive, nic, audio,
g> change to a light weight video?

I removed a DVD drive and a floppy recently because I never used them. I did add a new SATA drive a few months ago...

g> pull all removable hardware and boot to bios to see if will stay
g> up. if ok, replace cards one by one. then add drives one by one.
g>
g> if you pull hardware and still have reboots, then you may well
g> have 'bad caps'.
g>
g> a quick test to save pulling hardware and you are a tinkerer,
g> get an automobile tail/brake light and connect to psu +12v and
g> +5v to test loading.

Thanks for the ideas. This machine and the attached cluster are used by a group of 10 or so at my company. It's difficult to do a lot of tinkering, but I can use the argument that if it reboots, what good is it?

Thanks for the suggestions,
Pete

