
Re: ext3, S/W RAID-5 and many services



Stephen C. Tweedie wrote:

Hi,

On Fri, Jan 18, 2002 at 04:15:28PM +0900, P. Fleury wrote:


I use ext3 over Software RAID-5, and access this through Samba/NFS/HTTP. From time to time the machine hangs and stops responding to any kind of input (no reply to ping, no keyboard or mouse). Only a hard reset does the trick.

I also noticed that 2 of the 7 disks are in UDMA 33, the others in UDMA 100. Does this have any impact (besides performance)?

If I do not mount the ext3 partition, it runs fine. Any help?


Can you trap kernel log output, in case there's an oops being reported? If you have a text-mode console, you may have to copy it down by hand. If not, it is possible to set up a serial console and record the kernel output on another machine.

Cheers,
Stephen
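
For anyone following the serial-console suggestion, here is a minimal Python sketch of the recording side. It is only an illustration under assumptions that are not in the thread: the crashing machine sends its kernel messages to a serial port (for example by booting with something like console=ttyS0,9600), the port on the recording machine is /dev/ttyS0 and has already been set to a matching speed, and record_console is a hypothetical helper name.

```python
import io
import time


def record_console(src, dst, clock=time.time):
    """Copy lines from a binary stream to a text log, one timestamp per line.

    src   -- binary stream to read from (e.g. an opened serial device)
    dst   -- text stream to append to (e.g. a log file)
    clock -- callable returning the current time in seconds
    Returns the number of lines copied.
    """
    count = 0
    # readline() returns b"" at end of stream, which stops the loop.
    for raw in iter(src.readline, b""):
        line = raw.decode("ascii", errors="replace").rstrip("\r\n")
        dst.write(f"{clock():.3f} {line}\n")
        # Flush on every line so output survives if the recorder is interrupted.
        dst.flush()
        count += 1
    return count


# Hypothetical usage; the device path and log name are assumptions:
#
#   with open("/dev/ttyS0", "rb", buffering=0) as src, \
#        open("console.log", "a") as dst:
#       record_console(src, dst)
```

Flushing after each line is deliberate: the interesting output tends to be the last thing printed before the machine stops responding.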


Well, this time I got something. The sequence was:
- Started the machine and used it for half a day, accessing it via NFS, HTTP, IMAP (3 concurrent sites) and Samba.
- After a while, the machine load went up and login became impossible, even on the console. After an hour, I could log in as root.
- Tried to reboot, to no avail. After 1 hour of waiting, tried 'telinit 6'. After that, nothing more was possible remotely.
- The machine did not reboot; it said /dev/md could not be unmounted because it was busy.
- Hard reset.
- The RAID-5 resync ran for a while, then:


Jan 25 10:35:21 lafleur syslogd 1.4.1: restart.
Jan 25 11:26:38 lafleur kernel: Unable to handle kernel paging request at virtual address 493dd238
Jan 25 11:26:38 lafleur kernel: printing eip:
Jan 25 11:26:38 lafleur kernel: f083eff2
Jan 25 11:26:38 lafleur kernel: *pde = 00000000
Jan 25 11:26:38 lafleur kernel: Oops: 0002
Jan 25 11:26:38 lafleur kernel: CPU: 0
Jan 25 11:26:38 lafleur kernel: EIP: 0010:[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1388558/96] Not tainted
Jan 25 11:26:38 lafleur kernel: EIP: 0010:[<f083eff2>] Not tainted
Jan 25 11:26:38 lafleur kernel: EFLAGS: 00010216
Jan 25 11:26:38 lafleur kernel: eax: 00000000 ebx: 00001000 ecx: 00000400 edx: 00000000
Jan 25 11:26:38 lafleur kernel: esi: 00000018 edi: 493dd238 ebp: 00000007 esp: efb19e58
Jan 25 11:26:38 lafleur kernel: ds: 0018 es: 0018 ss: 0018
Jan 25 11:26:38 lafleur kernel: Process raid5d (pid: 19, stackpage=efb19000)
Jan 25 11:26:38 lafleur kernel: Stack: ef861804 00001000 c017c6ad c033ce80 00000282 00000282 00000003 c1f6f908
Jan 25 11:26:38 lafleur kernel: c21d6400 00000000 00000007 00000000 00000001 00000004 f083ffd8 ef861800
Jan 25 11:26:38 lafleur kernel: 00000002 c01871dd 00000246 c033ce40 0000000c 0000007c fffffffc fffffff4
Jan 25 11:26:38 lafleur kernel: Call Trace: [generic_make_request+241/256] generic_make_request [kernel] 0xf1
Jan 25 11:26:38 lafleur kernel: Call Trace: [<c017c6ad>] generic_make_request [kernel] 0xf1
Jan 25 11:26:38 lafleur kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1384488/96] __insmod_raid5_S.text_L13736 [raid5] 0x1f78
Jan 25 11:26:38 lafleur kernel: [<f083ffd8>] __insmod_raid5_S.text_L13736 [raid5] 0x1f78
Jan 25 11:26:38 lafleur kernel: [ide_set_handler+85/92] ide_set_handler [kernel] 0x55
Jan 25 11:26:38 lafleur kernel: [<c01871dd>] ide_set_handler [kernel] 0x55
Jan 25 11:26:38 lafleur kernel: [ide_dma_intr+0/156] ide_dma_intr [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c0190a3c>] ide_dma_intr [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [dma_timer_expiry+0/100] dma_timer_expiry [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c019114c>] dma_timer_expiry [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [do_IRQ+144/156] do_IRQ [kernel] 0x90
Jan 25 11:26:38 lafleur kernel: [<c0108110>] do_IRQ [kernel] 0x90
Jan 25 11:26:38 lafleur kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1382890/96] device_bsize [raid5] 0x222
Jan 25 11:26:38 lafleur kernel: [<f0840616>] device_bsize [raid5] 0x222
Jan 25 11:26:38 lafleur kernel: [md_thread+212/308] md_thread [kernel] 0xd4
Jan 25 11:26:38 lafleur kernel: [<c01b1454>] md_thread [kernel] 0xd4
Jan 25 11:26:38 lafleur kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Jan 25 11:26:38 lafleur kernel: [<c010566e>] kernel_thread [kernel] 0x26
Jan 25 11:26:38 lafleur kernel: [md_thread+0/308] md_thread [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c01b1380>] md_thread [kernel] 0x0
Jan 25 11:26:38 lafleur kernel:
Jan 25 11:26:38 lafleur kernel:
Jan 25 11:26:38 lafleur kernel: Code: f3 ab f6 c3 02 74 02 66 ab f6 c3 01 74 01 aa 8b 14 24 8d 5d
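
To make a trace like the one above easier to read, the address-form call trace lines can be pulled out with a short script. This is a minimal Python sketch, not a replacement for a proper oops decoder: it assumes the "[<address>] symbol [module] 0xoffset" layout seen in this log, and the names TRACE_RE and parse_trace are hypothetical.

```python
import re

# Matches lines such as:
#   ... kernel: [<c01b1454>] md_thread [kernel] 0xd4
# Groups: address, symbol name, module name, offset into the symbol.
TRACE_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[(\w+)\]\s+(0x[0-9a-f]+)"
)


def parse_trace(lines):
    """Return (address, symbol, module, offset) tuples for call trace lines.

    Lines that do not match the assumed layout (register dumps, the
    'EIP: ... Not tainted' line, symbolic duplicates) are skipped.
    """
    frames = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m:
            addr, symbol, module, offset = m.groups()
            frames.append((int(addr, 16), symbol, module, int(offset, 16)))
    return frames


def frames_in_module(frames, module):
    """Keep only the frames that belong to the given module."""
    return [f for f in frames if f[2] == module]
```

Calling frames_in_module(parse_trace(log_lines), "raid5") on the log above would list only the frames attributed to the raid5 module, which is a quick way to see how much of the trace lies in the RAID code versus the IDE layer.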



After this, trying to reboot reports that umount2 has problems and that the MD thread is being interrupted, after the message 'Wait while the system is restarting', but then nothing happens.


Is there a way to spend less than 30 minutes per day baby-sitting my server?

--Pascal




