
Re: ext3, S/W RAID-5 and many services



Stephen C. Tweedie wrote:

>Hi,
>
>On Fri, Jan 18, 2002 at 04:15:28PM +0900, P. Fleury wrote:
>
>
>>I use ext3 over Software RAID-5, and access this through Samba/NFS/HTTP.
>>From time to time, the machine hangs with no response to any kind of
>>input (ping gets no reply, and keyboard and mouse do nothing). Only a
>>hard reset does the trick.
>>
>>I also noticed that 2 of the 7 disks run in UDMA 33 and the others in UDMA
>>100. Does this have any impact (besides performance)?
>>
>>If I do not mount the ext3 partition, it runs fine. Any help?
>>
>
>Can you trap kernel log output, in case there's an oops being
>reported?  If you have a text-mode console, you may have to copy it
>down by hand.  If not, it is possible to set up a serial console and
>record the kernel output on another machine.
>
>Cheers,
> Stephen
>
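
Stephen's serial-console suggestion needs two pieces: the crashing server has to send its console output to a serial port (usually by booting with something like `console=ttyS0,115200 console=tty0`, which assumes a kernel built with serial-console support), and a second machine, connected with a null-modem cable, has to record whatever arrives. Below is a minimal Python sketch of the recording side. It assumes the cable is on `/dev/ttyS0` of the logging machine and both ends use 115200 baud 8N1; the function names and the log-file name are hypothetical, not from any existing tool.

```python
import os
import termios
import time
from typing import Callable, Iterable, Iterator


def configure_serial(fd: int, speed: int = termios.B115200) -> None:
    """Switch an open serial port to raw 8N1 input at `speed` (assumed 115200 baud)."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = termios.tcgetattr(fd)
    iflag = termios.IGNPAR  # ignore parity errors, no CR/NL mapping
    oflag = 0  # no output post-processing
    cflag = termios.CS8 | termios.CREAD | termios.CLOCAL  # 8 bits, receiver on, ignore modem lines
    lflag = 0  # non-canonical, no echo, no signals
    cc[termios.VMIN] = 1  # a read returns once at least 1 byte has arrived
    cc[termios.VTIME] = 0  # no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW,
                      [iflag, oflag, cflag, lflag, speed, speed, cc])


def stamped_lines(lines: Iterable[bytes],
                  stamp: Callable[[], str] = lambda: time.strftime("%b %d %H:%M:%S ")
                  ) -> Iterator[bytes]:
    """Yield each console line with a local timestamp prefix and a plain LF ending."""
    for raw in lines:
        # The kernel's serial console ends lines with CR LF; store plain LF instead.
        yield stamp().encode() + raw.rstrip(b"\r\n") + b"\n"


def run(device: str = "/dev/ttyS0", logfile: str = "serial-console.log") -> None:
    """Hypothetical entry point: append everything arriving on `device` to `logfile`."""
    fd = os.open(device, os.O_RDONLY | os.O_NOCTTY)  # don't make the port our controlling tty
    configure_serial(fd)
    with os.fdopen(fd, "rb", buffering=0) as port, open(logfile, "ab") as log:
        for line in stamped_lines(port):  # iterating a binary file yields lines
            log.write(line)
            log.flush()  # keep each line on disk even if the logger goes down too
```

Calling `run()` on the logging machine blocks and keeps appending, so an oops ends up on the other machine's disk even when the server can no longer write to its own. Setting the port speed with `stty` and running `cat /dev/ttyS0` would do roughly the same job; the sketch only adds timestamps and per-line flushing.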

Well, this time I got something. The sequence was:
- Started the machine and used it for half a day, accessing it via NFS, HTTP, IMAP (3 concurrent sites) and Samba.
- After a while, the machine load went up and login became impossible, even on the console. After an hour, I could log in as root.
- Tried to reboot, to no avail. After 1 hour of waiting, tried 'telinit 6'. After that, nothing more was possible remotely.
- The machine did not reboot; it said /dev/md could not be unmounted because it was busy.
- Hard reset.
- The RAID-5 resync ran for a while, then:

Jan 25 10:35:21 lafleur syslogd 1.4.1: restart.
Jan 25 11:26:38 lafleur kernel: Unable to handle kernel paging request at virtual address 493dd238
Jan 25 11:26:38 lafleur kernel: printing eip:
Jan 25 11:26:38 lafleur kernel: f083eff2
Jan 25 11:26:38 lafleur kernel: *pde = 00000000
Jan 25 11:26:38 lafleur kernel: Oops: 0002
Jan 25 11:26:38 lafleur kernel: CPU: 0
Jan 25 11:26:38 lafleur kernel: EIP: 0010:[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1388558/96] Not tainted
Jan 25 11:26:38 lafleur kernel: EIP: 0010:[<f083eff2>] Not tainted
Jan 25 11:26:38 lafleur kernel: EFLAGS: 00010216
Jan 25 11:26:38 lafleur kernel: eax: 00000000 ebx: 00001000 ecx: 00000400 edx: 00000000
Jan 25 11:26:38 lafleur kernel: esi: 00000018 edi: 493dd238 ebp: 00000007 esp: efb19e58
Jan 25 11:26:38 lafleur kernel: ds: 0018 es: 0018 ss: 0018
Jan 25 11:26:38 lafleur kernel: Process raid5d (pid: 19, stackpage=efb19000)
Jan 25 11:26:38 lafleur kernel: Stack: ef861804 00001000 c017c6ad c033ce80 00000282 00000282 00000003 c1f6f908
Jan 25 11:26:38 lafleur kernel: c21d6400 00000000 00000007 00000000 00000001 00000004 f083ffd8 ef861800
Jan 25 11:26:38 lafleur kernel: 00000002 c01871dd 00000246 c033ce40 0000000c 0000007c fffffffc fffffff4
Jan 25 11:26:38 lafleur kernel: Call Trace: [generic_make_request+241/256] generic_make_request [kernel] 0xf1
Jan 25 11:26:38 lafleur kernel: Call Trace: [<c017c6ad>] generic_make_request [kernel] 0xf1
Jan 25 11:26:38 lafleur kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1384488/96] __insmod_raid5_S.text_L13736 [raid5] 0x1f78
Jan 25 11:26:38 lafleur kernel: [<f083ffd8>] __insmod_raid5_S.text_L13736 [raid5] 0x1f78
Jan 25 11:26:38 lafleur kernel: [ide_set_handler+85/92] ide_set_handler [kernel] 0x55
Jan 25 11:26:38 lafleur kernel: [<c01871dd>] ide_set_handler [kernel] 0x55
Jan 25 11:26:38 lafleur kernel: [ide_dma_intr+0/156] ide_dma_intr [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c0190a3c>] ide_dma_intr [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [dma_timer_expiry+0/100] dma_timer_expiry [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c019114c>] dma_timer_expiry [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [do_IRQ+144/156] do_IRQ [kernel] 0x90
Jan 25 11:26:38 lafleur kernel: [<c0108110>] do_IRQ [kernel] 0x90
Jan 25 11:26:38 lafleur kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1382890/96] device_bsize [raid5] 0x222
Jan 25 11:26:38 lafleur kernel: [<f0840616>] device_bsize [raid5] 0x222
Jan 25 11:26:38 lafleur kernel: [md_thread+212/308] md_thread [kernel] 0xd4
Jan 25 11:26:38 lafleur kernel: [<c01b1454>] md_thread [kernel] 0xd4
Jan 25 11:26:38 lafleur kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Jan 25 11:26:38 lafleur kernel: [<c010566e>] kernel_thread [kernel] 0x26
Jan 25 11:26:38 lafleur kernel: [md_thread+0/308] md_thread [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c01b1380>] md_thread [kernel] 0x0
Jan 25 11:26:38 lafleur kernel:
Jan 25 11:26:38 lafleur kernel:
Jan 25 11:26:38 lafleur kernel: Code: f3 ab f6 c3 02 74 02 66 ab f6 c3 01 74 01 aa 8b 14 24 8d 5d
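
The `Oops: 0002` line is worth decoding first. On i386, when the kernel cannot handle a paging request, the number printed after `Oops:` is the hardware page-fault error code. A small Python sketch that decodes it, assuming the standard x86 bit layout (bit 0 = protection violation rather than missing page, bit 1 = write access, bit 2 = user mode); the helper names are made up for illustration:

```python
import re

# Low bits of the i386 page-fault error code (assumed standard x86 layout).
PF_PROT = 0x1   # set: protection violation; clear: page not present
PF_WRITE = 0x2  # set: the faulting access was a write; clear: a read
PF_USER = 0x4   # set: the fault happened in user mode; clear: in kernel mode


def code_from_line(line: str) -> int:
    """Extract the hex error code from an 'Oops: NNNN' log line."""
    match = re.search(r"Oops: ([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError(f"no Oops code in {line!r}")
    return int(match.group(1), 16)


def decode_oops_code(code: int) -> str:
    """Describe a page-fault error code in words."""
    mode = "user" if code & PF_USER else "kernel"
    access = "write" if code & PF_WRITE else "read"
    cause = "protection violation" if code & PF_PROT else "page not present"
    return f"{mode}-mode {access}, {cause}"
```

For the line above, `decode_oops_code(code_from_line("Jan 25 11:26:38 lafleur kernel: Oops: 0002"))` gives "kernel-mode write, page not present", which is consistent with the `*pde = 00000000` line and with `edi` holding the same address (493dd238) that the kernel failed on.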


After this, trying to reboot reports problems in umount2; the MD thread is interrupted after the message 'Wait while the system is restarting', but then nothing happens.

Is there a way to spend less than 30 minutes per day baby-sitting my
server?

--Pascal





