[dm-devel] Re: [PATCH] Remove softlockup from invalidate_mapping_pages. (might be dm related)



"Steinar H. Gunderson" <sgunderson bigfoot com> wrote:
>
> On Sat, May 13, 2006 at 07:51:47AM -0700, Andrew Morton wrote:
> > ho-hum.  Please see if there's anything else you can do to rule out a
> > hardware failure, then copy dm-devel redhat com on the next oops report.
> 
> It's not swap related. It crashes even without swap enabled; still with lots of
> dm stuff in the backtrace, though:

(for dm-devel: 2.6.15.4 runs OK on this machine with the same config) (yes?)

> [ 3192.568880] general protection fault: 0000 [1] SMP 
> [ 3192.573779] CPU 1 
> [ 3192.575804] Modules linked in: w83627hf_wdt eeprom ide_generic ide_disk ide_cd cdrom ipv6 psmouse i2c_nforce2 serio_raw pcspkr i2c_core parport_pc parport rtc ext3 jbd mbcache raid6 raid5 xor raid10 raid1 raid0 linear md_mod dm_mod sd_mod sata_nv tg3 sata_sil libata scsi_mod forcedeth generic amd74xx ehci_hcd ide_core ohci_hcd thermal processor fan unix
> [ 3192.607472] Pid: 3432, comm: md1_raid5 Not tainted 2.6.17-rc4 #1
> [ 3192.613471] RIP: 0010:[<ffffffff803a1ae8>] <ffffffff803a1ae8>{__lock_text_start+0}
> [ 3192.620870] RSP: 0018:ffff81000245bd70  EFLAGS: 00210086
> [ 3192.626374] RAX: 000000000000fa40 RBX: aaaa8b5ad269b80f RCX: ffff81000151ff50
> [ 3192.633501] RDX: ffff81007febd600 RSI: ffff81000151fef0 RDI: aaaa8b5ad269b81f
> [ 3192.640628] RBP: 000000000000fa40 R08: ffff81007db6d2c0 R09: ffff81007db6d2c0
> [ 3192.647756] R10: 0000000000000007 R11: ffffffff8024c868 R12: ffff81007febf040
> [ 3192.654882] R13: 0000000000200296 R14: ffff810004a97fb0 R15: 0000000000000000
> [ 3192.662009] FS:  0000000000000000(0000) GS:ffff81007f827840(0000) knlGS:00000000f7ad9ae0
> [ 3192.670101] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 3192.675843] CR2: 00000000080850e0 CR3: 000000007c11e000 CR4: 00000000000006e0
> [ 3192.682970] Process md1_raid5 (pid: 3432, threadinfo ffff81007ddb6000, task ffff81007f997080)
> [ 3192.691495] Stack: ffffffff802668b8 ffff81007f27c880 ffff810049bb23c0 ffff810049bb23c0 
> [ 3192.699353]        0000000000000000 ffff81007d0788b8 ffff810049bb23c0 0000000000000000 
> [ 3192.707404]        ffffffff880d3b67 ffff81007ed0cba8 
> [ 3192.712476] Call Trace: <IRQ> <ffffffff802668b8>{kmem_cache_free+186}
> [ 3192.718950]        <ffffffff880d3b67>{:dm_mod:clone_endio+135} <ffffffff802c9372>{__end_that_request_first+420}
> [ 3192.729081]        <ffffffff802c7d1b>{blk_run_queue+62} <ffffffff8806f8a6>{:scsi_mod:scsi_end_request+40}
> [ 3192.738700]        <ffffffff8806fb51>{:scsi_mod:scsi_io_completion+522}
> [ 3192.745334]        <ffffffff880cc4a1>{:sd_mod:sd_rw_intr+623} <ffffffff880705d6>{:scsi_mod:scsi_device_unbusy+85}
> [ 3192.755641]        <ffffffff802c86cb>{blk_done_softirq+113} <ffffffff8022c41b>{__do_softirq+86}
> [ 3192.764377]        <ffffffff8020a742>{call_softirq+30} <ffffffff8020b902>{do_softirq+44}
> [ 3192.772509]        <ffffffff8020b947>{do_IRQ+65} <ffffffff80209aa0>{ret_from_intr+0} <EOI>
> [ 3192.780822]        <ffffffff881129a7>{:raid5:compute_parity+880} <ffffffff802d5e2f>{memcmp+11}
> [ 3192.789476]        <ffffffff8811467a>{:raid5:handle_stripe+3022} <ffffffff80238a7c>{keventd_create_kthread+0}
> [ 3192.799424]        <ffffffff80238a7c>{keventd_create_kthread+0} <ffffffff881153e9>{:raid5:raid5d+333}
> [ 3192.808682]        <ffffffff880e464f>{:md_mod:md_thread+0} <ffffffff880e4751>{:md_mod:md_thread+258}
> [ 3192.817864]        <ffffffff80238e78>{autoremove_wake_function+0} <ffffffff880e464f>{:md_mod:md_thread+0}
> [ 3192.827471]        <ffffffff80238cc4>{kthread+203} <ffffffff8020a3f2>{child_rip+8}
> [ 3192.835081]        <ffffffff80238a7c>{keventd_create_kthread+0} <ffffffff80238bf9>{kthread+0}
> [ 3192.843646]        <ffffffff8020a3ea>{child_rip+0}
> [ 3192.848646] 
> [ 3192.848647] Code: f0 ff 0f 0f 88 c8 01 00 00 c3 f0 ff 0f 8b 07 ba 01 00 00 00 
> [ 3192.857563] RIP <ffffffff803a1ae8>{__lock_text_start+0} RSP <ffff81000245bd70>
> [ 3192.865039]  <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
> [ 3192.872164]  <0>Rebooting in 60 seconds..
> 
> (Thank goodness for serial console; I couldn't possibly write all these oopses
> by hand. :-) )
> 
> > The stack backtrace you have there is a little surprising.  Enabling
> > CONFIG_FRAME_POINTER might help clear it up.  Also it'd be worth seeing if
> > CONFIG_DEBUG_SLAB turns up anything.
> 
> I'm recompiling 2.6.17-rc4 now with those two added in. I'll let you know in a
> few hours when it crashes again, I'd guess :-)
> 
> Would it be a good idea to revert your mm patch and test again, just in case?

Which patch?  remove-softlockup-from-invalidate_mapping_pages.patch?  No,
that won't have caused this.  But then, it's not really obvious what this
crash is.
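
One thing the register dump does suggest, offered as a reading rather than a
diagnosis: if the Code: bytes start at RIP, then "f0 ff 0f" decodes as
"lock decl (%rdi)", and RDI (aaaa8b5ad269b81f) is not a canonical x86-64
address.  A non-canonical access raises a general protection fault rather
than a page fault, which would fit the "general protection fault" line.
A minimal sketch of that check, assuming 48-bit virtual addressing
(is_canonical_48 is a made-up helper, not a kernel function):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only.
 * With 48-bit virtual addresses, an address is canonical when
 * bits 63..47 are all zeros or all ones.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

Fed the values from the dump, RDI fails the check while RCX
(ffff81000151ff50) passes it.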

Now your earlier trace had this important info:

 [ 1127.842645] Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP: 
 [ 1127.848117] <ffffffff803a1ae8>{__lock_text_start+0}
 [ 1127.855474] PGD 5e38a067 PUD 5e39d067 PMD 0 
 [ 1127.859770] Oops: 0002 [1] SMP 
 [ 1127.862931] CPU 1 
 

Which kind of implies that we passed a null (or very small) `struct
kmem_cache' pointer into kmem_cache_free().  But that doesn't seem like the
sort of thing which will take hours to reproduce.
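
To spell out that inference: accessing a member through a null base pointer
faults at the member's offset, so a fault address of 0x40 points at a field
0x40 bytes into some structure whose base was null (or nearly so).  A rough
sketch, using a made-up layout (struct demo_cache and its fields are
assumptions for illustration, not the real struct kmem_cache):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x40 bytes of leading fields, then a lock word. */
struct demo_cache {
    unsigned char leading_fields[0x40];
    int lock_word;
};

static_assert(offsetof(struct demo_cache, lock_word) == 0x40,
              "lock_word is expected at offset 0x40");

/*
 * Address the CPU would touch for base->lock_word.
 * Pure integer arithmetic; nothing is dereferenced here.
 */
static uintptr_t member_fault_address(uintptr_t base)
{
    return base + offsetof(struct demo_cache, lock_word);
}
```

With a base of 0 this yields 0x40, matching the address in the earlier
oops; a small non-null base just shifts it by the same amount.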

Do you have CONFIG_NUMA set?
