[dm-devel] Calltrace in dm-snapshot in 2.6.27 kernel

aluno3 at poczta.onet.pl aluno3 at poczta.onet.pl
Mon Oct 20 06:23:26 UTC 2008


Hi,

I am seeing a problem with device mapper in the 2.6.27 kernel. Below is
the call trace from the logs:


---------------
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<0000000000000000>] 0x0
PGD 5a84c067 PUD 5cfdb067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: iscsi_trgt drbd bonding iscsi_tcp libiscsi
scsi_transport_iscsi megaraid_mbox megaraid_mm sky2 skge button ftdi_sio
usbserial
Pid: 31704, comm: kcopyd Not tainted 2.6.27 #7
RIP: 0010:[<0000000000000000>]  [<0000000000000000>] 0x0
RSP: 0000:ffff880055af5d18  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88007dfe3128 RCX: 010000000000059d
RDX: 0000000000000018 RSI: 8000000000000000 RDI: ffff88007dfe3128
RBP: ffff88007dfe33c8 R08: ffffc20005f751d0 R09: 00ffffffffffffff
R10: 0100000000000000 R11: 0000000000000000 R12: ffff880014d8dc00
R13: 0000000000000000 R14: ffff880059c89840 R15: ffff880014d8dd18
FS:  0000000000000000(0000) GS:ffffffff808dea80(0000)
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007d0d0000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process kcopyd (pid: 31704, threadinfo ffff880055af4000, task
ffff880026e3ce70)
Stack:  ffffffff805c2ea4 00000000ffffff7e 00000000000000ee ffff88004c268440
 0000000000000000 ffff880026d58eb8 0000000000000400 0000000000000000
 ffffffff805c4140 0000000000001d5a 00000000000005b8 ffff880001025af0
Call Trace:
 [<ffffffff805c2ea4>] ? pending_complete+0x1e4/0x220
 [<ffffffff805c4140>] ? persistent_commit+0x100/0x130
 [<ffffffff805bd8a3>] ? segment_complete+0x183/0x1c0
 [<ffffffff805bd720>] ? segment_complete+0x0/0x1c0
 [<ffffffff805bd385>] ? run_complete_job+0x65/0xb0
 [<ffffffff805bd320>] ? run_complete_job+0x0/0xb0
 [<ffffffff805bd5d6>] ? process_jobs+0x26/0xe0
 [<ffffffff805bd690>] ? do_work+0x0/0x60
 [<ffffffff805bd6b8>] ? do_work+0x28/0x60
 [<ffffffff8024686a>] ? run_workqueue+0x5a/0x110
 [<ffffffff802469bc>] ? worker_thread+0x9c/0xf0
 [<ffffffff8024a620>] ? autoremove_wake_function+0x0/0x30
 [<ffffffff8024a620>] ? autoremove_wake_function+0x0/0x30
 [<ffffffff80246920>] ? worker_thread+0x0/0xf0
 [<ffffffff80249f0c>] ? kthread+0x6c/0xa0
 [<ffffffff8020d1c9>] ? child_rip+0xa/0x11
 [<ffffffff8021b5f0>] ? lapic_next_event+0x0/0x10
 [<ffffffff80249ea0>] ? kthread+0x0/0xa0
 [<ffffffff8020d1bf>] ? child_rip+0x0/0x11


Code:  Bad RIP value.
RIP  [<0000000000000000>] 0x0
 RSP <ffff880055af5d18>
CR2: 0000000000000000
---------------

I got this call trace from our QA team. They say that they made a few
snapshots and ran several programs such as bacula and rsync, and that the
call trace appeared about 1 hour after those programs were started.

We have not identified the cause of this call trace so far. I mean we
don't know which of these programs triggers it.

I investigated this call trace a little on my own. What I know is that
the "free" pointer (in the mempool_t structure) is NULL when
mempool_free() is called.

Here is the chain of calls:

(...) -> put_pending_exception():841 -> free_pending_exception() ->
mempool_free()

mempool_free() then calls:

pool->free(element, pool->pool_data)

and here pool->free is NULL, so the call jumps to address 0 and
triggers the oops shown above.
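
To illustrate the failure mode, here is a minimal userspace sketch. It is
not the kernel's mempool code: struct model_pool, model_pool_free(),
counting_free() and the free_fn member (standing in for pool->free) are
hypothetical names invented for this example. It only shows that an
indirect call through a NULL callback is what would produce a fault with
RIP at 0x0, and how a guard would turn that into a reportable error.
Whether such a check belongs in the real code path is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for mempool_t: only the fields relevant here. */
typedef void (model_free_fn)(void *element, void *pool_data);

struct model_pool {
    model_free_fn *free_fn;   /* plays the role of pool->free */
    void *pool_data;
};

/* Counts how many elements were released through the callback. */
static int freed_count;

/* Example callback: releases the element and records the call. */
void counting_free(void *element, void *pool_data)
{
    (void)pool_data;
    free(element);
    freed_count++;
}

/*
 * Guarded version of the indirect call.
 * Returns 0 on success, -1 if the pool or its callback is NULL.
 * Without the check, a NULL free_fn would be called directly,
 * i.e. execution would jump to address 0.
 */
int model_pool_free(struct model_pool *pool, void *element)
{
    if (pool == NULL || pool->free_fn == NULL) {
        fprintf(stderr, "model_pool_free: missing free callback\n");
        return -1;
    }
    pool->free_fn(element, pool->pool_data);
    return 0;
}
```

With a pool whose free_fn is set to counting_free, model_pool_free()
returns 0 and the element is released. With a pool whose free_fn is NULL
(for example a zeroed or already torn-down pool), it returns -1 instead of
crashing.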

This is the description of the problem.

Is this a known problem? Is there a fix for it?
Any suggestions?



