[linux-lvm] Snapshot causing segfault

Mon Dec 31 18:50:55 UTC 2012

Hello everyone,
     I've been having an intermittent problem where random servers segfault
while creating a snapshot with lvm2-2.02.17-7.38.3 on kernel
2.6.16.60-0.93.1-bigsmp (SLES 10 SP4). These are the messages I get:
###########################################
Dec 27 07:45:39 chelco-app-01 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000001c
Dec 27 07:45:39 chelco-app-01 kernel:  printing eip:
Dec 27 07:45:39 chelco-app-01 kernel: f90ab3a7
Dec 27 07:45:39 chelco-app-01 kernel: *pde = 3780a001
Dec 27 07:45:39 chelco-app-01 kernel: Oops: 0000 [#1]
Dec 27 07:45:39 chelco-app-01 kernel: SMP
Dec 27 07:45:39 chelco-app-01 kernel: last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:04:00.1/irq
Dec 27 07:45:39 chelco-app-01 kernel: Modules linked in: raw dock button battery ac loop dm_snapshot usbhid dm_mod uhci_hcd bnx2x hw_random ehci_hcd qla2xxx hpilo usbcore firmware_class scsi_transport_fc parport_pc lp parport ext3 jbd edd fan thermal processor cciss sd_mod scsi_mod
Dec 27 07:45:39 chelco-app-01 kernel: CPU:    4
Dec 27 07:45:39 chelco-app-01 kernel: EIP:    0060:[<f90ab3a7>]    Tainted: G     X VLI
Dec 27 07:45:39 chelco-app-01 kernel: EFLAGS: 00210202   (2.6.16.60-0.93.1-bigsmp #1)
Dec 27 07:45:39 chelco-app-01 kernel: EIP is at __map_bio+0x50/0x11f [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel: eax: f90960c4   ebx: 00000000   ecx: f7ff2a60   edx: f7794440
Dec 27 07:45:39 chelco-app-01 kernel: esi: f7ff2a58   edi: f90960c4   ebp: f46306c0   esp: f4c15d28
Dec 27 07:45:39 chelco-app-01 kernel: ds: 007b   es: 007b   ss: 0068
Dec 27 07:45:39 chelco-app-01 kernel: Process lvcreate (pid: 6678, threadinfo=f4c14000 task=f7838680)
Dec 27 07:45:39 chelco-app-01 kernel: Stack: <0>f7794340 f7794440 f7794440 03201ff0 00000000 03201ff0 00000000 00000008
Dec 27 07:45:39 chelco-app-01 kernel:        00000000 00000000 f90960c4 f7ff2a68 f46306c0 f90abd1b 00000000 00000001
Dec 27 07:45:39 chelco-app-01 kernel:        00000008 f428e2e0 fcdfe010 ffffffff c0113d62 00000000 0000001f f7ff2a58
Dec 27 07:45:39 chelco-app-01 kernel: Call Trace:
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90abd1b>] __split_bio+0x182/0x440 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0113d62>] do_flush_tlb_all+0x0/0x5d
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90abff0>] __flush_deferred_io+0x17/0x20 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90ac14c>] dm_resume+0x8e/0xf9 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90aedd8>] dev_suspend+0x138/0x157 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90af607>] ctl_ioctl+0x220/0x26e [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90aeca0>] dev_suspend+0x0/0x157 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0179ce8>] do_ioctl+0x48/0x5e
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0179f60>] vfs_ioctl+0x262/0x275
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0179fc7>] sys_ioctl+0x54/0x6d
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0103dcb>] sysenter_past_esp+0x54/0x79
Dec 27 07:45:39 chelco-app-01 kernel: Code: b4 0a f9 89 70 40 8b 06 83 c0 0c f0 ff 00 8b 54 24 08 8d 4e 08 8b 02 8b 52 04 89 44 24 0c 89 f8 89 54 24 10 8b 5f 04 8b 54 24 08 <ff> 53 1c 83 f8 00 89 c2 0f 8e 93 00 00 00 8b 54 24 08 8b 42 0c
#############################################################

As a result, the target volume is left suspended, and the only way to recover
is to reboot and remove the faulty snapshot once the server comes back up.

The script I wrote to create these snapshots allocates all available free
extents in the volume group, which in this case was more than the size of
the volume I was snapshotting. Suspecting this was the problem, I created
the snapshot several times with a size less than or equal to the target
volume, and it worked every time. I then tried a size larger than the
target to reproduce the crash, and it did crash, but not every time. Now I
can't get it to segfault at all.
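For reference, the sizing rule that never crashed in my tests (snapshot no larger than the target) can be sketched as below. This is a minimal illustration, not my actual script: the helper names and the volume/group names are hypothetical, and it assumes the free-extent count and the target's extent count have already been read from LVM. It only builds the lvcreate command line (using the standard -s, -l and -n options) instead of running it. Capping the size this way is a workaround under that assumption; it does not prove that oversized snapshots cause the oops.

```python
def snapshot_extents(free_extents: int, origin_extents: int) -> int:
    """Hypothetical helper: choose how many extents to give the snapshot.

    Instead of taking every free extent in the volume group, cap the
    snapshot at the size of the origin (target) volume.
    """
    if free_extents <= 0:
        raise ValueError("volume group has no free extents")
    return min(free_extents, origin_extents)


def lvcreate_argv(vg: str, origin: str, snap_name: str,
                  free_extents: int, origin_extents: int) -> list[str]:
    """Hypothetical helper: build (but do not run) the lvcreate command."""
    extents = snapshot_extents(free_extents, origin_extents)
    # -s: create a snapshot, -l: size in logical extents, -n: snapshot name
    return ["lvcreate", "-s", "-l", str(extents), "-n", snap_name,
            f"{vg}/{origin}"]


if __name__ == "__main__":
    # Example values only: 1200 free extents, 800-extent target volume.
    print(lvcreate_argv("vg00", "data", "data_snap", 1200, 800))
```

With 1200 free extents and an 800-extent target, this yields a snapshot of 800 extents rather than 1200. The resulting list could be passed to subprocess.run() once the values come from the real system.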

So my question is: does creating a snapshot larger than the target volume
randomly trigger these segfaults, or could another problem be lurking? If
these weren't production machines, I would just use a size smaller than the
target, but I need to be sure exactly what is causing the segfaults.

Any help would be appreciated.

  -Tyler