[linux-lvm] lvm hangup on snapshot overflow

Fri Aug 12 16:05:31 UTC 2011

Hello everyone!

First, the issue: LVM hangs on the sync command after a snapshot overflows.

How to reproduce the problem

You can reproduce it with the script test.sh, which is included in the
attachment. It may look rather big, but that is mostly due to debug
messages; in fact it is quite simple. First it creates a physical volume on
a chosen physical disk, then a volume group and two logical volumes: the
origin LV that we write data to, and a second LV reserved for a snapshot.
Next it mounts the origin volume, converts the second LV to a snapshot, and
writes enough data to the origin LV to overflow that snapshot. Then sync is
executed. Afterwards another logical volume is created and converted to a
snapshot, and sync is run again. This second sync hangs.
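
For readers who want the sequence without opening test.sh, the steps above can be sketched as a dry-run plan: a C function that only builds the command strings, so they can be printed and checked before anything touches a disk. This is a minimal sketch under assumptions: the disk /dev/sdX, the mount point /mnt/origin, the LV names origin and sn_y, the ext3 file system, and all sizes are illustrative placeholders (only VG and sn_x come from our logs); test.sh has the real values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_STEPS 16
#define CMD_LEN 128

/* All values are illustrative assumptions; test.sh has the real ones. */
struct repro_params {
    const char *disk;    /* physical disk to use, e.g. "/dev/sdX" */
    const char *mnt;     /* mount point for the origin LV (hypothetical) */
    unsigned origin_mib; /* size of the origin LV in MiB */
    unsigned snap_mib;   /* size of each snapshot LV in MiB */
    unsigned margin_mib; /* extra MiB written to be sure the snapshot overflows */
};

/* Fill steps[] with the shell commands of one run (dry run: nothing is
   executed). Returns the number of commands written. */
static size_t build_repro_plan(const struct repro_params *p,
                               char steps[][CMD_LEN], size_t max)
{
    size_t n = 0;
    /* The first write to each origin chunk after the snapshot is taken
       copies the old chunk into the snapshot, so writing more new data
       than snap_mib is enough to overflow it. */
    unsigned fill_mib = p->snap_mib + p->margin_mib;

#define STEP(...) do { assert(n < max); \
                       snprintf(steps[n++], CMD_LEN, __VA_ARGS__); } while (0)
    STEP("pvcreate %s", p->disk);
    STEP("vgcreate VG %s", p->disk);
    STEP("lvcreate -L %uM -n origin VG", p->origin_mib);
    STEP("lvcreate -L %uM -n sn_x VG", p->snap_mib);
    STEP("mkfs.ext3 /dev/VG/origin");
    STEP("mount /dev/VG/origin %s", p->mnt);
    STEP("lvconvert -s VG/origin VG/sn_x");  /* sn_x becomes a snapshot */
    STEP("dd if=/dev/zero of=%s/fill bs=1M count=%u", p->mnt, fill_mib);
    STEP("sync");                            /* data reaches origin, sn_x overflows */
    STEP("lvcreate -L %uM -n sn_y VG", p->snap_mib);
    STEP("lvconvert -s VG/origin VG/sn_y");
    STEP("sync");                            /* this is the sync that hangs */
#undef STEP
    return n;
}
```

Printing each steps[i] gives a checklist that can be run by hand inside a virtual machine, as recommended in the next paragraph. Keeping the plan as data rather than executing it directly makes it easy to vary the sizes between runs.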

We advise running the tests in a virtual machine because, among other
reasons, you won't be able to reboot normally afterwards. When you run the
script again after a reboot, it cleans up the leftovers from the previous
run; the required commands are at the very beginning of the script.

And this is what we have so far

We started off here:
http://www.redhat.com/archives/dm-devel/2011-May/msg00059.html, but after a
number of tests we concluded that neither the kernel version, nor its
configuration, nor the file system has any effect on the hang. By now we
know that the issue occurs in all LVM versions after 2.02.56 (2.02.57
already fails). Interestingly, when we built the most verbose kernel
possible (in terms of the amount of kernel logging) and the system became
really slow, the newer version (2.02.57), which had previously hung,
passed. Since the outcome depends on timing, we think there might be an
overrun that leads to a deadlock.

So far we see two basic errors:

--------------------

lvconvert device-mapper: suspend ioctl failed: Input/output error

lvconvert Unable to suspend VG-sn_x (252:3)

lvconvert Failed to suspend origin lv

------- and --------

LV VG/sn_x in use: not deactivating

Couldn't deactivate LV sn_x

--------------------

The first error always precedes the hang. The second one doesn't appear
every time, but when it does, it always comes before the first, and it can
repeat several times before the first error shows up. In both cases
_lock_vol() returns 0.
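
To check this ordering over many runs rather than by eye, a small filter over lvm2.log could classify each line by the two error signatures and confirm that no "in use" error ever follows a suspend failure. This is a minimal sketch: the substrings are taken from the messages quoted above, while the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of lvm2.log lines. */
enum log_event { EV_OTHER, EV_IN_USE, EV_SUSPEND_FAIL };

static enum log_event classify(const char *line)
{
    /* Substrings taken from the two error messages quoted above. */
    if (strstr(line, "suspend ioctl failed") || strstr(line, "Unable to suspend"))
        return EV_SUSPEND_FAIL;
    if (strstr(line, "in use: not deactivating"))
        return EV_IN_USE;
    return EV_OTHER;
}

/* Returns 1 if every "in use" error comes before the first suspend
   failure (the ordering we observe), 0 otherwise. When it returns 1 and
   in_use_count is non-NULL, stores how many "in use" errors were seen. */
static int ordering_holds(const char *const *lines, size_t n,
                          size_t *in_use_count)
{
    int suspend_failed = 0;
    size_t in_use = 0;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (classify(lines[i])) {
        case EV_SUSPEND_FAIL:
            suspend_failed = 1;
            break;
        case EV_IN_USE:
            if (suspend_failed)
                return 0;   /* "in use" after a suspend failure */
            in_use++;
            break;
        case EV_OTHER:
            break;
        }
    }
    if (in_use_count)
        *in_use_count = in_use;
    return 1;
}
```

Feeding it the lines of each run's log and counting runs where it returns 0 would show whether the ordering ever breaks.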

As for the second error: the function lvconvert_snapshot() fails, reporting
“Couldn't deactivate LV sn_x”, because info.open_count is not zero, as the
“LV VG/sn_x in use: not deactivating” error indicates. The value of
info.open_count is clearly set to 1 by the lv_info() function, but never
seems to be cleared. info.open_count is set from a field of the dm_ioctl
struct, which is a member of the dm_task struct, but I couldn't find where
that field is assigned.
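
As far as we understand it, libdevmapper's dm_task_get_info() copies open_count from the struct dm_ioctl that the kernel fills in when it answers the status ioctl (in the kernel's drivers/md/dm-ioctl.c), which may be why no assignment shows up in the userspace code. If so, info.open_count is only a copy of the kernel's count at the moment lv_info() ran, and it changes only when the device is queried again; `dmsetup info VG-sn_x` reports the same number as "Open count". The simplified model below illustrates that; the types and functions are hypothetical stand-ins, not LVM or libdevmapper code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's per-device state. */
struct kernel_dev_model {
    int exists;
    int openers;      /* current number of opens held on the device */
};

/* Hypothetical stand-in for the info LVM receives; the real
   struct dm_info has more fields. */
struct dev_info_model {
    int exists;
    int open_count;
};

/* Like a status ioctl: returns a copy of the kernel's state right now.
   Later changes on the kernel side are not reflected in the copy. */
static struct dev_info_model query_info(const struct kernel_dev_model *k)
{
    assert(k != NULL);
    struct dev_info_model info = { k->exists, k->openers };
    return info;
}

/* Simplified version of the guard behind "LV ... in use: not deactivating":
   refuse to deactivate while the copy says the device is open.
   Returns 1 if deactivation may proceed. */
static int may_deactivate(const char *lv, const struct dev_info_model *info)
{
    if (!info->exists)
        return 1;                 /* nothing to deactivate */
    if (info->open_count > 0) {
        fprintf(stderr, "LV %s in use: not deactivating\n", lv);
        return 0;
    }
    return 1;
}
```

In this model a stale count of 1 is never "cleared"; only a fresh query sees the opener go away, so the interesting question becomes what holds VG-sn_x open at the moment of the query.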

Things are much more complicated because we can't use a debugger, so a
related question: how do you properly build LVM with debugging symbols?
Right now LVM won't build with debug symbols, even though the configure
script is given the appropriate option and the option is verifiably applied
during the build (the configure log says it is enabled, and the
corresponding -g option is added to the flags passed to gcc).
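
One way to narrow this down is to check at which stage the debug information disappears: in the binary in the build tree, or only in the installed one. `readelf -S` lists the sections of an ELF file; the sketch below does the same check programmatically by looking for a .debug_info section. It assumes a 64-bit ELF on Linux (glibc's <elf.h>), and has_debug_info() is a hypothetical helper, not part of LVM.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 1 if the 64-bit ELF file at `path` has a
   .debug_info section, 0 if not, -1 if unreadable or not a 64-bit ELF. */
static int has_debug_info(const char *path)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strsec = NULL;
    char *names = NULL;
    int result = -1;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (!f)
        return -1;

    /* ELF header: magic bytes, 64-bit class, usable section table. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        goto out;

    /* Read all section headers. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Section names are offsets into the section-name string table. */
    strsec = &sh[eh.e_shstrndx];
    names = malloc((size_t)strsec->sh_size + 1);
    if (!names || fseek(f, (long)strsec->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, (size_t)strsec->sh_size, f) != strsec->sh_size)
        goto out;
    names[strsec->sh_size] = '\0';

    result = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < strsec->sh_size &&
            strcmp(names + sh[i].sh_name, ".debug_info") == 0) {
            result = 1;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

Calling it on the freshly built binary and on the installed one (for example tools/lvm and /sbin/lvm; both paths are assumptions about the local setup) would show whether the symbols are missing from the start or are lost during installation.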

Attachment description

In the attached archive you will find the following files:

kernel_logs - kernel logs after each tool invocation, retrieved by dmesg -c.

lvm.conf - lvm configuration file that we have used

lvm2.log - lvm logs with debug level set to 7

output_logs - two versions, neat and verbose; the verbose one also contains
the commands executed (set -x)

test.sh - the main test-script

remove.sh - the part of test.sh responsible for cleanup (sometimes
convenient to have separately)

We are continuing to investigate the problem, but any help or guidance from
people who are familiar with the structure and code of LVM would be highly
appreciated. Thanks a lot!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lvm-test.tar.gz
Type: application/x-gzip
Size: 28599 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-lvm/attachments/20110812/f3208afb/attachment.bin>