[dm-devel] Re: A BUG in snapshot merging

Mike Snitzer snitzer at redhat.com
Thu Sep 24 06:33:07 UTC 2009


On Wed, Sep 23 2009 at  8:07pm -0400,
Mike Snitzer <snitzer at redhat.com> wrote:

> Mikulas,
> 
> I can easily reproduce the BUG you reported (you were running on
> sparc64) if I take the following patch out of the quilt series:
> http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel_unified/2.6.31/dm-exstore-persistent-allow-metadata-reread.patch
> 
> I've made an adjusted quilt series available here:
> http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel_unified_no_reread/2.6.31/
> 
> I'll be working to sort this out but I wanted to give you a heads up
> that I can now easily reproduce the BUG on x86.  Don't even need to stop
> the merge and restart; just a normal merge triggers the BUG after the
> first extent of chunks is merged.

Turns out the BUG doesn't occur immediately after processing the first
extent of chunks (via merge_callback).

It occurs IFF the chunks are _not_ processed in descending order, e.g.:

...
start  merge chunk=29939 linear_chunks=160
finish merge chunk=29939
finish merge chunk=29938
finish merge chunk=29937
finish merge chunk=29936
...
finish merge chunk=29784
finish merge chunk=29783
finish merge chunk=29782
finish merge chunk=29781
finish merge chunk=29780
start  merge chunk=29736 linear_chunks=3
finish merge chunk=29736
finish merge chunk=29735
finish merge chunk=29734
start  merge chunk=29779 linear_chunks=43
finish merge chunk=29779
------------[ cut here ]------------
kernel BUG at drivers/md/dm-snap-persistent.c:456!
...

So in the above you can see that chunks 29734-29736 get merged between
29780 and 29779, i.e. out of descending order.
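
For anyone who wants to check their own logs for the same pattern, here is
a rough sketch of a script that flags it. It assumes the debug output uses
exactly the "start  merge chunk=N linear_chunks=M" format shown above, and
that an extent covers chunks N down to N-M+1 (which matches the trace:
29939 down to 29780 is 160 chunks). The function name and the sample list
are made up for illustration; they are not part of any patch.

```python
import re

# Matches the "start  merge" lines from the debug trace quoted above.
START_RE = re.compile(r"start\s+merge\s+chunk=(\d+)\s+linear_chunks=(\d+)")


def find_out_of_order_extents(log_lines):
    """Return (start_chunk, previous_lowest_chunk) for every extent that
    starts at or above the lowest chunk of the extent merged before it,
    i.e. every extent that breaks strictly descending order."""
    violations = []
    prev_low = None
    for line in log_lines:
        m = START_RE.search(line)
        if not m:
            continue  # "finish merge" lines and anything else are ignored
        chunk, count = int(m.group(1)), int(m.group(2))
        if prev_low is not None and chunk >= prev_low:
            violations.append((chunk, prev_low))
        # Assumption: the extent runs downward from 'chunk'.
        prev_low = chunk - count + 1
    return violations


# Abbreviated copy of the trace from this mail.
sample = [
    "start  merge chunk=29939 linear_chunks=160",
    "finish merge chunk=29939",
    "finish merge chunk=29780",
    "start  merge chunk=29736 linear_chunks=3",
    "finish merge chunk=29736",
    "finish merge chunk=29734",
    "start  merge chunk=29779 linear_chunks=43",
    "finish merge chunk=29779",
]

if __name__ == "__main__":
    for chunk, prev_low in find_out_of_order_extents(sample):
        print(f"extent starting at {chunk} follows an extent ending at {prev_low}")
```

On the trace above this reports the extent starting at 29779, which is the
one that is followed by the BUG.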

This provided the hint I needed to find the underlying problem: when we
moved dm-snapshot-dont-insert-before-existing-chunk.patch to the beginning
of the series, we neglected to adjust
dm-snapshot-move-exception-code-to-new-file.patch accordingly.

As a result, dm-snapshot-move-exception-code-to-new-file.patch
reintroduced inserting exceptions into the hash_table before other
exceptions.

Mikulas/Jon, I'd really appreciate it if you could test the following
updated quilt tree(s):

http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel_unified/2.6.31/
http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel_unified_no_reread/2.6.31/

Mikulas, the 'kernel_unified_no_reread' quilt tree is slightly more
minimalist (avoids adding re-read support to dm-snap-persistent.c) so it
is worth a shot if 'kernel_unified' still fails for you.

Both work for me on x86_64.  But we may not be out of the woods on
sparc64.

Thanks,
Mike



