Are there any known problems with using bio_clone() in a DM target module when the underlying block device is an LVM logical volume and a snapshot has been created on said logical volume while the DM target module is active?
I’m developing a DM target module to do block-based change tracking. It includes code from dm-linear, Alberto Bertogli’s dm-csum, dm-delay, and some original code, and it appears to be fairly functional. With the underlying device (the one in the table entry) being an LVM logical volume, I have created an ext2/3 filesystem, copied all ~775MB of /usr/lib to it, and diff’ed the original /usr/lib with the copy. My module includes the parts of dm-csum that call bio_clone(): first to split an ‘evil’ original bio into one or more ‘nice’ pieces, and then to turn each ‘nice’ bio into the one that is actually passed to bio_submit().
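To make the splitting step concrete, here is a minimal userspace sketch of the idea, not the actual module or dm-csum code. It assumes an ‘evil’ request is one whose sector range crosses a fixed-size chunk boundary, and it splits such a range into ‘nice’ pieces that each stay within one chunk. The names CHUNK_SECTORS, struct piece, range_is_evil() and split_range() are hypothetical and exist only for illustration; in the kernel, each piece would correspond to one cloned bio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk size: data sectors covered by one metadata sector. */
#define CHUNK_SECTORS 8u

/* One 'nice' piece of a request: a sector range inside a single chunk. */
struct piece {
    uint64_t start;   /* first sector of the piece */
    uint32_t count;   /* number of sectors in the piece */
};

/* Returns nonzero if [start, start + count) crosses a chunk boundary. */
static int range_is_evil(uint64_t start, uint32_t count)
{
    assert(count > 0);
    return start / CHUNK_SECTORS != (start + count - 1) / CHUNK_SECTORS;
}

/*
 * Splits [start, start + count) into pieces that never cross a chunk
 * boundary. Returns the number of pieces written to out, or 0 if more
 * than max pieces would be needed.
 */
static size_t split_range(uint64_t start, uint32_t count,
                          struct piece *out, size_t max)
{
    size_t n = 0;

    while (count > 0) {
        /* Sectors left before the next chunk boundary. */
        uint32_t room = (uint32_t)(CHUNK_SECTORS - start % CHUNK_SECTORS);
        uint32_t len = count < room ? count : room;

        if (n == max)
            return 0;   /* caller's array is too small */

        out[n].start = start;
        out[n].count = len;
        n++;

        start += len;
        count -= len;
    }
    return n;
}
```

With these assumed values, a 4-sector request starting at sector 254 would be split into two pieces, sectors 254-255 and sectors 256-257, while an 8-sector request starting at sector 256 is already ‘nice’ and yields a single piece.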
However, if I create an LVM snapshot of the underlying LV while my module is active, and then I write to the device created by my module, after my module has called bio_submit() for a metadata write to sector 256 and a data write to a higher-numbered sector, some portions of my write requests do not complete, and I see the following error messages in /var/log/syslog:
__ratelimit: 33 callbacks suppressed
Buffer I/O error on device dm-1, logical block 256
lost page write due to I/O error on dm-1
If I use dm-linear, dm-delay, or dm-crypt instead of my module, there are no errors reported, and the writes appear to go through. The main difference appears to be my use (from dm-csum) of bio_clone(). However, my module, even with bio_clone(), appears to work fine except when an LVM snapshot is created underneath it.
Any ideas on whether bio_clone() and LVM snapshots are known not to play well together?
Azad Consultant at Intel