[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

[lvm-devel] [PATCH lvconvert 1/2] Fix resume/suspend ordering after temporary mirror insertion



This patch is an updated version of the following:
https://www.redhat.com/archives/lvm-devel/2008-January/msg00134.html

There is a small window during updating the in-kernel dm tables
for stacked LV that the upper device and the lower device have
idential active mappings.
In the current LVM2 features, only lvconvert will suffer from
this problem when adding mirror image(s) to mirror LV.
Attached patch works around the lvconvert problem.


Details are below.

When updating a structure of active LV,
LVM2 preloads new dm table for each device from bottom to top,
then suspend top-down and resume bottom-up.
The preloading includes resuming of lower device so that
a new table for upper device can see the attributes of the
new lower device (i.e. new size).

The point is that the resuming of the lower device happens
before the suspending of the upper device.
If the new table of the lower device and the old table of the
upper device were same and the table contains a target with
side-effect after resume (i.e. mirror and snapshot),
it causes a problem.

In the current LVM2 code, the problem can only occur when
lvconvert adds mirrors to existing mirror.
dev_manager_preload() can check CONVERTING flag in lv->status
to see whether a layer LV is inserted or not.
If inserted, it skips preloading and let the resume code handle it.


Below, I'm trying to explain what's happening using the 'dmsetup ls --tree'
output during "lvconvert adds 1 mirror to 2-way mirrored LV".

lvconvert will change the device tree as follows:

1. Before lvconvert
    vg-lvol0 (253:4)
     |-vg-lvol0_mimage_1 (253:3)
     |-vg-lvol0_mimage_0 (253:2)
     `-vg-lvol0_mlog (253:1)

2. During lvconvert
    vg-lvol0 (253:4)
     |-vg-lvol0_mimage_2 (253:6)
     `-vg-lvol0_mimagetmp_2 (253:5)
        |-vg-lvol0_mimage_1 (253:3)
        |-vg-lvol0_mimage_0 (253:2)
        `-vg-lvol0_mlog (253:1)

3. After lvconvert
    vg-lvol0 (253:4)
     |-vg-lvol0_mimage_2 (253:6)
     |-vg-lvol0_mimage_1 (253:3)
     |-vg-lvol0_mimage_0 (253:2)
     `-vg-lvol0_mlog (253:1)


While moving from the stage 1 to the stage 2,
lvconvert will create a LV 'vg-lvol0_mimage_2' as a new mirror image 
and a layer 'vg-lvol0_mimagetmp_2' to hold the original mirror map:

    vg-lvol0_mimage_2 (253:6)

    vg-lvol0_mimagetmp_2 (253:5)
     |-vg-lvol0_mimage_1 (253:3)
     |-vg-lvol0_mimage_0 (253:2)
     `-vg-lvol0_mlog (253:1)

And vg-lvol0 will mirror them:

    vg-lvol0 (253:4)
     |-vg-lvol0_mimage_2 (253:6)
     `-vg-lvol0_mimagetmp_2 (253:5)

device-mapper operations for the above is actually as follows:
(excerpt from lvconvert-bad.log)

#libdm-deptree.c:1470     Loading vg-lvol0_mimagetmp_2 table
#libdm-deptree.c:1421         Adding target: 0 4096 mirror disk 3 253:1 1024 block_on_error 2 253:2 0 253:3 0
#libdm-deptree.c:897     Resuming vg-lvol0_mimagetmp_2 (253:5)
                               ^^^^HERE
#libdm-deptree.c:1470     Loading vg-lvol0_mimage_2 table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:49 384
#libdm-deptree.c:897     Resuming vg-lvol0_mimage_2 (253:6)
#libdm-deptree.c:1470     Loading vg-lvol0 table
#libdm-deptree.c:1421         Adding target: 0 4096 mirror core 2 1024 block_on_error 2 253:5 0 253:6 0
#libdm-deptree.c:940     Suspending vg-lvol0 (253:4)
#libdm-deptree.c:940     Suspending vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:940     Suspending vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:940     Suspending vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470     Loading vg-lvol0_mimage_1 table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:33 384
#libdm-deptree.c:897     Resuming vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:1470     Loading vg-lvol0_mimage_0 table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:65 384
#libdm-deptree.c:897     Resuming vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:1470     Loading vg-lvol0_mlog table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:34 384
#libdm-deptree.c:897     Resuming vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470     Loading vg-lvol0_mimagetmp_2 table
#libdm-deptree.c:1421         Adding target: 0 4096 mirror disk 3 253:1 1024 block_on_error 2 253:2 0 253:3 0
#libdm-deptree.c:1470     Loading vg-lvol0_mimage_2 table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:49 384
#libdm-deptree.c:897     Resuming vg-lvol0 (253:4)

Note that at the line commented with "HERE" above,
both vg-lvol0 and vg-lvol0_mimagetmp_2 are active and
having the same structure:

    vg-lvol0 (253:4)
     |-vg-lvol0_mimage_1 (253:3)
     |-vg-lvol0_mimage_0 (253:2)
     `-vg-lvol0_mlog (253:1)

It happens because the preloading is done before suspending.
Attached patch disables preloading if CONVERTING is on.


With the patch, vg-lvol0 is suspended first.
So the operations look like this: (excerpt from lvconvert-good.log)

#libdm-deptree.c:940     Suspending vg-lvol0 (253:4)
#libdm-deptree.c:940     Suspending vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:940     Suspending vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:940     Suspending vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470     Loading vg-lvol0_mimage_1 table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:33 384
#libdm-deptree.c:897     Resuming vg-lvol0_mimage_1 (253:3)
#libdm-deptree.c:1470     Loading vg-lvol0_mimage_0 table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:65 384
#libdm-deptree.c:897     Resuming vg-lvol0_mimage_0 (253:2)
#libdm-deptree.c:1470     Loading vg-lvol0_mlog table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:34 384
#libdm-deptree.c:897     Resuming vg-lvol0_mlog (253:1)
#libdm-deptree.c:1470     Loading vg-lvol0_mimagetmp_2 table
#libdm-deptree.c:1421         Adding target: 0 4096 mirror disk 3 253:1 1024 block_on_error 2 253:2 0 253:3 0
#libdm-deptree.c:897     Resuming vg-lvol0_mimagetmp_2 (253:5)
#libdm-deptree.c:1470     Loading vg-lvol0_mimage_2 table
#libdm-deptree.c:1421         Adding target: 0 4096 linear 8:49 384
#libdm-deptree.c:897     Resuming vg-lvol0_mimage_2 (253:6)
#libdm-deptree.c:1470     Loading vg-lvol0 table
#libdm-deptree.c:1421         Adding target: 0 4096 mirror core 2 1024 block_on_error 2 253:5 0 253:6 0
#libdm-deptree.c:897     Resuming vg-lvol0 (253:4)


Thanks,
-- 
Jun'ichi Nomura, NEC Corporation of America

Current status:
  - suspend_lv() in LVM2 preloads dm tables before suspending.
    It's necessary because the table loading could require memory
    while kernel may not able to allocate memory without writing
    to the device.
  - The preloading involves resuming, if the device has a parent
    node in the deptree.
    It's necessary because, for example, the new device size needs
    to be visible to load the parent table.
  - Preloading is bottom-up. Suspending is top-down.

What's the problem:
  - If a layer LV is inserted, suspend_lv() after that will
    resume the layer LV temporarily before the parent LV is
    suspended.
    It creates a window where both the parent LV and the layer LV
    have the same active table.

A workaround:
  - Don't preload (and resume) in suspend_lv() if a layer LV is just inserted.
    Following resume_lv() will load tables appropriately.

Changes from the previous version:
https://www.redhat.com/archives/lvm-devel/2008-January/msg00134.html
  - Explicit activate_lv() in lvconvert is not necessary,
    as resume_lv() will have the same effect.

Index: LVM2.work/lib/activate/dev_manager.c
===================================================================
--- LVM2.work.orig/lib/activate/dev_manager.c
+++ LVM2.work/lib/activate/dev_manager.c
@@ -1121,6 +1121,14 @@ int dev_manager_preload(struct dev_manag
 	if ((lv->status & PVMOVE) || (lv->status & LOCKED))
 		return 1;
 
+	/* FIXME:
+	 * Preloading involves activation of the inserted layer LV.
+	 * As a result, there is a window that 2 active dm devices having
+	 * the same table.
+	 * So skip preloading for now and let the caller cope with that. */
+	if ((lv->status & CONVERTING))
+		return 1;
+
 	return _tree_action(dm, lv, PRELOAD);
 }
 

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]