[lvm-devel] [PATCH] Fix mirror corruption during primary device failure.



 brassow

When down-converting mirrors (e.g. going from a 3-leg to a 2-leg mirror),
removable legs are pushed to the end of the array by swapping each one
with the last element.

Example:
- Mirror consists of devices A, B, C; and we wish to remove A
- A is first swapped with C, leaving C, B, A
- The leg count is reduced and A is removed, leaving C, B

The above works fine in most cases.  However, if there is a failure
of the primary device (the first device), the kernel selects the next
leg as the primary and continues.  While there is a failed device,
the kernel will only write to the primary.

Revisiting the above example:
- Mirror consists of devices A, B, C
- A fails, leaving a, B, C (lowercase marks the failed leg)
- The kernel selects B as the new primary and, while A is failed,
  writes only to B.
- Performing the above conversion swaps A with C and will cause a
  2-way mirror (C, B) to be put in place with THE WRONG PRIMARY, C.

This scenario causes all writes performed between the time of failure
and the conversion to be lost, corrupting file systems and losing
data.

This patch preserves the ordering of devices when moving 'removable
pvs' to the end of the array.  So, rather than having:
	1) A, B, C (starting mirror)
	2) C, B, A (reordering legs)
	3) C, B    (converted mirror)
we have:
	1) A, B, C (starting mirror)
	2) B, C, A (reordering legs)
	3) B, C    (converted mirror)
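
The difference between the two orderings can be sketched in isolation.
The following is a minimal illustration only, not the LVM2 code: it
assumes each leg is represented by a single char, and the names
swap_leg_to_end and shift_leg_to_end are hypothetical helpers invented
for this example.  It also handles just one removal, where the real
loop may move several legs.

```c
#include <stddef.h>

/*
 * Hypothetical model: 'legs' stands in for the segment's 'areas'
 * array, one char per mirror leg.  Both helpers return 0 on success
 * and 1 on failure, following the convention used in the patch.
 */

/* Old approach: swap the leg at 'pos' with the last leg. */
static int swap_leg_to_end(char *legs, size_t count, size_t pos)
{
	char tmp;

	if (pos >= count)
		return 1;

	tmp = legs[count - 1];
	legs[count - 1] = legs[pos];
	legs[pos] = tmp;

	return 0;
}

/* Patched approach: shift later legs down, then place the leg last. */
static int shift_leg_to_end(char *legs, size_t count, size_t pos)
{
	size_t i;
	char tmp;

	if (pos >= count)
		return 1;

	tmp = legs[pos];

	/* Close the hole; remaining legs keep their relative order. */
	for (i = pos + 1; i < count; i++)
		legs[i - 1] = legs[i];

	legs[count - 1] = tmp;

	return 0;
}
```

Starting from A, B, C and moving position 0, the swap helper leaves
C, B, A while the shift helper leaves B, C, A, so only the latter keeps
B in the primary slot once the last leg is dropped.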

Index: LVM2-rhel5/lib/metadata/mirror.c
===================================================================
--- LVM2-rhel5.orig/lib/metadata/mirror.c
+++ LVM2-rhel5/lib/metadata/mirror.c
@@ -136,6 +136,53 @@ uint32_t adjusted_mirror_region_size(uin
 }
 
 /*
+ * shift_mirror_legs
+ * @mirrored_seg
+ * @leg_pos:  The position (index) of the leg to move to the end
+ *
+ * When dealing with removal of legs, we often move a 'removable leg'
+ * to the back of the 'areas' array.  It is critically important not
+ * to simply swap it for the last area in the array.  This would have
+ * the effect of reordering the remaining legs - altering position of
+ * the primary.  So, we must shuffle all of the areas in the array
+ * to maintain their relative position before moving the 'removable
+ * leg' to the end.
+ *
+ * Short illustration of the problem:
+ *   - Mirror consists of legs A, B, C and we want to remove A
+ *   - We swap A and C and then remove A, leaving C, B
+ * This scenario is problematic in failure cases where A dies, because
+ * B becomes the primary.  If the above happens, we effectively throw
+ * away any changes made between the time of failure and the time of
+ * restructuring the mirror.
+ *
+ * So, any time we want to move areas to the end to be removed, use
+ * this function.
+ *
+ * Returns: 0 on success, 1 on failure
+ */
+static int shift_mirror_legs(struct lv_segment *mirrored_seg, int leg_pos)
+{
+	int i;
+	struct lv_segment_area area;
+
+
+	if (leg_pos >= mirrored_seg->area_count)
+		return 1; /* -EINVAL */
+
+	area = mirrored_seg->areas[leg_pos];
+
+	/* Shift everyone down to fill the hole */
+	for (i = leg_pos+1; i < mirrored_seg->area_count; i++)
+		mirrored_seg->areas[i-1] = mirrored_seg->areas[i];
+
+	/* Stick this one at the end */
+	mirrored_seg->areas[i-1] = area;
+
+	return 0;
+}
+
+/*
  * This function writes a new header to the mirror log header to the lv
  *
  * Returns: 1 on success, 0 on failure
@@ -469,13 +516,12 @@ static int _remove_mirror_images(struct 
 		for (s = 0; s < mirrored_seg->area_count &&
 			    old_area_count - new_area_count < num_removed; s++) {
 			sub_lv = seg_lv(mirrored_seg, s);
+
 			if (!is_temporary_mirror_layer(sub_lv) &&
 			    _is_mirror_image_removable(sub_lv, removable_pvs)) {
-				/* Swap segment to end */
+				if (shift_mirror_legs(mirrored_seg, s))
+					return 0;
 				new_area_count--;
-				area = mirrored_seg->areas[new_area_count];
-				mirrored_seg->areas[new_area_count] = mirrored_seg->areas[s];
-				mirrored_seg->areas[s] = area;
 			}
 		}
 		if (num_removed && old_area_count == new_area_count)
