[dm-devel] [PATCH for-dm-3.14-fixes 7/8] dm thin: ensure user takes action to validate data and metadata consistency

Mike Snitzer snitzer at redhat.com
Fri Feb 21 02:56:04 UTC 2014


If a thin metadata operation fails the current transaction will abort,
whereby causing potential for IO layers up the stack (e.g. filesystems)
to have data loss.  As such, set THIN_METADATA_NEEDS_CHECK_FLAG in the
thin metadata's superblock which forces the user to:
1) verify the thin metadata is consistent (e.g. use thin_check, etc)
2) verify the thin data is consistent (e.g. use fsck)

The only way to clear the superblock's THIN_METADATA_NEEDS_CHECK_FLAG is
to run thin_repair.

On metadata operation failure: abort current metadata transaction, set
pool in read-only mode, and now set the needs_check flag.

As part of this change, constraints are introduced or relaxed:
* don't allow a pool to transition to write mode if needs_check is set
* don't queue IO if needs_check is set
* don't allow data or metadata space to be resized if needs_check is set
* if a thin pool's metadata space is exhausted: the kernel will now
  force the user to take the pool offline for repair before the kernel
  will allow the metadata space to be extended.
* relax response to data allocation failure due to no data space:
  don't abort the current metadata transaction (this allows previously
  allocated and prepared mappings to be committed).

Signed-off-by: Mike Snitzer <snitzer at redhat.com>
---
 Documentation/device-mapper/thin-provisioning.txt | 21 +++---
 drivers/md/dm-thin-metadata.c                     | 20 +++++-
 drivers/md/dm-thin-metadata.h                     |  8 +++
 drivers/md/dm-thin.c                              | 82 +++++++++++++++++------
 4 files changed, 102 insertions(+), 29 deletions(-)

diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 3989dd6..075ca84 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -130,14 +130,19 @@ a volatile write cache.  If power is lost you may lose some recent
 writes.  The metadata should always be consistent in spite of any crash.
 
 If data space is exhausted the pool will either error or queue IO
-according to the configuration (see: error_if_no_space).  When metadata
-space is exhausted the pool will error IO, that requires new pool block
-allocation, until the pool's metadata device is resized.  When either the
-data or metadata space is exhausted the current metadata transaction
-must be aborted.  Given that the pool will cache IO whose completion may
-have already been acknowledged to the upper IO layers (e.g. filesystem)
-it is strongly suggested that those layers perform consistency checks
-before the data or metadata space is resized after having been exhausted.
+according to the configuration (see: error_if_no_space).  If metadata
+space is exhausted or a metadata operation fails: the pool will error IO
+until the pool is taken offline and repair is performed to 1) fix any
+potential inconsistencies and 2) clear the flag that imposes repair.
+Once the pool's metadata device is repaired it may be resized, which
+will allow the pool to return to normal operation.  Note that a pool's
+data and metadata devices cannot be resized until repair is performed.
+It should also be noted that when the pool's metadata space is exhausted
+the current metadata transaction is aborted.  Given that the pool will
+cache IO whose completion may have already been acknowledged to upper IO
+layers (e.g. filesystem) it is strongly suggested that consistency
+checks (e.g. fsck) be performed on those layers when repair of the pool
+is required.
 
 Thin provisioning
 -----------------
diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index bea6db6..7c2bc26 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -76,7 +76,7 @@
 
 #define THIN_SUPERBLOCK_MAGIC 27022010
 #define THIN_SUPERBLOCK_LOCATION 0
-#define THIN_VERSION 1
+#define THIN_VERSION 2
 #define THIN_METADATA_CACHE_SIZE 64
 #define SECTOR_TO_BLOCK_SHIFT 3
 
@@ -1830,3 +1830,21 @@ int dm_pool_register_metadata_threshold(struct dm_pool_metadata *pmd,
 
 	return r;
 }
+
+void dm_pool_metadata_set_needs_check(struct dm_pool_metadata *pmd)
+{
+	down_write(&pmd->root_lock);
+	pmd->flags |= THIN_METADATA_NEEDS_CHECK_FLAG;
+	up_write(&pmd->root_lock);
+}
+
+bool dm_pool_metadata_needs_check(struct dm_pool_metadata *pmd)
+{
+	bool needs_check;
+
+	down_read(&pmd->root_lock);
+	needs_check = pmd->flags & THIN_METADATA_NEEDS_CHECK_FLAG;
+	up_read(&pmd->root_lock);
+
+	return needs_check;
+}
diff --git a/drivers/md/dm-thin-metadata.h b/drivers/md/dm-thin-metadata.h
index 520dd34..583ff5d 100644
--- a/drivers/md/dm-thin-metadata.h
+++ b/drivers/md/dm-thin-metadata.h
@@ -27,6 +27,11 @@
 
 /*----------------------------------------------------------------*/
 
+/*
+ * Thin metadata superblock flags.
+ */
+#define THIN_METADATA_NEEDS_CHECK_FLAG		(1 << 0)
+
 struct dm_pool_metadata;
 struct dm_thin_device;
 
@@ -214,6 +219,9 @@ int dm_pool_register_metadata_threshold(struct dm_pool_metadata *pmd,
 					dm_sm_threshold_fn fn,
 					void *context);
 
+void dm_pool_metadata_set_needs_check(struct dm_pool_metadata *pmd);
+bool dm_pool_metadata_needs_check(struct dm_pool_metadata *pmd);
+
 /*----------------------------------------------------------------*/
 
 #endif
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index e560416..cd52cf2 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1014,6 +1014,7 @@ static bool should_error_unserviceable_bio(struct pool *pool)
 {
 	return (unlikely(get_pool_mode(pool) == PM_FAIL) ||
 		pool->pf.error_if_no_space ||
+		dm_pool_metadata_needs_check(pool->pmd) ||
 		dm_pool_is_metadata_out_of_space(pool->pmd));
 }
 
@@ -1436,8 +1437,19 @@ static enum pool_mode get_pool_mode(struct pool *pool)
 static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 {
 	int r;
+	bool out_of_space, needs_check = dm_pool_metadata_needs_check(pool->pmd);
 	enum pool_mode old_mode = pool->pf.mode;
 
+	/*
+	 * Never allow the pool to transition to PM_WRITE mode if user
+	 * intervention is required to verify metadata and data consistency.
+	 */
+	if (new_mode == PM_WRITE && old_mode != new_mode && needs_check) {
+		DMERR("%s: unable to switch pool to write mode until repaired.",
+		      dm_device_name(pool->pool_md));
+		new_mode = old_mode;
+	}
+
 	switch (new_mode) {
 	case PM_FAIL:
 		if (old_mode != new_mode)
@@ -1454,23 +1466,35 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		if (old_mode != new_mode)
 			DMERR("%s: switching pool to read-only mode",
 			      dm_device_name(pool->pool_md));
-		r = dm_pool_abort_metadata(pool->pmd);
-		if (r) {
-			DMERR("%s: aborting transaction failed",
-			      dm_device_name(pool->pool_md));
-			new_mode = PM_FAIL;
-			set_pool_mode(pool, new_mode);
-		} else {
-			bool out_of_space;
-			if (!dm_pool_is_out_of_space(pool->pmd, &out_of_space) && out_of_space)
-				dm_pool_metadata_read_write(pool->pmd);
-			else
-				dm_pool_metadata_read_only(pool->pmd);
-			pool->process_bio = process_bio_read_only;
-			pool->process_discard = process_discard;
+		if (needs_check) {
+			r = dm_pool_abort_metadata(pool->pmd);
+			if (r) {
+				DMERR("%s: aborting transaction failed",
+				      dm_device_name(pool->pool_md));
+				new_mode = PM_FAIL;
+				set_pool_mode(pool, new_mode);
+				goto out;
+			}
+		}
+		if (!dm_pool_is_out_of_space(pool->pmd, &out_of_space) && out_of_space)
+			dm_pool_metadata_read_write(pool->pmd);
+		else
+			dm_pool_metadata_read_only(pool->pmd);
+		pool->process_bio = process_bio_read_only;
+		pool->process_discard = process_discard;
+		if (needs_check)
 			pool->process_prepared_mapping = process_prepared_mapping_fail;
-			pool->process_prepared_discard = process_prepared_discard_passdown;
+		else {
+			/*
+			 * Allow previously prepared mappings to complete (important
+			 * for proper handling of case when data space is exhausted).
+			 * If read-only mode was requested no new mappings will be
+			 * created (due to process_bio_read_only) so no risk using
+			 * process_prepared_mapping.
+			 */
+			pool->process_prepared_mapping = process_prepared_mapping;
 		}
+		pool->process_prepared_discard = process_prepared_discard_passdown;
 		break;
 
 	case PM_WRITE:
@@ -1484,7 +1508,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 		pool->process_prepared_discard = process_prepared_discard;
 		break;
 	}
-
+out:
 	pool->pf.mode = new_mode;
 }
 
@@ -1501,15 +1525,21 @@ static void out_of_data_space(struct pool *pool)
 
 static void metadata_operation_failed(struct pool *pool, const char *op, int r)
 {
+	const char *pool_device_name = dm_device_name(pool->pool_md);
+
 	DMERR_LIMIT("%s: metadata operation '%s' failed: error = %d",
-		    dm_device_name(pool->pool_md), op, r);
+		    pool_device_name, op, r);
 
 	if (r == -ENOSPC) {
 		dm_pool_set_metadata_out_of_space(pool->pmd);
 		DMERR_LIMIT("%s: no free metadata space available.",
-			    dm_device_name(pool->pool_md));
+			    pool_device_name);
 	}
 
+	DMWARN_LIMIT("%s: consistency of metadata and data must be verified, please repair.",
+		     pool_device_name);
+	dm_pool_metadata_set_needs_check(pool->pmd);
+
 	set_pool_mode(pool, PM_READ_ONLY);
 }
 
@@ -2295,6 +2325,12 @@ static int maybe_resize_data_dev(struct dm_target *ti, bool *need_commit)
 		return -EINVAL;
 
 	} else if (data_size > sb_data_size) {
+		if (dm_pool_metadata_needs_check(pool->pmd)) {
+			DMERR("%s: unable to grow the data device until repaired.",
+			      dm_device_name(pool->pool_md));
+			return -EINVAL;;
+		}
+
 		if (sb_data_size)
 			DMINFO("%s: growing the data device from %llu to %llu blocks",
 			       dm_device_name(pool->pool_md),
@@ -2336,6 +2372,12 @@ static int maybe_resize_metadata_dev(struct dm_target *ti, bool *need_commit)
 		return -EINVAL;
 
 	} else if (metadata_dev_size > sb_metadata_dev_size) {
+		if (dm_pool_metadata_needs_check(pool->pmd)) {
+			DMERR("%s: unable to grow the metadata device until repaired.",
+			      dm_device_name(pool->pool_md));
+			return -EINVAL;;
+		}
+
 		DMINFO("%s: growing the metadata device from %llu to %llu blocks",
 		       dm_device_name(pool->pool_md),
 		       sb_metadata_dev_size, metadata_dev_size);
@@ -2865,7 +2907,7 @@ static struct target_type pool_target = {
 	.name = "thin-pool",
 	.features = DM_TARGET_SINGLETON | DM_TARGET_ALWAYS_WRITEABLE |
 		    DM_TARGET_IMMUTABLE,
-	.version = {1, 10, 0},
+	.version = {1, 11, 0},
 	.module = THIS_MODULE,
 	.ctr = pool_ctr,
 	.dtr = pool_dtr,
@@ -3155,7 +3197,7 @@ static int thin_iterate_devices(struct dm_target *ti,
 
 static struct target_type thin_target = {
 	.name = "thin",
-	.version = {1, 10, 0},
+	.version = {1, 11, 0},
 	.module	= THIS_MODULE,
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
-- 
1.8.3.1




More information about the dm-devel mailing list