[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

[dm-devel] [RFC][PATCH] Fix BIO reordering when resuming devices



Hi!

We should not never reorder BIOs (at least not around barriers),
as Jens Axboe affirmed. This will cause problems as soon as there
will be BIO users that set BIO_RW_BARRIER.

This is an attempt to fix this problem in the device-mapper core.

Currently this can happen while the device is suspended since
md->deferred implements a stack instead of a queue.

Also we must block incoming requests while the queue is flushed
because this could also lead to a reordering.

I thought of two solutions:
1. While the queue is being flushed new incoming requests are added
   to the tail of the queue.
2. When the queue is being flushed just block dm_request until flushing
   is finished.

The first has a problem: Under heavy IO we might never leave dm_resume
because under heavy IO dm_request keeps flooding the queue.

So I implemented the second.

The queue gets flushed while md->lock is held (for writing) so
dm_request blocks. Because we then cannot requeue the deferred requests
via generic_make_request (would deadlock in dm_request when trying to
acquire the read lock) it directly starts processing of the request.

Therefore the last few lines of dm_request are moved into a
__dm_request.

BTW: There was a up_read(&md->lock); missing in dm_request in the
(!dm->map) error case.


diff -Nur linux.orig/drivers/md/dm.c linux/drivers/md/dm.c
--- linux.orig/drivers/md/dm.c	2004-01-01 21:49:16.118235408 +0100
+++ linux/drivers/md/dm.c	2004-01-01 21:54:35.537676272 +0100
@@ -63,7 +63,8 @@
 	 */
 	atomic_t pending;
 	wait_queue_head_t wait;
-	struct bio *deferred;
+	struct bio *deferred_head;
+	struct bio *deferred_tail;
 
 	/*
 	 * The current mapping.
@@ -231,8 +232,13 @@
 		return 1;
 	}
 
-	bio->bi_next = md->deferred;
-	md->deferred = bio;
+	bio->bi_next = NULL;
+
+	if (md->deferred_tail)
+		md->deferred_tail->bi_next = bio;
+	else
+		md->deferred_head = bio;
+	md->deferred_tail = bio;
 
 	up_write(&md->lock);
 	return 0;		/* deferred successfully */
@@ -512,6 +518,16 @@
  *---------------------------------------------------------------*/
 
 
+static inline void __dm_request(struct mapped_device *md, struct bio *bio)
+{
+		if (!md->map) {
+			bio_io_error(bio, bio->bi_size);
+			return;
+		}
+
+		__split_bio(md, bio);
+}
+
 /*
  * The request function that just remaps the bio built up by
  * dm_merge_bvec.
@@ -550,12 +566,7 @@
 		down_read(&md->lock);
 	}
 
-	if (!md->map) {
-		bio_io_error(bio, bio->bi_size);
-		return 0;
-	}
-
-	__split_bio(md, bio);
+	__dm_request(md, bio);
 	up_read(&md->lock);
 	return 0;
 }
@@ -788,16 +799,16 @@
 }
 
 /*
- * Requeue the deferred bios by calling generic_make_request.
+ * Process the deferred bios
  */
-static void flush_deferred_io(struct bio *c)
+static void flush_deferred_io(struct mapped_device *md, struct bio *c)
 {
 	struct bio *n;
 
 	while (c) {
 		n = c->bi_next;
 		c->bi_next = NULL;
-		generic_make_request(c);
+		__dm_request(md, c);
 		c = n;
 	}
 }
@@ -891,12 +902,14 @@
 
 	dm_table_resume_targets(md->map);
 	clear_bit(DMF_SUSPENDED, &md->flags);
+	def = md->deferred_head;
+	md->deferred_head = md->deferred_tail = NULL;
+
+	flush_deferred_io(md, def);
+
 	clear_bit(DMF_BLOCK_IO, &md->flags);
-	def = md->deferred;
-	md->deferred = NULL;
 	up_write(&md->lock);
 
-	flush_deferred_io(def);
 	blk_run_queues();
 
 	return 0;




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]