[dm-devel] Re: [PATCH] 10-times slower dm-raid1 O_SYNC

Jens Axboe jens.axboe at oracle.com
Thu Apr 24 11:20:40 UTC 2008


On Mon, Apr 21 2008, Mikulas Patocka wrote:
> Hi
> 
> Here is the patch to fix 10-times slower dm-raid1.
> 
> I realized that conditional unplug when someone is waiting on any IO (that 
> I proposed on the phone call) is not correct --- it would leave the same 
> problem with asynchronous IO (io_submit functions, etc.). So when kmirrord 
> finishes all work, it unplugs the queue unconditionally.
> 
> The same bug exists with kcopyd, I fixed it to always use BIO_RW_SYNC bit. 
> In kcopyd work routine it is not possible to find all devices that were 
> involved in copying, so we don't know which queue to unplug. So the only 
> way to fix it is to use synchronous submit.
> 
> Look at it and if you don't find any issues, I will post the patch to the 
> customer for his testing.
> 
> Mikulas

> Avoid 3ms delay in dm-raid and kcopyd.
> 
> It is specified that any submitted bio without BIO_RW_SYNC flag may plug the
> queue (i.e. block the requests from being dispatched to the physical device).
> 
> The queue is unplugged when the caller calls blk_unplug() function. Usually, the
> sequence is that someone calls submit_bh to submit IO on a buffer. The IO plugs
> the queue and waits (to be possibly joined with other adjacent bios). Then, when
> the caller calls wait_on_buffer(), it unplugs the queue and submits the IOs to
> the disk.
> 
> This was happenning:
> 
> When doing O_SYNC writes, function fsync_buffers_list() submits a list of
> bios to dm_raid1, the bios are added to dm_raid1 write queue and kmirrord is
> woken up.
> 
> fsync_buffers_list() calls wait_on_buffer() --- that unplugs the queue, but
> there are no bios on the device queue (they are still in dm_raid1 queue)
> 
> wait_on_buffer() starts waiting until the IO is finished.
> 
> kmirrord is scheduled, kmirrord takes bios and submits them to the devices.
> 
> the submitted bio plugs the harddisk queue, there is no one to unplug it
> (the process that called wait_on_buffer() is already sleeping)
> 
> So there is 3ms timeout, after which the queues on the harddisks are
> unplugged, and requests are processed.
> 
> This 3ms timeout caused that in certain workloads (O_SYNC, 8kb writes), dm-raid1
> is 10 times slower than md-raid1.
> 
> 
> Every time we submit something asynchronously via dm_io, we must unplug the
> queue to actually put the request to the device.
> 
> This patch adds unplug call to kmirrord --- while processing requests, it keeps
> the queue plugged (so that adjacent bios can be merged), when it finishes
> processing all bios, it unplugs the queue to submit the bios.
> 
> It also fixes kcopyd, that has the same potential problem. All kcopyd requests
> are submitted with BIO_RW_SYNC.
> 
> Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> 
> ---
>  drivers/md/dm-io.c     |   10 ++++++++--
>  drivers/md/dm-kcopyd.c |    2 +-
>  drivers/md/dm-raid1.c  |    2 ++
>  3 files changed, 11 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6.25-rc8/drivers/md/dm-raid1.c
> ===================================================================
> --- linux-2.6.25-rc8.orig/drivers/md/dm-raid1.c	2008-04-03 18:45:09.000000000 +0200
> +++ linux-2.6.25-rc8/drivers/md/dm-raid1.c	2008-04-21 21:55:34.000000000 +0200
> @@ -1297,6 +1297,8 @@ static void do_mirror(struct work_struct
>  	do_reads(ms, &reads);
>  	do_writes(ms, &writes);
>  	do_failures(ms, &failures);
> +
> +	dm_table_unplug_all(ms->ti->table);
>  }
>  
>  
> Index: linux-2.6.25-rc8/drivers/md/dm-io.c
> ===================================================================
> --- linux-2.6.25-rc8.orig/drivers/md/dm-io.c	2008-04-21 22:32:21.000000000 +0200
> +++ linux-2.6.25-rc8/drivers/md/dm-io.c	2008-04-21 22:43:14.000000000 +0200
> @@ -353,7 +353,7 @@ static int sync_io(struct dm_io_client *
>  {
>  	struct io io;
>  
> -	if (num_regions > 1 && rw != WRITE) {
> +	if (num_regions > 1 && (rw & RW_MASK) != WRITE) {
>  		WARN_ON(1);
>  		return -EIO;
>  	}
> @@ -390,7 +390,7 @@ static int async_io(struct dm_io_client 
>  {
>  	struct io *io;
>  
> -	if (num_regions > 1 && rw != WRITE) {
> +	if (num_regions > 1 && (rw & RW_MASK) != WRITE) {
>  		WARN_ON(1);
>  		fn(1, context);
>  		return -EIO;
> @@ -437,6 +437,12 @@ static int dp_init(struct dm_io_request 
>  
>  /*
>   * New collapsed (a)synchronous interface
> + *
> + * Attention:
> + * If the IO is asynchronous (i.e. it has notify.fn), you must either unplug the
> + * queue with blk_unplug() some times later or you must set BIO_RW_SYNC bit in
> + * io_req->bi_rw. If you fail to do one of these, the IO will be submitted to
> + * the disk with 3ms delay.
>   */
>  int dm_io(struct dm_io_request *io_req, unsigned num_regions,
>  	  struct dm_io_region *where, unsigned long *sync_error_bits)
> Index: linux-2.6.25-rc8/drivers/md/dm-kcopyd.c
> ===================================================================
> --- linux-2.6.25-rc8.orig/drivers/md/dm-kcopyd.c	2008-04-03 18:45:08.000000000 +0200
> +++ linux-2.6.25-rc8/drivers/md/dm-kcopyd.c	2008-04-21 22:34:41.000000000 +0200
> @@ -333,7 +333,7 @@ static int run_io_job(struct kcopyd_job 
>  {
>  	int r;
>  	struct dm_io_request io_req = {
> -		.bi_rw = job->rw,
> +		.bi_rw = job->rw | (1 << BIO_RW_SYNC),
>  		.mem.type = DM_IO_PAGE_LIST,
>  		.mem.ptr.pl = job->pages,
>  		.mem.offset = job->offset,

Looks ok to me.

Acked-by: Jens Axboe <jens.axboe at oracle.com>

-- 
Jens Axboe




More information about the dm-devel mailing list