[dm-devel] fragmented i/o with 2.6.31?

Jun'ichi Nomura j-nomura at ce.jp.nec.com
Thu Sep 17 09:14:22 UTC 2009


Hi Mike, Alasdair,

Kiyoshi Ueda wrote:
> On 09/17/2009 01:22 AM +0900, David Strand wrote:
>> On Wed, Sep 16, 2009 at 8:34 AM, David Strand <dpstrand at gmail.com> wrote:
>>> I am issuing 512 Kbyte reads through the device mapper device node to
>>> a fibre channel disk. With 2.6.30 one read command for the entire 512
>>> Kbyte length is placed on the wire. With 2.6.31 this is being broken
>>> up into 5 smaller read commands placed on the wire, decreasing
>>> performance.
>>>
>>> This is especially penalizing on some disks where we have prefetch
>>> turned off via the scsi mode page. Is there any easy way (through
>>> configuration or sysfs) to restore the single read per i/o behavior
>>> that I used to get?
>>
>> I should note that I am using dm-mpath, and the i/o is fragmented on
>> the wire when using the device mapper device node but it is not
>> fragmented when using one of the regular /dev/sd* device nodes for
>> that device.
> 
> David,
> Thank you for reporting this.
> I found on my test machine that max_sectors is set to SAFE_MAX_SECTORS,
> which limits the I/O size small.
> The attached patch fixes it.  I guess the patch (and increasing
> read-ahead size in /sys/block/dm-<n>/queue/read_ahead_kb) will solve
> your fragmentation issue.  Please try it.
> 
> 
> Mike, Alasdair,
> I found that max_sectors and max_hw_sectors of dm device are set
> in smaller values than those of underlying devices.  E.g:
>     # cat /sys/block/sdj/queue/max_sectors_kb
>     512
>     # cat /sys/block/sdj/queue/max_hw_sectors_kb
>     32767
>     # echo "0 10 linear /dev/sdj 0" | dmsetup create test
>     # cat /sys/block/dm-0/queue/max_sectors_kb
>     127
>     # cat /sys/block/dm-0/queue/max_hw_sectors_kb
>     127
> This prevents the I/O size of struct request from becoming enough big
> size, and causes undesired request fragmentation in request-based dm.
> 
> This should be caused by the queue_limits stacking.
> In dm_calculate_queue_limits(), the block-layer's small default size
> is included in the merging process of target's queue_limits.
> So underlying queue_limits is not propagated correctly.
> 
> I think initializing default values of all max_* in '0' is an easy fix.
> Do you think my patch is acceptable?
> Any other idea to fix this problem?

Well, sorry, we jumped the gun..
The patch should work fine for dm-multipath but
setting '0' by default will cause problems on targets like 'zero' and
'error', which take no underlying device and use the default value.

>  	blk_set_default_limits(limits);
> +	limits->max_sectors = 0;
> +	limits->max_hw_sectors = 0;

So this should either set something very big (e.g. UINT_MAX)
or set 0 by default but change to a certain safe value, if the end
result of merging the limits is still 0.

Attached is a revised patch with the latter approach.
Please check this.
If the approach is fine, I think we should bring this up to Jens
whether to have these helpers in dm-table.c or move to block/blk-settings.c.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

max_sectors and max_hw_sectors of dm device are set to smaller values
than those of underlying devices.  E.g:
    # cat /sys/block/sdj/queue/max_sectors_kb
    512
    # cat /sys/block/sdj/queue/max_hw_sectors_kb
    32767
    # echo "0 10 linear /dev/sdj 0" | dmsetup create test
    # cat /sys/block/dm-0/queue/max_sectors_kb
    127
    # cat /sys/block/dm-0/queue/max_hw_sectors_kb
    127
This prevents the I/O size of struct request from becoming large,
and causes undesired request fragmentation in request-based dm.

This is caused by the queue_limits stacking.
In dm_calculate_queue_limits(), the block-layer's safe default value
(SAFE_MAX_SECTORS, 255) is included in the merging process of target's
queue_limits. So underlying queue_limits is not propagated correctly.

Initialize default values of all max_*sectors to '0'
and change the limits to SAFE_MAX_SECTORS only if the value is still
'0' after merging.

Signed-off-by: Kiyoshi Ueda <k-ueda at ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura at ce.jp.nec.com>
Cc: David Strand <dpstrand at gmail.com>
Cc: Mike Snitzer <snitzer at redhat.com>,
Cc: Alasdair G Kergon <agk at redhat.com>
---
 drivers/md/dm-table.c |   30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

Index: linux-2.6.31/drivers/md/dm-table.c
===================================================================
--- linux-2.6.31.orig/drivers/md/dm-table.c
+++ linux-2.6.31/drivers/md/dm-table.c
@@ -647,6 +647,28 @@ int dm_split_args(int *argc, char ***arg
 }
 
 /*
+ * blk_stack_limits() chooses min_not_zero max_sectors value of underlying
+ * devices. So set the default to 0.
+ * Otherwise, the default SAFE_MAX_SECTORS dominates even if all underlying
+ * devices have max_sectors values larger than that.
+ */
+static void _set_default_limits_for_stacking(struct queue_limits *limits)
+{
+	blk_set_default_limits(limits);
+	limits->max_sectors = 0;
+	limits->max_hw_sectors = 0;
+}
+
+/* If there's no underlying device, use the default value in blockdev. */
+static void _adjust_limits_for_stacking(struct queue_limits *limits)
+{
+	if (limits->max_sectors == 0)
+		limits->max_sectors = SAFE_MAX_SECTORS;
+	if (limits->max_hw_sectors == 0)
+		limits->max_hw_sectors = SAFE_MAX_SECTORS;
+}
+
+/*
  * Impose necessary and sufficient conditions on a devices's table such
  * that any incoming bio which respects its logical_block_size can be
  * processed successfully.  If it falls across the boundary between
@@ -684,7 +706,7 @@ static int validate_hardware_logical_blo
 	while (i < dm_table_get_num_targets(table)) {
 		ti = dm_table_get_target(table, i++);
 
-		blk_set_default_limits(&ti_limits);
+		_set_default_limits_for_stacking(&ti_limits);
 
 		/* combine all target devices' limits */
 		if (ti->type->iterate_devices)
@@ -707,6 +729,8 @@ static int validate_hardware_logical_blo
 		    device_logical_block_size_sects - next_target_start : 0;
 	}
 
+	_adjust_limits_for_stacking(limits);
+
 	if (remaining) {
 		DMWARN("%s: table line %u (start sect %llu len %llu) "
 		       "not aligned to h/w logical block size %u",
@@ -991,10 +1015,10 @@ int dm_calculate_queue_limits(struct dm_
 	struct queue_limits ti_limits;
 	unsigned i = 0;
 
-	blk_set_default_limits(limits);
+	_set_default_limits_for_stacking(limits);
 
 	while (i < dm_table_get_num_targets(table)) {
-		blk_set_default_limits(&ti_limits);
+		_set_default_limits_for_stacking(&ti_limits);
 
 		ti = dm_table_get_target(table, i++);
 




More information about the dm-devel mailing list