[dm-devel] [PATCH 3/3] dm-thin: fix discard_granularity

Vivek Goyal vgoyal at redhat.com
Thu Jul 19 15:44:30 UTC 2012


On Wed, Jul 18, 2012 at 03:10:21PM -0400, Mikulas Patocka wrote:
> 
> 
> On Tue, 17 Jul 2012, Vivek Goyal wrote:
> 
> > On Tue, Jul 17, 2012 at 03:35:04PM -0400, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Tue, 17 Jul 2012, Mike Snitzer wrote:
> > > 
> > > > On Mon, Jul 16 2012 at  2:35pm -0400,
> > > > Mikulas Patocka <mpatocka at redhat.com> wrote:
> > > > 
> > > > > dm-thin: fix discard_granularity
> > > > > 
> > > > > The kernel expects that limits->discard_granularity is a power of two.
> > > > > Set this limit only if we use a power of two block size.
> > > > > 
> > > > > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> > > > > 
> > > > > ---
> > > > >  drivers/md/dm-thin.c |    3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > Index: linux-3.5-rc6-fast/drivers/md/dm-thin.c
> > > > > ===================================================================
> > > > > --- linux-3.5-rc6-fast.orig/drivers/md/dm-thin.c	2012-07-16 20:07:49.000000000 +0200
> > > > > +++ linux-3.5-rc6-fast/drivers/md/dm-thin.c	2012-07-16 20:08:01.000000000 +0200
> > > > > @@ -2502,7 +2502,8 @@ static void set_discard_limits(struct po
> > > > >  	 * bios cover a block partially.  A discard that spans a block boundary
> > > > >  	 * is not sent to this target.
> > > > >  	 */
> > > > > -	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > > > > +	if (pool->sectors_per_block_shift >= 0)
> > > > > +		limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > > > >  	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> > > > >  }
> > > > 
> > > > Given the block layer's assumption that discard_granularity is always a
> > > > power of 2: thinp should disable discard if the thinp blocksize is a non
> > > > power of 2.  So this patch isn't correct (discard support should be
> > > > disabled in pool_ctr based on the specified blocksize).
> > > 
> > > discard_granularity is just a hint (and IMHO quite useless hint).
> > > 
> > > The documentation says that it indicates a size of internal allocation 
> > > unit that may be larger than the block size. The code doesn't use it this 
> > > way - it is used in FITRIM ioctl where it specifies the minimum request 
> > > size to be sent. It is also used in blkdev_issue_discard where it is used 
> > > to round down the number of sectors to discard on discard_granularity 
> > > boundary - this is wrong, it aligns request size on discard_granularity 
> > > boundary, but it doesn't align request start on this boundary.
> > 
> > I am not sure I understand completely what you are trying to say.
> 
> I mean that it is used inconsistently

We can always fix inconsistent use.

> - sometimes it is used as s minimum 
> request to be sent (requests smaller than discard_granularity are not 
> sent). And sometimes length is rounded down to discard_granularity 
> boundary.

Shouldn't it be used as both. If you use it only to determine the minimum
request size, how do you come up with alignment for your next request?

I thought these are hints so you try your best and then leave it to
physical device/driver to determine how to deal with it now.

> 
> > But
> > after paolo's patch, blkdev_issue_discard() will take into account
> > max_discard_sectors to limit max discard request size and use
> > discard_granularity and discard_alignment to determine aligned request start.
> 
> The question is - how are we supposed to propagate these parameters 
> through linearly appended devices, raid0, raid1 or other mappings?

We already deal with it in stacking limits. (blk_stack_limits()). Now
one can question whether that's the most optimal way to do things or
not.

> 
> For example, if you have a logical volume that consists of two linearly 
> appended disks, disk1 with discard_granularity1,discard_alignment1 and 
> disk2 with discard_granularity2,discard_alignment2 - tell me, how do you 
> calculate discard_granularity and discard_alignment for the combined 
> logical device from these four numbers? How do you calculate it if those 
> two disks are in raid0 or raid1?

I am looking at current logic in blk_stack_limits().

- For discard granularity, it just takes maximum of granularity1 and
  granularity2. So as long as one granularity is multiple of other
  granularity, things are just fine.

- For discard_alignment we seem to be taking lcm() of both the values but
  I can't wrap my head around it that why does that work.

- For max_discard_sectors, we take minimum of two devices.

So discard_alignment seems to be only odd piece w.r.t stacking.

> 
> It seems to me that would be much better to state that discard request 
> size is unlimited and break one long discard to several smaller discard 
> requests at the physical disk driver - then, the problem with combining 
> the limits would go away.

May be. But if hints can be propagated and bio's don't have be broken
down, then it should make life simpler for drivers writers.

Thanks
Vivek




More information about the dm-devel mailing list