[dm-devel] dm-thin - issue about the maximum size of the metadata device

Tue Aug 6 10:39:35 UTC 2013

On 6.8.2013 06:18, 梁文彥 wrote:
> Hi folks,
>
> I am currently running some experiments on the dm thin-provisioning targets.
> One of these experiments tries to find the largest thin volume that can be
> created on the pool.

There is currently no lvm2 support for this operation yet.
It would need to parse thin_dump output and account for each block
(which is not a cheap operation).
It is also hard to say how blocks shared between multiple volumes should be
accounted. So the report could give the total number of used blocks, and how
many blocks would be freed if this thin volume were removed.

The only info displayed now is the highest mapped block of a thin volume,
which gives you only a very rough idea of the usage of that volume.

> As far as I know, each time we provision blocks from the pool, metadata is
> consumed to record the mapping information.
> By executing the lvdisplay command, we can observe the pool data and
> metadata usage, such as
> $ sudo lvdisplay | grep Allocated
>    Allocated pool data    7.87%
>    Allocated metadata    6.09%
>
> The following text is quoted from thin-provisioning.txt under
> Documentation/device-mapper in the source tree.
> "As a guide, we suggest you calculate the number of bytes to use in the
> metadata device as 48 * $data_dev_size / $data_block_size but round it up
> to 2MB if the answer is smaller."
>
> If the size of the metadata dev is fixed at 16G, and the block size of the
> pool dev is set to 64K,
> then we may infer that the largest thin volume size is 21.33TB.
>
> (48 * $data_dev_size / 64K = 16G
>   $data_dev_size = 16G * 64K / 48 = 21.33TB)
>
> If this inference is not correct, please kindly let me know why.

The formula there is just an approximation.
The newest device-mapper-persistent-data has an improved formula that
also takes the number of thin volumes inside the pool into account.
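
The arithmetic in the quoted question can be reproduced with a short
Python sketch. It implements only the rule of thumb quoted from
thin-provisioning.txt (48 bytes per data block, 2MB minimum) and assumes
binary units (1K = 1024 bytes); the function names are made up for
illustration and are not part of any tool.

```python
# Rule of thumb quoted from thin-provisioning.txt:
#   metadata_bytes = 48 * data_dev_size / data_block_size, at least 2 MiB.
# Binary units are an assumption of this sketch.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

BYTES_PER_BLOCK = 48      # constant from the quoted guide
MIN_METADATA = 2 * MIB    # "round it up to 2MB if the answer is smaller"


def estimate_metadata_bytes(data_dev_size, data_block_size):
    """Suggested metadata device size in bytes for a given pool geometry."""
    estimate = BYTES_PER_BLOCK * data_dev_size // data_block_size
    return max(estimate, MIN_METADATA)


def max_data_bytes(metadata_size, data_block_size):
    """Invert the guide: largest data size covered by metadata_size."""
    return metadata_size * data_block_size // BYTES_PER_BLOCK


if __name__ == "__main__":
    limit = max_data_bytes(16 * GIB, 64 * KIB)
    print(f"{limit / TIB:.2f} TiB")  # prints 21.33 TiB
```

This only restates the guide; as noted above, the guide itself is an
approximation.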

Anyway, the current kernel limit on the metadata (mda) size is 16G, and
that can be used to address thin pools of various sizes.
(So it is not a problem to use e.g. a 1EB thin pool; in that case you
should just use a much bigger chunk size than 64K.)

> Then I did the experiment with the following steps:
> 1. create a thin-pool with size 21.33T on my RAID0, i.e. the largest size we
> inferred, with block size 64K and metadata size 16G
> 2. create a thin volume with virtual size 21.33T.
> 3. dd data (/dev/urandom) to the thin device
>
> Finally, I observed that
>    Allocated pool data    100.00%
>    Allocated metadata     71.89%
>
> It seemed that the pool data had run out, but the metadata had not.
> Does this mean that 16G of metadata can record a thin dev with a size
> bigger than 21.33T?

Well, it always depends on what you want to do - if you use it for
snapshots, you will likely run out of space.

If you use it only for provisioning, you can use a much bigger chunk size,
which significantly reduces metadata usage and improves performance.
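
The percentages from the quoted experiment (pool data 100.00%, metadata
71.89% on a 21.33T pool with 64K blocks and 16G of metadata) can be
turned into a rough metadata cost per mapped block. The Python sketch
below assumes binary units, a single thin volume without snapshots, and
metadata usage that grows linearly with the number of mapped blocks;
none of that is guaranteed, and the function names are hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def bytes_per_mapped_block(metadata_size, used_fraction, data_written, block_size):
    """Observed metadata bytes consumed per mapped data block."""
    mapped_blocks = data_written / block_size
    return used_fraction * metadata_size / mapped_blocks


def projected_data_limit(metadata_size, per_block_cost, block_size):
    """Data size at which metadata would be full, assuming linear growth."""
    return metadata_size / per_block_cost * block_size


if __name__ == "__main__":
    cost = bytes_per_mapped_block(
        metadata_size=16 * GIB,
        used_fraction=0.7189,      # "Allocated metadata 71.89%"
        data_written=21.33 * TIB,  # pool data at 100%
        block_size=64 * KIB,
    )
    limit = projected_data_limit(16 * GIB, cost, 64 * KIB)
    print(f"{cost:.1f} bytes of metadata per mapped block")
    print(f"{limit / TIB:.2f} TiB projected at 100% metadata")
```

Under these assumptions the cost comes out at roughly 34.5 bytes per
block, below the 48 bytes of the guide, and the projected limit is a
little under 30T, the same order as the 30.59TB reported for the second
experiment in this thread.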

> 1. create a thin-pool on my RAID0 with size 36.20T, and block size 64K,
> metadata size 16G

Using a 64K block size (chunk size in lvm2 terminology) for a 36T device is
just not a good idea.

> In this experiment, we ran out of metadata, and from the "Allocated pool
> data" field we inferred that the maximum thin device size was about 30.59TB.
> Is that correct?

The major rule here is to watch how full the metadata is - and if it is
getting too full, remove volumes.
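
A minimal Python sketch of such a check, working on text in the format
of the lvdisplay lines quoted earlier in this thread. The regular
expression only matches that quoted layout, and the 80% threshold is an
arbitrary value picked for illustration, not a recommendation from lvm2.

```python
import re

# Matches lines such as "Allocated metadata    6.09%" as quoted in this
# thread; other lvm2 versions may format the output differently.
_ALLOCATED = re.compile(r"Allocated\s+(pool data|metadata)\s+([\d.]+)%")


def parse_allocation(text):
    """Return the usage percentages found in text, keyed by kind."""
    return {kind: float(pct) for kind, pct in _ALLOCATED.findall(text)}


def metadata_needs_attention(text, threshold=80.0):
    """True when metadata usage reaches the (hypothetical) threshold."""
    usage = parse_allocation(text)
    return usage.get("metadata", 0.0) >= threshold


if __name__ == "__main__":
    sample = (
        "   Allocated pool data    100.00%\n"
        "   Allocated metadata     71.89%\n"
    )
    print(parse_allocation(sample))
    print(metadata_needs_attention(sample))
```

The text would come from something like `sudo lvdisplay`; how it is
collected and what is done when the threshold is hit is left out here.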

As said above, you need to plan for the use case - 64K is good for lots of
snapshots, bigger chunk sizes (e.g. 1MB) are good for space provisioning.
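
To illustrate the effect of the chunk size, the Python sketch below
applies the 48-bytes-per-block guide from thin-provisioning.txt to the
36.20T pool of the second experiment for a few chunk sizes. It assumes
binary units and a fully mapped pool without snapshots; the guide is
only an approximation, so the numbers are indicative at best.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

BYTES_PER_BLOCK = 48       # constant from the kernel documentation guide
METADATA_LIMIT = 16 * GIB  # metadata size limit discussed in this thread


def estimated_metadata(data_size, chunk_size):
    """Guide estimate of metadata bytes for a fully mapped pool."""
    return BYTES_PER_BLOCK * data_size / chunk_size


if __name__ == "__main__":
    data_size = 36.20 * TIB
    for chunk in (64 * KIB, 256 * KIB, 1 * MIB):
        need = estimated_metadata(data_size, chunk)
        verdict = "fits" if need <= METADATA_LIMIT else "exceeds 16 GiB"
        print(f"chunk {chunk // KIB:5d} KiB: {need / GIB:6.2f} GiB ({verdict})")
```

By this estimate the 64K case needs more than 16G of metadata, while
1MB chunks need well under 2G for the same pool.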

BTW - did you really write 36TB of data? How long was this test running?

Zdenek