
Re: [dm-devel] [PATCH v3 0/8] dm-raid (raid456) target

On Jan 6, 2011, at 9:56 AM, Phillip Susi wrote:

On 1/6/2011 5:46 AM, NeilBrown wrote:
3:	<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>

Let me get this straight.  You specify a separate device to hold the
metadata and write intent bitmap for each data device? So for a 3 disk
raid 5, lvm will need to create two logical volumes on each of the 3
physical volumes, one of which will only be a single physical extent,
and will hold the raid metadata and write intent bitmap?

Why not just store the metadata on the main device like mdadm does today?

There is no single big reason to do things as I've proposed, just a lot of little reasons...

1) Device-mapper already has a few cases where metadata is kept on separate devices from the data (snapshots and mirror log) and no cases where they are kept together. This new raid module is similar to the mirroring case, where bitmaps are kept separately.

2) It seems a bit funny to specify a length (second param of the device-mapper CTR) and then expect the devices to be larger than their share of that amount to accommodate metadata. You might say it is funny to have to specify a separate device to hold the metadata, but I would again give the mirror log as an example.

3) Where multiple physical devices form a single leg/component of the array, the argument for having a metadata device specifically tied to its data device as an indivisible unit is weakened.

4) Having the metadata on a separate logical device increases the flexibility of its placement. You could have it at the beginning, in the middle, or at the end. (The middle might actually be preferred for performance reasons.) There are no offset calculations to perform in the kernel that depend on metadata placement.

5) Resizing an array might require the resizing of the metadata area. Because the devices are separate, there is no need to move around data or metadata to accommodate this. If they were mixed in the same device and the metadata was at the beginning, that's a problem if the metadata no longer fits in its area. Likewise, if the metadata were at the end of a mixed device, you would have to move it when growing. These problems are eliminated.

6) The metadata areas are not necessary in every case. Some raid controllers handle the metadata on their own (dm-raid works with these). You might say it is merely another flag on the CTR line to indicate whether to use metadata or not. Perhaps, but having them separate means you can easily convert between the two types.

7) Clustering? Perhaps one of the weaker arguments, but having the metadata separate allows it to easily grow to accommodate a bitmap / device / node, for example. This is really the same argument as easily being able to reform/resize the metadata area.

8) Bitmaps/superblocks that are updated often could be placed on separate devices, like SSDs, while the data is on spinning media. I'm not necessarily advocating this, but if someone wants to do it, I think they should be able to.

9) Flexibility for the future. Imagine a mirror and you'd like to split off a leg - the data portion alone becomes the linear device. The metadata device could be discarded, or it could be recombined with the data device and reinserted into the array, with just the deltas played back from the original mirror that has remained actively in use.
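To make points 4 and 5 concrete, here is a rough sketch in Python of the offset bookkeeping involved. It is only a simplified model and not code from the patch set: the function names are made up for illustration, and it assumes that co-located metadata sits at the start of the device and that growing it means shifting all of the data behind it.

```python
# Simplified, hypothetical model of points 4 and 5. Not kernel code.
# Sizes and offsets are in sectors.

def data_sector_colocated(logical_sector, meta_sectors):
    """Metadata at the start of the same device: every I/O is shifted."""
    return meta_sectors + logical_sector


def data_sector_separate(logical_sector):
    """Metadata on its own logical device: no offset to compute."""
    return logical_sector


def sectors_to_move_on_meta_grow(data_sectors, old_meta, new_meta, colocated):
    """Sectors of data that must be relocated when the metadata area grows.

    Assumption: with metadata at the start of a shared device, all data
    after it has to shift. With a separate metadata device, none does.
    """
    if not colocated or new_meta <= old_meta:
        return 0
    return data_sectors


if __name__ == "__main__":
    print(data_sector_colocated(100, 8))
    print(data_sector_separate(100))
    print(sectors_to_move_on_meta_grow(2048, 8, 16, colocated=True))
    print(sectors_to_move_on_meta_grow(2048, 8, 16, colocated=False))
```

With separate devices the mapping is the identity and a metadata resize touches no data, which is the whole argument in those two points.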

None of these reasons is all that compelling in isolation; but together, I think they make a pretty good case. There is additional flexibility here; and this is to be sacrificed for what? A simpler CTR line? I don't know of anyone who enters these by hand rather than using LVM, dm-raid, multipath, etc. MD does it this way? Well, this is device-mapper and it has its own idiosyncrasies and precedents.

Also, I understand what you mean by your final question, but for those who are new to this I'd like to point out that we /are/ storing the metadata on the main physical device, but not the same logical device. [Again, this will be the rule, but is flexible.]
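For those new to this, the device portion of the parameter list quoted at the top (<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>) could be assembled by a userspace tool along these lines. This is a minimal sketch in Python: the helper name and the device paths are hypothetical, and it builds only that one fragment, not a complete table line.

```python
# Hypothetical helper: builds only the
# "<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>" fragment.

def raid_dev_args(pairs):
    """Format (meta_dev, data_dev) pairs as the device argument fragment."""
    if not pairs:
        raise ValueError("at least one (meta_dev, data_dev) pair is required")
    parts = [str(len(pairs))]
    for meta_dev, data_dev in pairs:
        parts.extend([meta_dev, data_dev])
    return " ".join(parts)


if __name__ == "__main__":
    # Illustrative names only: one small metadata LV and one data LV per
    # physical volume, as in the 3-disk raid 5 case discussed above.
    pairs = [
        ("/dev/vg/meta0", "/dev/vg/data0"),
        ("/dev/vg/meta1", "/dev/vg/data1"),
        ("/dev/vg/meta2", "/dev/vg/data2"),
    ]
    print(raid_dev_args(pairs))
```

Each pair would normally live on the same physical volume, so the metadata stays on the main physical device while remaining a separate logical device.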

