
Re: [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow



On 29/01/14 15:32, Steven Whitehouse wrote:
Hi,

On Wed, 2014-01-29 at 14:47 +0000, Andrew Price wrote:
This adds some fields to the superblock and resource group header
structures that we can use in rg size and address discovery in gfs2_grow
and fsck.gfs2. They are not intended to be changed after mkfs time.

sb_rgsize is the base resource group size used by mkfs.gfs2, before any
adjustment or alignment. It is required in order to extend the fs with
the correct resource group size in gfs2_grow and can also be used by
fsck.gfs2 when rebuilding broken resource groups.

I still don't see the point of adding this, really. We can calculate a
sensible size and use that for extending the rgrps.

I'm not really sure what you mean by a sensible size. Ideally we should be able to know or predict the actual rgrp sizes, in order to reuse the code which builds resource groups in mkfs in other rgrp appending, discovery and fixing code. Having the original value at our disposal would allow us to do that and take some guesswork out of fsck.gfs2. I think users would appreciate the consistency between the arguments they gave to mkfs.gfs2 and the values gfs2_grow uses, too.

It might be worth
considering a suitable interface to ask the kernel where the existing
rgrps are (all of them!) from userland while the fs is mounted though.
If the fs is not mounted, then the information can be easily gathered by
looking at the existing rgrp layout.

That might be fine for gfs2_grow's purposes, but if the fs has a corrupted rindex then it will still be difficult for fsck.gfs2 to get the information reliably.

rg_next is the address of the next resource group and is set by
mkfs.gfs2. It is intended to be used as a hint to fsck.gfs2 and can be
used by other tools which need to read the resource groups sequentially.
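
As a rough illustration of reading the resource groups sequentially via rg_next, here is a minimal sketch over an in-memory table. The names sim_rgrp, sim_read and walk_rgrps are hypothetical, and the sketch assumes that a zero rg_next means "not set" and that rgrp addresses increase along the chain; neither is settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an rgrp header: only the proposed rg_next
 * field is modelled, already converted to host byte order. */
struct sim_rgrp {
    uint64_t addr;    /* block address of this header */
    uint64_t rg_next; /* hint: address of the next rgrp, 0 = unset (assumption) */
};

/* Stands in for reading a header block from disk. */
static const struct sim_rgrp *sim_read(const struct sim_rgrp *tbl, size_t n,
                                       uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr)
            return &tbl[i];
    return NULL;
}

/* Follow rg_next hints starting at 'first' and return how many headers
 * were reached. Stops on a missing header, on an unset or non-increasing
 * hint (which also guards against loops), or after n headers. */
size_t walk_rgrps(const struct sim_rgrp *tbl, size_t n, uint64_t first)
{
    size_t visited = 0;
    uint64_t addr = first;

    assert(tbl != NULL || n == 0);
    while (visited < n) {
        const struct sim_rgrp *rg = sim_read(tbl, n, addr);
        if (rg == NULL)
            break;            /* hint pointed at something that is not an rgrp */
        visited++;
        if (rg->rg_next <= addr)
            break;            /* unset or implausible hint: stop trusting it */
        addr = rg->rg_next;
    }
    return visited;
}
```

Because rg_next is only a hint, a caller such as fsck.gfs2 would still need a fallback search when the walk stops early.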

It needs to be set elsewhere too - there is no reason that we cannot
upgrade older fs by adding this info each time we write an rgrp header
that does not already have this info in it.

Yes, that makes sense, it'd be set by gfs2_convert also.

Also, we could use the 32
bit field rather than a 64 bit one, since the max size of the rgrp is 32
bits I think? Or is there some corner case that we need to take care of
perhaps?

Well I had intended it would be an absolute fs block address but if we use an offset then we have to keep in mind that there will be an alignment gap after the end of a rgrp in many cases. I think we'd have to find a storage array with pretty gigantic stripes to exhaust that address space though.
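
To make the offset variant concrete, here is a small sketch of the arithmetic, assuming all values are in fs blocks. The helpers align_up and next_as_offset are hypothetical names; the point is only that the offset has to span the rgrp itself plus any alignment gap before the next header, and has to fit in 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align == 0 means no alignment). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    if (align == 0)
        return addr;
    return ((addr + align - 1) / align) * align;
}

/* Express next_addr as a 32-bit offset from rg_addr.
 * Returns false if next_addr does not lie after rg_addr or if the
 * distance does not fit in 32 bits. */
bool next_as_offset(uint64_t rg_addr, uint64_t next_addr, uint32_t *offset_out)
{
    uint64_t delta;

    assert(offset_out != NULL);
    if (next_addr <= rg_addr)
        return false;
    delta = next_addr - rg_addr;
    if (delta > UINT32_MAX)
        return false;
    *offset_out = (uint32_t)delta;
    return true;
}
```

For example, an rgrp at block 17 that ends at block 65017, with the next header aligned to a 1024-block boundary, gives a next address of 65536 and an offset of 65519.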

rg_uuid is intended to be the same as sb_uuid for the file system. It
can be used by fsck.gfs2, when searching for resource group headers, in
order to distinguish resource groups created as part of a previous file
system on the device from resource groups in the current file system.
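
A sketch of the check this would allow when fsck.gfs2 finds a candidate header, assuming the uuids have already been read into byte arrays. The enum and function names are hypothetical; treating an all-zero rg_uuid as "written before the field existed" follows the backwards-compat comment in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RG_UUID_LEN 16

enum rg_origin {
    RG_CURRENT, /* rg_uuid matches the file system's uuid */
    RG_FOREIGN, /* non-zero but different: likely left over from an earlier fs */
    RG_LEGACY   /* all zero: header predates the field, no conclusion possible */
};

static bool rg_uuid_unset(const uint8_t uuid[RG_UUID_LEN])
{
    static const uint8_t zero[RG_UUID_LEN] = {0};
    return memcmp(uuid, zero, RG_UUID_LEN) == 0;
}

/* Classify a candidate rgrp header by comparing its uuid with the fs uuid. */
enum rg_origin classify_rgrp(const uint8_t fs_uuid[RG_UUID_LEN],
                             const uint8_t rg_uuid[RG_UUID_LEN])
{
    assert(fs_uuid != NULL && rg_uuid != NULL);
    if (rg_uuid_unset(rg_uuid))
        return RG_LEGACY;
    if (memcmp(fs_uuid, rg_uuid, RG_UUID_LEN) == 0)
        return RG_CURRENT;
    return RG_FOREIGN;
}
```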

Again, this could be updated by writing the rgrps back to deal with
older filesystems which need to be upgraded.

Yes.

That could be done as a one
off sweep, or as and when we write each rgrp. Also I wonder - if the
field is zero, we know that the rgrp is an old one that doesn't have it
set, but if someone changes the uuid at a later date, then what?

That's a good point.

Maybe
we can use the uuid as a way to set it to start with (in mkfs), but
after that we'd use the value from the first rgrp to fill in later
rgrps. If the first rgrp was zero then we'd not update the other rgrps
until the first rgrp had a value in it. Or something like that... I just
want to be certain that we understand what this field will mean in all
possible cases,

Yes, unless you'd prefer a separate sb_rg_uuid field in the superblock we should treat it differently to the sb_uuid after mkfs and only expect the rg_uuids to be consistent with themselves. There's a corner case where the first rgrp might have its uuid while the others still have zero, though, which will need some more thought.
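
One possible reading of the fill-in rule discussed above, as a sketch only: the first rgrp's uuid is the reference, nothing is propagated while it is still zero, and an rgrp that already carries a non-zero value is left alone. The function name backfill_rg_uuid is hypothetical, and what to do on a mismatch is deliberately left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

static bool uuid_is_zero(const uint8_t uuid[UUID_LEN])
{
    static const uint8_t zero[UUID_LEN] = {0};
    return memcmp(uuid, zero, UUID_LEN) == 0;
}

/* Copy the first rgrp's uuid into a later rgrp that has none yet.
 * Returns true only if rg_uuid was modified. */
bool backfill_rg_uuid(const uint8_t first_uuid[UUID_LEN],
                      uint8_t rg_uuid[UUID_LEN])
{
    assert(first_uuid != NULL && rg_uuid != NULL);
    if (uuid_is_zero(first_uuid))
        return false; /* first rgrp not upgraded yet: do not touch the others */
    if (!uuid_is_zero(rg_uuid))
        return false; /* already set; a mismatch is a separate question */
    memcpy(rg_uuid, first_uuid, UUID_LEN);
    return true;
}
```

The state where the first rgrp has a uuid and a later one is still zero is exactly the case in which this function writes, so tools reading the rgrps would need to tolerate it until the update has reached every header.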

Andy


Steve.

Signed-off-by: Andrew Price <anprice redhat com>
---
  include/uapi/linux/gfs2_ondisk.h | 8 ++++++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
index 0f24c07..f1489cb 100644
--- a/include/uapi/linux/gfs2_ondisk.h
+++ b/include/uapi/linux/gfs2_ondisk.h
@@ -118,7 +118,8 @@ struct gfs2_sb {

  	__be32 sb_bsize;
  	__be32 sb_bsize_shift;
-	__u32 __pad1;	/* Was journal segment size in gfs1 */
+	__be32 sb_rgsize; /* Resource group size used on fs creation.
+	                     Was journal segment size in gfs1 */

  	struct gfs2_inum sb_master_dir; /* Was jindex dinode in gfs1 */
  	struct gfs2_inum __pad2; /* Was rindex dinode in gfs1 */
@@ -131,6 +132,7 @@ struct gfs2_sb {
  	struct gfs2_inum __pad4; /* Was licence inode in gfs1 */
  #define GFS2_HAS_UUID 1
  	__u8 sb_uuid[16]; /* The UUID, maybe 0 for backwards compat */
+
  };

  /*
@@ -188,8 +190,10 @@ struct gfs2_rgrp {
  	__be32 rg_dinodes;
  	__be32 __pad;
  	__be64 rg_igeneration;
+	__be64 rg_next; /* Address of the next resource group */
+	__u8 rg_uuid[16]; /* The UUID, maybe 0 for backwards compat */

-	__u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */
+	__u8 rg_reserved[64]; /* Several fields from gfs1 now reserved */
  };

  /*
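
A quick size check on the gfs2_rgrp hunk above, as a sketch: the two structs below are hypothetical stand-ins that copy only the fields from rg_igeneration onward, with <stdint.h> types in place of __be64 and __u8. By this arithmetic the hunk adds 24 bytes of new fields (8 for rg_next, 16 for rg_uuid) while shrinking rg_reserved by only 16, so the tail grows by 8 bytes; keeping the size unchanged would need rg_reserved[56]. Whether the struct size has to stay fixed is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tail of struct gfs2_rgrp before the patch (stand-in types). */
struct rgrp_tail_before {
    uint64_t rg_igeneration;
    uint8_t  rg_reserved[80];
};

/* Tail of struct gfs2_rgrp after the patch as posted (stand-in types). */
struct rgrp_tail_after {
    uint64_t rg_igeneration;
    uint64_t rg_next;
    uint8_t  rg_uuid[16];
    uint8_t  rg_reserved[64];
};

/* Number of bytes by which the patched tail is larger than the original. */
size_t rgrp_tail_growth(void)
{
    assert(sizeof(struct rgrp_tail_after) >= sizeof(struct rgrp_tail_before));
    return sizeof(struct rgrp_tail_after) - sizeof(struct rgrp_tail_before);
}
```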



