[Cluster-devel] [GFS2 Patch] pass formal ino in do_filldir_main

Steven Whitehouse swhiteho at redhat.com
Thu Feb 8 09:42:38 UTC 2007


Hi,

On Wed, 2007-02-07 at 17:35 -0500, Wendy Cheng wrote:
> On Wed, 2007-02-07 at 10:49 +0000, Steven Whitehouse wrote:
> > Hi,
> > 
> > I'm not sure that this is the right fix. no_formal_ino is used only as a
> > hash value in the lookup, the "real" lookup is always via no_addr. So we
> > can alter that hash if required to just use no_addr, but I'd rather not
> > have to return no_formal_ino here since we have no way to map that back
> > to the disk location of the inode in general. In fact given a maximum
> > sized GFS2 filesystem and enough nodes to fill it with inodes in a
> > reasonable time (yes, I know thats a lot and a long time!), its not even
> > certain to be unique,
> > 
> 
> NFS won't run well without this patch - I originally included this into
> an "if command == nfsd" clause. Later checking into Ken's description
> (in cluster-list at redhat.com) about this issue, I think his intention
> here was that any ino returned to user space should use no_formal_ino.
> And this is exactly what the filldir should be doing (by returning
> no_formal_ino back to caller). 
> 
I'd rather not use no_formal_ino at all if possible. It is not, in
general, unique for any fs which has seen a lot of activity and there is
also no way to look up an inode from its no_formal_ino alone without
potentially searching the entire fs. To my mind an inode number should
be guaranteed to be unique for the lifetime of the inode and also be
useable as a reference by which the inode can be looked up (whether
thats directly by making it identical to the disk block number or
indirectly by using a lookup table, for example), no_formal_ino doesn't
appear to satisfy either condition where as no_addr does.

If there was an index to map no_formal_ino to a disk block, then I'd
agree, but I'd much prefer that we use no_addr in the mean time. I
suspect that all you need to do is to alter gfs2_ilookup() and
gfs2_iget() to use no_addr rather than no_formal_ino as the hash value.

Since that hash is only used internally, there should be no problem in
altering it and I think that will result in the same thing.

What I've not been able to work out is how your scheme can work in the
case that the inode is not already in cache, since filldir doesn't then
appear to get passed the no_addr at all.

> Another option is temporarily disabling this feature and make this works
> as GFS1 (with these two inode numbers identical) ? 
> 
> I like the current patch though. 
> 
> -- Wendy
Well one other option is to use the inode version number that was added recently
to each inode. The idea of no_formal_ino is that its not really an inode
number at all in the traditional sense, but just a way to ensure that an
inode isn't being accessed from a stale filehandle.

I added the inode version number with the intention that we'd be able to
use it instead of no_formal_ino, eventually, with NFS. One reason is
that because the max sized filesystem is 2^64 blocks and the
no_formal_ino is also 64 bits in size its quite likely to correlate with
the size of the fs and thus generate identical numbers once its wrapped
for similar disk offsets. Thats also true for any smaller fs which is a
power of two in size I think, but with a large number of rewrites of the
fs required in each case.

With the version number scheme, the version numbers are unique to a
particular resource group so we are getting 2^64/2^32 (for a max sized
resource group) = 2^32 (average) reallocations of each block within the
rgrp as an inode before the counter will wrap,

Steve.





More information about the Cluster-devel mailing list