[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Cluster-devel] [GFS2 Patch] pass formal ino in do_filldir_main

Please note, I haven't been following current GFS2 development...

On Feb 26, 2007, at 9:45 AM, Steven Whitehouse wrote:


On Fri, 2007-02-23 at 22:56 -0500, Wendy Cheng wrote:

I gave this issue some more thoughts - would like to suggest we take
this patch (at least for now) since it aligns with current code base.

The no_formal_ino is apparently intented to get returned back to user
space due to its unique-ness (and we have to trust pick_formal_ino()
does its job right). With normal readdir system call, after the inode
number is sent to user space, there is no route (I've checked) for it to
come back to kernel. So the only user that would use these filldir ino
inside the kernel is NFSD. NFSD calls gfs2_ilookup() that requires

no_formal_ino is not ever guaranteed to be unique which is why I want to
keep it away from userspace. I don't really want to have to change that
in order to fix NFS.

Why is it not unique? (from Ken's doc on GFS2: "If the filesystem allocated 1 billion inodes per second, it would take over 500 years for roll-over.")

As you say there is no route for the inode number returned via readdir
to come back to the kernel, but it should match the inode number
returned by stat and I don't see that we should change either of them
just to get NFS to work when userspace is perfectly ok as it is.

There were alot of things done to make NFS work properly with GFS - many of them through great pain, because GFS is a clustered FS.

If you look further... The current lookup code actually uses
no_formal_ino, not no_addr. The two main "gate" routines that controls
ino-to-inode conversion are:

* gfs2_ilookup() (used by NFS route)
* gfs2_inode_lookup() (used by VFS that calls gfs2_lookup()).

Neither of them need to use no_formal_ino at all. The lookup is using
no_addr as there is no other index which makes sense. I know that
no_formal_ino is used as part of the hash, but it shouldn't be really
and its trivial to change that. You can't swap to just using
no_formal_ino on its own as there is no mapping to find its on-disk
location from no_formal_ino.

Both use no_formal_ino - gfs2_inode_lookup() logic hides this behind the
little wrapper "gfs2_iget()". Since current VFS side's lookup has been
working fine, this no_formal_ino idea apparently is working. So let's
make NFSD side work the *same* way. I'm convinced this patch does a
right thing.

I don't dispute using generation number may not be a bad idea and may
perform better. However, if we really look into the details, it is not
easy for current code structure. Since we have something already
working, it is not wise to mess this code layout at this moment.

Make sense ?

-- Wendy

I'm afraid that I'm still not convinced on that. It seems that ext3 uses
the method that I'm proposing in that it looks up the inode by inode
number and then only afterwards checks the generation number. It also
has the advantage of separating the NFS interface from the rest of the
filesystem. That seems to me to be a lot of the problem that we are
looking at here in that the VFS's lookup function wants to look up by
no_addr and (currently) NFS is asking for something slightly different,

How do you plan on handling inode migration when using NFS? I believe that was one of the central reasons for having no_formal_inode. There may be other reasons as well, as I am not familiar with the code.


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]