[Linux-cluster] NFS on GFS architectural issues / problems

Riaan van Niekerk riaan at obsidian.co.za
Mon Aug 21 07:42:02 UTC 2006


hi Bob and others

I found the following GFS1/GFS2 design document on the Red Hat 108
Developer Portal, which details, among other things, some of the
issues with NFS on GFS:
https://rpeterso.108.redhat.com/servlets/ProjectDocumentView?documentID=99

(I see it was sent to this list over a year ago, but I never found it
while searching through the archives. It has a lot of good information
in it.)

It carries a disclaimer: "Some of the comments are no longer
applicable due to design changes."

My question to you, or anyone who is familiar with NFS on GFS (or GFS
in general), is: which of the following are still valid issues for the
current (6.1u4) version of GFS? If all or most of them still apply, I
can use this as motivation for my customer to strongly consider moving
off NFS on GFS. Removing NFS from our GFS cluster has been on the
cards for quite a while, but has not gained momentum due to lack of
information on the performance gains of such a move (very difficult to
gauge) or on the architectural problems/limitations of NFS on GFS (for
which the following extract is spot-on).

Note: could you consider adding a link to this document to your FAQ?

+++++++++

o  NFS Support

A GFS filesystem can be exported through NFS to other nodes.
There are a number of issues with NFS on top of a cluster
filesystem, though.

1) Filehandle misses

    When an NFS request comes into the server, it's up to the
    filesystem (and a few Linux helper routines) to map the NFS
    filehandle to the correct inode.  Doing that is easy if the
    inode is already in the node's cache.  The tricky part is when
    the filesystem must read in the inode from the disk.  There is
    nothing in the filehandle that anchors the inode into the
    filesystem (such as a glock on a directory that contains an
    entry pointing to the inode), so a lot more care has to be
    taken to make sure the block really contains a valid inode.
    (See the section on the proposed new RG formats.)

    It's also non-trivial to handle inode migration in GFS when an
    NFS server is running.  There is no centralized data structure
    that can map filehandles into inodes (such a structure would be
    a scalability/performance bottleneck).  It's difficult to find
    a representation of the inode that could be used to quickly
    find it even in the face of the inode changing blocks.

    Another problem is that filehandle requests can come in at
    random times for inodes that don't exist anymore or are in the
    process of being recreated.  This can break optimizations based
    on ideas like "since this node is in the process of creating
    this inode, it is the only one that knows about its locks".
    GFS has suffered from these mis-optimizations in the past.
    From what I've seen, I believe OCFS2 currently has problems
    like this, too.
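    To make the filehandle-validation problem concrete, here is a
    small user-space model (not GFS code; the names nfs_fh_model,
    inode_model, and fh_to_inode are invented for illustration) of
    why a decoded handle must be checked against a generation
    counter: the handle names a disk block, and nothing stops that
    block from having been freed and reused since the handle was
    issued.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of an NFS filehandle: it names an inode by
 * number plus a generation counter.  Nothing in it anchors the
 * inode into the filesystem (no reference to a linking directory). */
struct nfs_fh_model {
    uint64_t ino;        /* on-disk inode number */
    uint32_t generation; /* bumped each time the block is reused */
};

/* Hypothetical in-core view of what the disk block holds now. */
struct inode_model {
    uint64_t ino;
    uint32_t generation;
    int      valid;      /* does the block still hold a live inode? */
};

/* Decode succeeds only if the block still contains the same
 * incarnation of the inode the handle was made for; otherwise the
 * server would have to return ESTALE to the client. */
static const struct inode_model *
fh_to_inode(const struct nfs_fh_model *fh,
            const struct inode_model *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].ino == fh->ino) {
            if (!table[i].valid || table[i].generation != fh->generation)
                return NULL; /* stale: inode deleted or block reused */
            return &table[i];
        }
    }
    return NULL; /* no such inode at all */
}
```

    The sketch is deliberately simple: the hard part the text
    describes is doing this check safely when the block must first
    be read from disk with no anchoring lock held.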

2) Readdir

    Linux has an interesting mechanism to handle readdir()
    requests.  The VFS (or NFSD) passes the filesystem a request
    containing not only the directory and offset to be read, but a
    filldir function to call for each entry found.  So, the
    filesystem doesn't directly fill in a buffer of entries, but
    calls an arbitrary routine that can either put the entries into
    a buffer or do some other type of processing on them.  This is
    a powerful concept, but can be easily misused.

    I believe that NFSD's use of it is problematic at best.  The
    filldir routine used by NFSD for the readdirplus NFS procedure
    calls back into the filesystem to do a lookup and stat() on the
    inode pointed to by the entry.  This call is painful because of
    GFS' locking.  gfs_readdir() must call filldir with the
    directory lock held so that it doesn't lose its place in the
    directory.  The stat() that the filldir routine does causes the
    inode's lock to be acquired.  Because concurrent inode locks
    must always be acquired in ascending numerical order and the
    filldir routine forces an ordering that might be something
    other than that, there is a deadlock potential.

    GFS detects when NFSD calls its readdir and switches to a
    routine that avoids calling the filldir routine with the lock
    held.  It's not as efficient, but it avoids the deadlock.  It'd
    be nice if there was a better way to do the detection, though.
    (The code currently looks at the program's name.)
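    The callback inversion described above can be sketched in user
    space (the names iterate_dir_entries, filldir_t, and
    count_filldir are invented for this example; the kernel's real
    interface differs).  The point is that the walker hands control
    to an arbitrary routine per entry, so whatever that routine
    does, including taking further locks, happens while the walker
    still holds its place in the directory:

```c
#include <dirent.h>
#include <stddef.h>

/* A caller-supplied per-entry callback, modeled on the kernel's
 * filldir idea.  Returning non-zero stops the walk. */
typedef int (*filldir_t)(void *ctx, const char *name, long offset);

/* Walk a directory with opendir()/readdir() and hand each entry to
 * the callback instead of filling a buffer ourselves.  Returns -1
 * on open failure, 1 if the callback stopped the walk, else 0. */
static int iterate_dir_entries(const char *path, filldir_t filldir, void *ctx)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    struct dirent *de;
    long off = 0;
    int stopped = 0;
    while ((de = readdir(d)) != NULL) {
        if (filldir(ctx, de->d_name, off++)) {
            stopped = 1;
            break;
        }
    }
    closedir(d);
    return stopped;
}

/* A harmless callback that just counts entries.  The deadlock in
 * the text arises when the callback instead stat()s each entry,
 * taking inode locks while the directory lock is still held. */
struct count_ctx { int count; };

static int count_filldir(void *ctx, const char *name, long offset)
{
    (void)name;
    (void)offset;
    ((struct count_ctx *)ctx)->count++;
    return 0; /* keep walking */
}
```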

3) FCNTL locking

    There are a huge number of issues with acquiring and failing
    over fcntl()-style locks when there are multiple GFS heads
    exporting NFS.  GFS pretty much ignores them right now.  A good
    place to start would be to change NFSD so it actually passes
    fcntl calls down into the filesystem.
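    For reference, these are ordinary POSIX advisory record locks.
    A minimal sketch of acquiring and releasing one with fcntl()
    (the helper name lock_whole_file is invented; only the struct
    flock / F_SETLK usage is standard) shows the operation that, per
    the text, NFSD would need to pass down to the cluster
    filesystem rather than resolve on a single NFS head:

```c
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

/* Acquire or release a whole-file advisory lock with fcntl().
 * type is F_RDLCK, F_WRLCK, or F_UNLCK.  Returns 0 on success,
 * -1 on failure (e.g. a conflicting lock held by another process). */
static int lock_whole_file(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type   = type;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0; /* 0 means "to end of file", i.e. the whole file */
    return fcntl(fd, F_SETLK, &fl);
}
```

    On a single node the kernel arbitrates these locks locally; the
    failover problem in the text is that with several GFS heads
    exporting the same filesystem, lock state held on one head is
    invisible to the others unless it is pushed down into the
    filesystem's own locking.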

4) NFSv4

    NFSv4 requires all sorts of changes to GFS in order for them to
    work together.  Op locks are one I can remember at the moment.
    I think I've repressed my memories of the others.

++++++++