[Cluster-devel] [PATCH] Fix freeze of cluster-2.03.11

Mon Apr 20 13:33:14 UTC 2009

Hi,

On Mon, 2009-04-20 at 15:14 +0200, Fabio M. Di Nitto wrote:
> Hi,
> 
> On Mon, 2009-04-20 at 14:27 +0200, Kadlecsik Jozsef wrote:
> > Hi,
> > 
> > I reported on the linux-cluster mailing list (see thread
> > https://www.redhat.com/archives/linux-cluster/2009-March/msg00176.html)
> > that version cluster-2.03.11 freezes.
> > 
> > As Wendy Cheng pointed out, the reason for this is that the filesystem API 
> > of the kernel changed and "put_inode" was replaced by "drop_inode": while 
> > "put_inode" was called without holding any lock, "drop_inode" is called 
> > under inode_lock held but the called gfs_sync_page_i may block
> > (https://www.redhat.com/archives/linux-cluster/2009-April/msg00060.html).
> >
Its not a replacement as such, they are different operations.
->put_inode() was called on each and every iput() (despite what the docs
said, as Christoph noted) whereas ->drop_inode() is only called when the
inode is being removed from the cache.

The purpose of drop inode is to allow the fs to override the decision
about whether the inode needs to be dallocated (->delete_inode()) or not
(->clear_inode()).

Christoph's patch is here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=33dcdac2df54e66c447ae03f58c95c7251aa5649

>  
> > The patch below (applied in gfs-kernel/src/gfs/) fixes the bug by 
> > releasing the lock before calling gfs_sync_page_i and locking it back 
> > after the function call:
> > 
> > --- gfs-orig/ops_super.c	2009-01-22 13:33:51.000000000 +0100
> > +++ gfs/ops_super.c	2009-04-06 13:07:06.000000000 +0200
> > @@ -9,6 +9,7 @@
> >  #include <linux/statfs.h>
> >  #include <linux/seq_file.h>
> >  #include <linux/mount.h>
> > +#include <linux/writeback.h>
> >  
> >  #include "gfs.h"
> >  #include "dio.h"
> > @@ -68,8 +69,11 @@
> >  	if (ip &&
> >  	    !inode->i_nlink &&
> >  	    S_ISREG(inode->i_mode) &&
> > -	    !sdp->sd_args.ar_localcaching)
> > +	    !sdp->sd_args.ar_localcaching) {
> > +	    	spin_unlock(&inode_lock);
> >  		gfs_sync_page_i(inode, DIO_START | DIO_WAIT);
> > +		spin_lock(&inode_lock);
> > +	}
> >  	generic_drop_inode(inode);
> >  }
> >
You can't do this in this particular way because you must set the
inode's state before releasing the inode_lock. That is done in
generic_drop_inode() and the functions that it calls.

Not to mention which, this doesn't seem to make any sense. It appears to
be trying to write the inode's data back to disk, but if we've got to
this point in the inode's life cycle, there should be nothing to write.

Since its only called when i_nlink is 0, thats even more strange since
(a) it could be moved to ->delete_inode() and (b) why do we care about
forcing out dirty data for an inode thats being unlinked anyway....

This patch raises more questions than it answers I think.

>   
> > 
> > Additionally, the line
> > 
> > EXPORT_SYMBOL(inode_lock);
> > 
> > must be added to fs/inode.c in order to the inode_lock symbol get 
> > exported.
> 
> thanks a lot for you effort but we will need to find another solution.
> 
> it's been a long time since gfs1 required extra exported symbols from
> the kernel and asking distributions to do that again is not really an
> option (not sure if there any other option either).
> 
> I'll let the gfs guys look at it.
> 
> Thanks again!
> Fabio
> 
Yes, we can't have any of our own exports now (since gfs isn't in the
upstream kernel), as per the ruling from the Fedora people,

Steve.