[Cluster-devel] [PATCH] Fix freeze of cluster-2.03.11

Mon Apr 20 16:36:10 UTC 2009

Hi,

On Mon, 20 Apr 2009, Steven Whitehouse wrote:

> On Mon, 2009-04-20 at 15:14 +0200, Fabio M. Di Nitto wrote:
> > 
> > On Mon, 2009-04-20 at 14:27 +0200, Kadlecsik Jozsef wrote:
> > > 
> > > I reported on the linux-cluster mailing list (see thread
> > > https://www.redhat.com/archives/linux-cluster/2009-March/msg00176.html)
> > > that version cluster-2.03.11 freezes.
> > > 
> > > As Wendy Cheng pointed out, the reason for this is that the filesystem API 
> > > of the kernel changed and "put_inode" was replaced by "drop_inode": while 
> > > "put_inode" was called without holding any lock, "drop_inode" is called 
> > > under inode_lock held but the called gfs_sync_page_i may block
> > > (https://www.redhat.com/archives/linux-cluster/2009-April/msg00060.html).
> > >
> Its not a replacement as such, they are different operations.
> ->put_inode() was called on each and every iput() (despite what the docs
> said, as Christoph noted) whereas ->drop_inode() is only called when the
> inode is being removed from the cache.

I knew I worded it incorrectly ;-).

> The purpose of drop inode is to allow the fs to override the decision
> about whether the inode needs to be dallocated (->delete_inode()) or not
> (->clear_inode()).
> 
> Christoph's patch is here:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=33dcdac2df54e66c447ae03f58c95c7251aa5649
> 
> >  
> > > The patch below (applied in gfs-kernel/src/gfs/) fixes the bug by 
> > > releasing the lock before calling gfs_sync_page_i and locking it back 
> > > after the function call:
> > > 
> > > --- gfs-orig/ops_super.c	2009-01-22 13:33:51.000000000 +0100
> > > +++ gfs/ops_super.c	2009-04-06 13:07:06.000000000 +0200
> > > @@ -9,6 +9,7 @@
> > >  #include <linux/statfs.h>
> > >  #include <linux/seq_file.h>
> > >  #include <linux/mount.h>
> > > +#include <linux/writeback.h>
> > >  
> > >  #include "gfs.h"
> > >  #include "dio.h"
> > > @@ -68,8 +69,11 @@
> > >  	if (ip &&
> > >  	    !inode->i_nlink &&
> > >  	    S_ISREG(inode->i_mode) &&
> > > -	    !sdp->sd_args.ar_localcaching)
> > > +	    !sdp->sd_args.ar_localcaching) {
> > > +	    	spin_unlock(&inode_lock);
> > >  		gfs_sync_page_i(inode, DIO_START | DIO_WAIT);
> > > +		spin_lock(&inode_lock);
> > > +	}
> > >  	generic_drop_inode(inode);
> > >  }
> > >
> You can't do this in this particular way because you must set the
> inode's state before releasing the inode_lock. That is done in
> generic_drop_inode() and the functions that it calls.
> 
> Not to mention which, this doesn't seem to make any sense. It appears to
> be trying to write the inode's data back to disk, but if we've got to
> this point in the inode's life cycle, there should be nothing to write.
> 
> Since its only called when i_nlink is 0, thats even more strange since
> (a) it could be moved to ->delete_inode() and (b) why do we care about
> forcing out dirty data for an inode thats being unlinked anyway....
> 
> This patch raises more questions than it answers I think.

Do you imply it is superfluous to call gfs_sync_page_i at all, and in 
consequence there's no need for a gfs-specific ->drop_inode function?

Just for the record: without unlocking/locking, the node having a mailmain 
queue over GFS got frozen in a few seconds as mailman queue manager was 
started. With the patch above it ran smoothly for a longer (test) period.

Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary