
Re: [Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block - REVISED



Hi,

On Thu, 2007-08-09 at 15:07 -0400, Wendy Cheng wrote:
> Bob Peterson wrote:
> > The problem was that the journal inodes, although protected by
> > a glock, were not synced with the other nodes because they didn't
> > use the inode glock sync operations (i.e. no "glops" were defined).
> > Therefore, journal recovery performed by one node was causing the
> > blocks to get out of sync with the node that was actually trying
> > to use that journal as it came back up from a reboot.
> >   
> 
> I don't understand this patch either. Maybe I have worked too long on 
> GFS1, so please educate me about these GFS2 internals.  Comment below:
> > There are two possible solutions: (1) To make the journals use the
> > normal inode glock sync operations, or (2) To make the journal
> > operations take effect immediately (i.e. no caching).  Although
> > option 1 works, it turns out to be a lot more code.  Steve opted
> > for option 2, which is much simpler and therefore less prone to
> > regression errors.
> >
> > Regards,
> >
> > Bob Peterson
> > --
> > Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> > --
> > diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> > index 58c730b..f0bcaa2 100644
> > --- a/fs/gfs2/ops_fstype.c
> > +++ b/fs/gfs2/ops_fstype.c
> > @@ -358,7 +358,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
> >  
> >  		ip = GFS2_I(sdp->sd_jdesc->jd_inode);
> >  		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
> > -					   LM_FLAG_NOEXP | GL_EXACT,
> > +					   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
> >  					   &sdp->sd_jinode_gh);
> >  		if (error) {
> >  			fs_err(sdp, "can't acquire journal inode glock: %d\n",
> > diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> > index 5ada38c..beb6c7a 100644
> > --- a/fs/gfs2/recovery.c
> > +++ b/fs/gfs2/recovery.c
> > @@ -469,7 +469,7 @@ int gfs2_recover_journal(struct gfs2_jdesc *jd)
> >  		};
> >  
> >  		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
> > -					   LM_FLAG_NOEXP, &ji_gh);
> > +					   LM_FLAG_NOEXP | GL_NOCACHE, &ji_gh);
> >  		if (error)
> >  			goto fail_gunlock_j;
> >  	} else {
> >
> >   
> This lock is requested as "SHARED" (a read lock). So how does 
> "GL_NOCACHE" help it "sync" disk blocks with other nodes as you 
> described above? For a normal EXCLUSIVE inode glock with nocache, 
> it will force a sync of the disk blocks. However, this is a read 
> lock, so what problem has this patch solved?
> 
> -- Wendy
> 

This is the easy one to explain... the page cache relating to the
journal inode is only used to read the journal for recovery and is
otherwise unused. The problem arises when a node tries to recover
the journal of the same remote node twice (with no umounts etc. between
the two events). In that case it was possible for a node to "see" stale
data from the first recovery attempt while reading the journal for the
second recovery. Using GL_NOCACHE means that we now flush the cache
after the first recovery, so when the second recovery occurs, the
blocks will be read fresh from disk.
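
If it helps, here is a toy user-space model of that behaviour. This is
not GFS2 code: TOY_GL_NOCACHE, read_under_glock() and struct node_cache
are all made up for illustration; disk_block stands for a journal block
on shared storage and node_cache for one node's page cache.

#include <stdio.h>
#include <stdbool.h>

#define TOY_GL_NOCACHE 0x1

static int disk_block = 1;	/* block as it exists on shared disk */

struct node_cache {
	bool valid;		/* do we hold a cached copy? */
	int data;		/* the cached copy */
};

/* Read the block under a (modelled) shared glock, then release it. */
static int read_under_glock(struct node_cache *c, unsigned int flags)
{
	int val;

	if (!c->valid) {	/* cache miss: read from disk */
		c->data = disk_block;
		c->valid = true;
	}
	val = c->data;

	if (flags & TOY_GL_NOCACHE)
		c->valid = false;	/* flush the cache on release */

	return val;
}

int main(void)
{
	struct node_cache cache = { false, 0 };

	/* Without the flag the second recovery sees stale data: */
	printf("1st recovery reads %d\n", read_under_glock(&cache, 0));
	disk_block = 2;		/* journal rewritten between recoveries */
	printf("2nd recovery reads %d (stale)\n",
	       read_under_glock(&cache, 0));

	/* With the flag the cache is dropped, so we re-read the disk: */
	cache.valid = false;
	disk_block = 1;
	printf("1st recovery reads %d\n",
	       read_under_glock(&cache, TOY_GL_NOCACHE));
	disk_block = 2;
	printf("2nd recovery reads %d (fresh)\n",
	       read_under_glock(&cache, TOY_GL_NOCACHE));

	return 0;
}

In the real code the flush comes from the glock layer rather than from
the read path: releasing a holder that was queued with GL_NOCACHE causes
the pages cached under that glock to be thrown away, so the next
acquisition has to go back to disk.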

Steve.


