[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Cluster-devel] [PATCH] GFS2: dirty inode correctly in gfs2_write_end



Hi,

On Wed, 2013-09-04 at 11:59 -0500, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2013 at 03:55:03PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Tue, 2013-09-03 at 16:59 -0500, Benjamin Marzinski wrote:
> > > GFS2 was only setting I_DIRTY_DATASYNC on files that it wrote to, when
> > > it actually increased the file size.  If gfs2_fsync was called without
> > > I_DIRTY_DATASYNC set, it didn't flush the incore data to the log before
> > > returning, so any metadata or journaled data changes were not getting
> > > fsynced. This meant that writes to the middle of files were not always
> > > getting fsynced properly.
> > > 
> > > This patch makes gfs2 set I_DIRTY_DATASYNC whenever metadata has been
> > > updated during a write. It also make gfs2_sync flush the incore log
> > > if I_DIRTY_PAGES is set, and the file is using data journalling. This
> > > will make sure that all incore logged data gets written to disk before
> > > returning from a fsync.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins redhat com>
> > > ---
> > >  fs/gfs2/aops.c |    9 +++++++--
> > >  fs/gfs2/file.c |    4 +++-
> > >  2 files changed, 10 insertions(+), 3 deletions(-)
> > > 
> > > Index: gfs2-3.0-nmw-130830/fs/gfs2/aops.c
> > > ===================================================================
> > > --- gfs2-3.0-nmw-130830.orig/fs/gfs2/aops.c
> > > +++ gfs2-3.0-nmw-130830/fs/gfs2/aops.c
> > > @@ -815,6 +815,8 @@ static int gfs2_write_end(struct file *f
> > >  	unsigned int from = pos & (PAGE_CACHE_SIZE - 1);
> > >  	unsigned int to = from + len;
> > >  	int ret;
> > > +	struct gfs2_trans *tr = current->journal_info;
> > > +	BUG_ON(!tr);
> > >  
> > >  	BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
> > >  
> > > @@ -825,8 +827,6 @@ static int gfs2_write_end(struct file *f
> > >  		goto failed;
> > >  	}
> > >  
> > > -	gfs2_trans_add_meta(ip->i_gl, dibh);
> > > -
> > >  	if (gfs2_is_stuffed(ip))
> > >  		return gfs2_stuffed_write_end(inode, dibh, pos, len, copied, page);
> > >  
> > > @@ -834,6 +834,11 @@ static int gfs2_write_end(struct file *f
> > >  		gfs2_page_add_databufs(ip, page, from, to);
> > >  
> > >  	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
> > > +	if (tr->tr_num_buf_new)
> > > +		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> > > +	else
> > > +		gfs2_trans_add_meta(ip->i_gl, dibh);
> > > +
> > I'm not sure I follow this bit... generic_write_end() will call
> > mark_inode_dirty() if the file size has changed. So that case should be
> > covered. That leaves only timestamps which I think should be
> > I_DIRTY_SYNC since we don't need those for fdatasync. But why is there a
> > conditional here on tr_num_buf_new ? Does the "else" actually occur in
> > practice?
> 
> The idea is to only set I_DIRTY_DATASYNC if we've actually updated
> metadata because of the write.

I think we ought to be updating the metadata for every write (although
see case #3 below), since the timestamp should change each time. If the
size of the file doesn't change then we don't need to flush the metadata
on fdatasync, even though the timestamp has changed.

I think we have three cases:

1. i_size changes: metadata must be flushed on fsync/fdatasync, so
I_DIRTY_DATASYNC should be set (done in generic_write_end())
2. only timestamps change: metatdata must be flushed on fsync but not on
fdatasync, so I_DIRTY_SYNC should be set (I assume perhaps the
problematic case)
3. write is of 0 bytes: I think we are allowed to skip all metadata
updates (need to check that we do this)

>  Otherwise we flush the log for every
> write, even if we aren't data journalling and haven't changed any
> metadata.  If the write happened without adding any buffers to the
> transaction (which happens if we are just writing over already allocated
> space), then tr->tr_num_buf_new will be 0. This means we haven't
> actually dirtied any metadata, and we don't need to flush it to the log
> during the fsync.
> 
Does that happen for zero sized writes? Otherwise I'd be expecting us to
be updating time stamps at least on each write.

> With instrumentation in the code, I can definitely see that with the
> check for tr->tr_num_buf_new, I_DIRTY_DATASYNC isn't set if we are just
> rewriting over already allocated space, and gfs2_fsync doesn't need to
> do all the extra I_DIRTY_DATASYNC work, cutting down the performance
> impact of this patch.
> 
> If you are doing data journalling, you obviously need to flush the
> changed datablocks to the journal in gfs2_fsync.  Otherwise it won't
> write them back to their inplace locations. This is true even if the
> write didn't update metadata, and so I_DIRTY_DATASYNC isn't set.  That's
> why I have gfs2_fsync check for I_DIRTY_PAGES if an inode is using data
> journalling.
> 
> Does that make sense, or am I missing something?
> 
Yes, I certainly agree with the changes at the fsync end wrt jdata. That
all makes sense to me.

> Now the part of my patch that I don't understand is the bit after the
> "else".  See my questions at
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=991596#c19
> 
> I added the "else" to keep gfs2_write_end always adding the inode to the
> transaction, but I don't know why that is necessary. If we aren't
> changing any metadata, why do we need to add the inode to the
> transaction at all.
> 
> -Ben
> 
Ah, I see now. I'd not spotted those questions before. I think the
answer to question #1 most likely relates to time stamps, and I hope
that I've more or less answered it above.

I suspect that the answer to question #2 is no. We call
filemap_fdatawrite/wait to writeback the pages of the two address
spaces, and flush the log, etc in order to ensure that all the metadata
is in a suitable state to be written back. So the issue is just to make
sure that everything gets written back and waited for, and I think we
are doing that correctly,

Steve.

> > 
> > Otherwise I think the patch looks good,
> > 
> > Steve.
> > 
> > >  
> > >  	if (inode == sdp->sd_rindex) {
> > >  		adjust_fs_space(inode);
> > > Index: gfs2-3.0-nmw-130830/fs/gfs2/file.c
> > > ===================================================================
> > > --- gfs2-3.0-nmw-130830.orig/fs/gfs2/file.c
> > > +++ gfs2-3.0-nmw-130830/fs/gfs2/file.c
> > > @@ -650,7 +650,7 @@ static int gfs2_fsync(struct file *file,
> > >  {
> > >  	struct address_space *mapping = file->f_mapping;
> > >  	struct inode *inode = mapping->host;
> > > -	int sync_state = inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC);
> > > +	int sync_state = inode->i_state & I_DIRTY;
> > >  	struct gfs2_inode *ip = GFS2_I(inode);
> > >  	int ret = 0, ret1 = 0;
> > >  
> > > @@ -660,6 +660,8 @@ static int gfs2_fsync(struct file *file,
> > >  			return ret1;
> > >  	}
> > >  
> > > +	if (!gfs2_is_jdata(ip))
> > > +		sync_state &= ~I_DIRTY_PAGES;
> > >  	if (datasync)
> > >  		sync_state &= ~I_DIRTY_SYNC;
> > >  
> > > 
> > 



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]