rpms/kernel/F-11 linux-2.6-ext4-clear-unwritten-flag.patch, NONE, 1.1 linux-2.6-ext4-fake-delalloc-bno.patch, NONE, 1.1 linux-2.6-ext4-fix-i_cached_extent-race.patch, NONE, 1.1 kernel.spec, 1.1605, 1.1606 linux-2.6-ext4-prealloc-fixes.patch, 1.1, 1.2

Eric Sandeen sandeen at fedoraproject.org
Fri May 15 21:48:26 UTC 2009


Author: sandeen

Update of /cvs/pkgs/rpms/kernel/F-11
In directory cvs1.fedora.phx.redhat.com:/tmp/cvs-serv23480

Modified Files:
	kernel.spec linux-2.6-ext4-prealloc-fixes.patch 
Added Files:
	linux-2.6-ext4-clear-unwritten-flag.patch 
	linux-2.6-ext4-fake-delalloc-bno.patch 
	linux-2.6-ext4-fix-i_cached_extent-race.patch 
Log Message:
* Fri May 15 2009 Eric Sandeen <sandeen at redhat.com>
- ext4: corruption fixes from upstream.


linux-2.6-ext4-clear-unwritten-flag.patch:

--- NEW FILE linux-2.6-ext4-clear-unwritten-flag.patch ---
From: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
Date: Thu, 14 May 2009 21:05:39 +0000 (-0400)
Subject: ext4: Clear the unwritten buffer_head flag after the extent is initialized
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=2a8964d63d50dd2d65d71d342bc7fb6ef4117614

ext4: Clear the unwritten buffer_head flag after the extent is initialized

The BH_Unwritten flag indicates that the buffer is allocated on disk
but has not been written; that is, the disk was part of a persistent
preallocation area.  That flag should only be set when a get_blocks()
function is looking up an inode's logical-to-physical block mapping.

When ext4_get_blocks_wrap() is called with create=1, the uninitialized
extent is converted into an initialized one, so the BH_Unwritten flag
is no longer appropriate.  Hence, we need to make sure the
BH_Unwritten is not left set, since the combination of BH_Mapped and
BH_Unwritten is not allowed; among other things, it will result in ext4's
get_block() being called over and over again during the write_begin
phase of write(2).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso at mit.edu>
---

Index: linux-2.6.29.noarch/fs/ext4/inode.c
===================================================================
--- linux-2.6.29.noarch.orig/fs/ext4/inode.c
+++ linux-2.6.29.noarch/fs/ext4/inode.c
@@ -1069,6 +1069,7 @@ int ext4_get_blocks_wrap(handle_t *handl
 	int retval;
 
 	clear_buffer_mapped(bh);
+	clear_buffer_unwritten(bh);
 
 	/*
 	 * Try to see if we can get  the block without requesting
@@ -1099,6 +1100,18 @@ int ext4_get_blocks_wrap(handle_t *handl
 		return retval;
 
 	/*
+	 * When we call get_blocks without the create flag, the
+	 * BH_Unwritten flag could have gotten set if the blocks
+	 * requested were part of a uninitialized extent.  We need to
+	 * clear this flag now that we are committed to convert all or
+	 * part of the uninitialized extent to be an initialized
+	 * extent.  This is because we need to avoid the combination
+	 * of BH_Unwritten and BH_Mapped flags being simultaneously
+	 * set on the buffer_head.
+	 */
+	clear_buffer_unwritten(bh);
+
+	/*
 	 * New blocks allocate and/or writing to uninitialized extent
 	 * will possibly result in updating i_data, so we take
 	 * the write lock of i_data_sem, and call get_blocks()
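
The flag handling in this patch can be modeled in userspace. The sketch below is a simplified model, not kernel code: struct fake_bh, BH_MAPPED, BH_UNWRITTEN and model_get_blocks() are hypothetical stand-ins for struct buffer_head and its state bits, already-mapped blocks are ignored, and the allocation step is reduced to setting the mapped bit. It shows the invariant the patch restores: after a create=1 call the buffer is mapped and the unwritten bit is clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BH_Mapped and BH_Unwritten bits that the
 * kernel keeps in bh->b_state. */
enum {
	BH_MAPPED    = 1 << 0,
	BH_UNWRITTEN = 1 << 1,
};

struct fake_bh {
	unsigned int state;
};

/* The combination the patch forbids: mapped and unwritten at once. */
static bool bh_state_ok(const struct fake_bh *bh)
{
	const unsigned int both = BH_MAPPED | BH_UNWRITTEN;

	return (bh->state & both) != both;
}

/*
 * Simplified model of ext4_get_blocks_wrap() after the patch.
 * in_uninit_extent: the block lies in a preallocated (uninitialized) extent.
 * create: the caller wants the block allocated or converted.
 */
static void model_get_blocks(struct fake_bh *bh, bool in_uninit_extent,
			     bool create)
{
	/* Start from a clean state, as the patched function now does. */
	bh->state &= ~(unsigned int)(BH_MAPPED | BH_UNWRITTEN);

	/* Lookup without create: an uninitialized extent is reported as
	 * unwritten and left unmapped. */
	if (in_uninit_extent)
		bh->state |= BH_UNWRITTEN;

	if (!create)
		return;

	/* Committed to converting the extent: drop the stale flag first... */
	bh->state &= ~(unsigned int)BH_UNWRITTEN;
	/* ...then "allocate", which maps the buffer. */
	bh->state |= BH_MAPPED;

	assert(bh_state_ok(bh));
}
```

Removing the second clear in the create path would leave both bits set after a create=1 call on an uninitialized extent and trip the assertion, which is exactly the state the commit message says must not occur.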

linux-2.6-ext4-fake-delalloc-bno.patch:

--- NEW FILE linux-2.6-ext4-fake-delalloc-bno.patch ---
From: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
Date: Tue, 12 May 2009 18:40:37 +0000 (-0400)
Subject: ext4: Use a fake block number for delayed new buffer_head
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=33b9817e2ae097c7b8d256e3510ac6c54fc6d9d0

ext4: Use a fake block number for delayed new buffer_head

Use a very large unsigned number (~0xffff) as the fake block number
for the delayed new buffer. The VFS should never try to write out this
number, but if it does, this will make it obvious.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso at mit.edu>
---

Index: linux-2.6.29.noarch/fs/ext4/inode.c
===================================================================
--- linux-2.6.29.noarch.orig/fs/ext4/inode.c
+++ linux-2.6.29.noarch/fs/ext4/inode.c
@@ -2213,6 +2213,10 @@ static int ext4_da_get_block_prep(struct
 				  struct buffer_head *bh_result, int create)
 {
 	int ret = 0;
+	sector_t invalid_block = ~((sector_t) 0xffff);
+
+	if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
+		invalid_block = ~0;
 
 	BUG_ON(create == 0);
 	BUG_ON(bh_result->b_size != inode->i_sb->s_blocksize);
@@ -2234,7 +2238,7 @@ static int ext4_da_get_block_prep(struct
 			/* not enough space to reserve */
 			return ret;
 
-		map_bh(bh_result, inode->i_sb, 0);
+		map_bh(bh_result, inode->i_sb, invalid_block);
 		set_buffer_new(bh_result);
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {
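
The choice of fake block number in ext4_da_get_block_prep() reduces to a small piece of arithmetic. The sketch below reproduces it in userspace under one stated assumption: sector_t is 64 bits wide, represented here by the hypothetical typedef model_sector_t (with a 32-bit sector_t the default would instead be 0xffff0000). pick_invalid_block() is a hypothetical helper; the blocks_count argument plays the role of ext4_blocks_count() in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 64-bit sector_t; model_sector_t stands in for it here. */
typedef uint64_t model_sector_t;

/*
 * Mirrors the selection in the patched ext4_da_get_block_prep():
 * default to ~0xffff, but fall back to all-ones when the filesystem is
 * so large that ~0xffff would be a real block number.
 */
static model_sector_t pick_invalid_block(model_sector_t blocks_count)
{
	model_sector_t invalid_block = ~((model_sector_t)0xffff);

	if (invalid_block < blocks_count)
		invalid_block = ~(model_sector_t)0;

	/* Valid blocks are 0 .. blocks_count - 1, so this is never one. */
	assert(invalid_block >= blocks_count);
	return invalid_block;
}
```

For any realistic filesystem the default value 0xffffffffffff0000 is returned; the all-ones fallback only applies when the block count exceeds that value.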

linux-2.6-ext4-fix-i_cached_extent-race.patch:

--- NEW FILE linux-2.6-ext4-fix-i_cached_extent-race.patch ---
From: Theodore Ts'o <tytso at mit.edu>
Date: Fri, 15 May 2009 13:07:28 +0000 (-0400)
Subject: ext4: Fix race in ext4_inode_info.i_cached_extent
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=2ec0ae3acec47f628179ee95fe2c4da01b5e9fc4

ext4: Fix race in ext4_inode_info.i_cached_extent

If two CPUs call ext4_ext_get_blocks() simultaneously, there is
nothing protecting the i_cached_extent structure from
being used and updated at the same time.  This could potentially cause
the wrong location on disk to be read or written to, including
potentially causing the corruption of the block group descriptors
and/or inode table.

This bug has been in the ext4 code since almost the very beginning of
ext4's development.  Fortunately, once the data is stored in the page
cache, ext4_get_blocks() doesn't need to be called, so trying to
replicate this problem to the point where we could identify its root
cause was *extremely* difficult.  Many thanks to Kevin Shanahan for
working over several months to be able to reproduce this easily so we
could finally nail down the cause of the corruption.

Signed-off-by: "Theodore Ts'o" <tytso at mit.edu>
Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar at linux.vnet.ibm.com>
---

Index: linux-2.6.29.noarch/fs/ext4/extents.c
===================================================================
--- linux-2.6.29.noarch.orig/fs/ext4/extents.c
+++ linux-2.6.29.noarch/fs/ext4/extents.c
@@ -1740,11 +1740,13 @@ ext4_ext_put_in_cache(struct inode *inod
 {
 	struct ext4_ext_cache *cex;
 	BUG_ON(len == 0);
+	spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
 	cex = &EXT4_I(inode)->i_cached_extent;
 	cex->ec_type = type;
 	cex->ec_block = block;
 	cex->ec_len = len;
 	cex->ec_start = start;
+	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
 }
 
 /*
@@ -1801,12 +1803,17 @@ ext4_ext_in_cache(struct inode *inode, e
 			struct ext4_extent *ex)
 {
 	struct ext4_ext_cache *cex;
+	int ret = EXT4_EXT_CACHE_NO;
 
+	/* 
+	 * We borrow i_block_reservation_lock to protect i_cached_extent
+	 */
+	spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
 	cex = &EXT4_I(inode)->i_cached_extent;
 
 	/* has cache valid data? */
 	if (cex->ec_type == EXT4_EXT_CACHE_NO)
-		return EXT4_EXT_CACHE_NO;
+		goto errout;
 
 	BUG_ON(cex->ec_type != EXT4_EXT_CACHE_GAP &&
 			cex->ec_type != EXT4_EXT_CACHE_EXTENT);
@@ -1817,11 +1824,11 @@ ext4_ext_in_cache(struct inode *inode, e
 		ext_debug("%u cached by %u:%u:%llu\n",
 				block,
 				cex->ec_block, cex->ec_len, cex->ec_start);
-		return cex->ec_type;
+		ret = cex->ec_type;
 	}
-
-	/* not in cache */
-	return EXT4_EXT_CACHE_NO;
+errout:
+	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+	return ret;
 }
 
 /*
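
The locking pattern this patch introduces can be sketched in userspace C. This is a model under stated assumptions: struct ext_cache, cache_put() and cache_lookup() are hypothetical, a pthread_mutex_t stands in for the kernel spinlock i_block_reservation_lock that the patch borrows, and the range test is an assumption, since that part of ext4_ext_in_cache() is not shown in the hunk. Two points carry over from the patch: every field is written and read under a single lock, and the reader has a single exit path so the lock is always released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Cache states; names modeled on EXT4_EXT_CACHE_NO/GAP/EXTENT. */
enum { CACHE_NO = 0, CACHE_GAP = 1, CACHE_EXTENT = 2 };

/* Hypothetical userspace model of the per-inode extent cache. */
struct ext_cache {
	pthread_mutex_t lock;	/* stands in for i_block_reservation_lock */
	int ec_type;
	uint32_t ec_block;	/* first logical block covered */
	uint32_t ec_len;	/* number of blocks covered */
	uint64_t ec_start;	/* first physical block */
};

/* Writer: all four fields change under the lock, so a reader can never
 * see a half-updated entry (new ec_block paired with old ec_start). */
static void cache_put(struct ext_cache *c, int type, uint32_t block,
		      uint32_t len, uint64_t start)
{
	assert(len != 0);	/* mirrors BUG_ON(len == 0) */
	pthread_mutex_lock(&c->lock);
	c->ec_type = type;
	c->ec_block = block;
	c->ec_len = len;
	c->ec_start = start;
	pthread_mutex_unlock(&c->lock);
}

/* Reader: same shape as the patched ext4_ext_in_cache(), with one exit. */
static int cache_lookup(struct ext_cache *c, uint32_t block,
			uint32_t *out_block, uint32_t *out_len,
			uint64_t *out_start)
{
	int ret = CACHE_NO;

	pthread_mutex_lock(&c->lock);
	if (c->ec_type == CACHE_NO)
		goto out;

	assert(c->ec_type == CACHE_GAP || c->ec_type == CACHE_EXTENT);

	/* Assumed range check: ec_block <= block < ec_block + ec_len,
	 * written to avoid overflow. */
	if (block >= c->ec_block && block - c->ec_block < c->ec_len) {
		*out_block = c->ec_block;
		*out_len = c->ec_len;
		*out_start = c->ec_start;
		ret = c->ec_type;
	}
out:
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

Without the lock, a reader racing with cache_put() could combine fields from two different updates and compute a physical block from a mismatched ec_block/ec_start pair, which is the wrong-location failure the commit message describes.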


Index: kernel.spec
===================================================================
RCS file: /cvs/pkgs/rpms/kernel/F-11/kernel.spec,v
retrieving revision 1.1605
retrieving revision 1.1606
diff -u -p -r1.1605 -r1.1606
--- kernel.spec	15 May 2009 20:10:54 -0000	1.1605
+++ kernel.spec	15 May 2009 21:47:55 -0000	1.1606
@@ -717,9 +717,13 @@ Patch2902: linux-2.6-v4l-dvb-fix-uint16_
 Patch2903: linux-2.6-revert-dvb-net-kabi-change.patch
 
 # fs fixes
+# ext4 fixes, all from upstream (.30)
 Patch2920: linux-2.6-ext4-flush-on-close.patch
 Patch2921: linux-2.6-ext4-really-print-warning-once.patch
 Patch2922: linux-2.6-ext4-prealloc-fixes.patch
+Patch2923: linux-2.6-ext4-fake-delalloc-bno.patch
+Patch2924: linux-2.6-ext4-clear-unwritten-flag.patch
+Patch2925: linux-2.6-ext4-fix-i_cached_extent-race.patch
 
 Patch3000: linux-2.6-btrfs-unstable-update.patch
 Patch3010: linux-2.6-relatime-by-default.patch
@@ -1195,6 +1199,9 @@ ApplyPatch linux-2.6-execshield.patch
 ApplyPatch linux-2.6-ext4-flush-on-close.patch
 ApplyPatch linux-2.6-ext4-really-print-warning-once.patch
 ApplyPatch linux-2.6-ext4-prealloc-fixes.patch
+ApplyPatch linux-2.6-ext4-fake-delalloc-bno.patch
+ApplyPatch linux-2.6-ext4-clear-unwritten-flag.patch
+ApplyPatch linux-2.6-ext4-fix-i_cached_extent-race.patch
 
 # xfs
 
@@ -1972,6 +1979,9 @@ fi
 # and build.
 
 %changelog
+* Fri May 15 2009 Eric Sandeen <sandeen at redhat.com>
+- ext4: corruption fixes from upstream.
+
 * Fri May 15 2009 Adam Jackson <ajax at redhat.com>
 - drm: ignore tiny modes from EDID.
 

linux-2.6-ext4-prealloc-fixes.patch:

Index: linux-2.6-ext4-prealloc-fixes.patch
===================================================================
RCS file: /cvs/pkgs/rpms/kernel/F-11/linux-2.6-ext4-prealloc-fixes.patch,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -p -r1.1 -r1.2
--- linux-2.6-ext4-prealloc-fixes.patch	1 May 2009 18:59:33 -0000	1.1
+++ linux-2.6-ext4-prealloc-fixes.patch	15 May 2009 21:47:55 -0000	1.2
@@ -1,34 +1,41 @@
-We need to mark the  buffer_head mapping prealloc space
-as new during write_begin. Otherwise we don't zero out the
-page cache content properly for a partial write. This will
-cause file corruption with preallocation.
-
-Also use block number -1 as the fake block number so that
-unmap_underlying_metadata doesn't drop wrong buffer_head
-
-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
-
-Block number '0' should not be used as the fake block number for
-the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
-block number '0'. So  use -1 instead.
+From: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
+Date: Wed, 13 May 2009 22:36:58 +0000 (-0400)
+Subject: ext4: Fix sub-block zeroing for writes into preallocated extents
+X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=9c1ee184a30394e54165fa4c15923cabd952c106
+
+ext4: Fix sub-block zeroing for writes into preallocated extents
+
+We need to mark the buffer_head mapping preallocated space as new
+during write_begin. Otherwise we don't zero out the page cache content
+properly for a partial write. This will cause file corruption with
+preallocation.
+
+Now that we mark the buffer_head new we also need to have a valid
+buffer_head blocknr so that unmap_underlying_metadata() unmaps the
+correct block.
 
 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
-
+Signed-off-by: "Theodore Ts'o" <tytso at mit.edu>
 ---
- fs/ext4/inode.c |    2 +-
- 1 files changed, 1 insertions(+), 1 deletions(-)
 
+Index: linux-2.6.29.noarch/fs/ext4/extents.c
+===================================================================
+--- linux-2.6.29.noarch.orig/fs/ext4/extents.c
++++ linux-2.6.29.noarch/fs/ext4/extents.c
+@@ -2776,6 +2776,8 @@ int ext4_ext_get_blocks(handle_t *handle
+ 				if (allocated > max_blocks)
+ 					allocated = max_blocks;
+ 				set_buffer_unwritten(bh_result);
++				bh_result->b_bdev = inode->i_sb->s_bdev;
++				bh_result->b_blocknr = newblock;
+ 				goto out2;
+ 			}
+ 
 Index: linux-2.6.29.noarch/fs/ext4/inode.c
 ===================================================================
 --- linux-2.6.29.noarch.orig/fs/ext4/inode.c
 +++ linux-2.6.29.noarch/fs/ext4/inode.c
-@@ -2318,11 +2318,21 @@ static int ext4_da_get_block_prep(struct
- 			/* not enough space to reserve */
- 			return ret;
- 
--		map_bh(bh_result, inode->i_sb, 0);
-+		map_bh(bh_result, inode->i_sb, -1);
- 		set_buffer_new(bh_result);
+@@ -2239,6 +2239,13 @@ static int ext4_da_get_block_prep(struct
  		set_buffer_delay(bh_result);
  	} else if (ret > 0) {
  		bh_result->b_size = (ret << inode->i_blkbits);
@@ -37,11 +44,8 @@ Index: linux-2.6.29.noarch/fs/ext4/inode
 +		 * we also need to mark the buffer as new so that
 +		 * the unwritten parts of the buffer gets correctly zeroed.
 +		 */
-+		if (buffer_unwritten(bh_result)) {
-+			bh_result->b_bdev = inode->i_sb->s_bdev;
++		if (buffer_unwritten(bh_result))
 +			set_buffer_new(bh_result);
-+			bh_result->b_blocknr = -1;
-+		}
  		ret = 0;
  	}
  
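
The corruption addressed by the reworked prealloc patch comes from a partial write into a block of a preallocated extent whose remaining bytes are never zeroed. The sketch below models the zeroing step for a single block. prepare_partial_write() and MODEL_BLOCK_SIZE are hypothetical, and the model assumes, as the commit message implies, that the write_begin path zeroes the uncovered bytes of a block only when its buffer is flagged new; in the kernel the new flag also causes unmap_underlying_metadata() to be called on b_blocknr, which is why the patch now sets a valid block number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 16	/* tiny block size, for illustration only */

/*
 * Hypothetical model of preparing one block for a write covering
 * bytes [from, to). If the buffer is new, bytes outside that range are
 * zeroed; otherwise this model leaves them untouched, which for an
 * unwritten extent means stale page-cache contents survive.
 */
static void prepare_partial_write(unsigned char block[MODEL_BLOCK_SIZE],
				  size_t from, size_t to, bool buffer_new)
{
	assert(from <= to && to <= MODEL_BLOCK_SIZE);

	if (!buffer_new)
		return;

	memset(block, 0, from);				/* head */
	memset(block + to, 0, MODEL_BLOCK_SIZE - to);	/* tail */
}
```

With buffer_new set, a 4-byte write at offset 4 leaves bytes 0-3 and 8-15 zeroed; with it clear, those bytes keep whatever the page held before, which is the corruption the patch description refers to.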




More information about the fedora-extras-commits mailing list