
Re: ext3 performance issue with a Berkeley db application


On Tue, 2003-02-04 at 01:48, Andrew Morton wrote:

> Now, generally the kernel will attempt to prevent serialising userspace
> behind background writeout.  But there's one spot in do_get_write_access():
>                 if (jh->b_jlist == BJ_Shadow) {
> where a random mark_inode_dirty() call will serialise behind the ongoing
> transaction commit.  

That is a deliberate choice, but it's something I've wondered about. 
Basically, the problem is this --- the journal *must* be a consistent
snapshot of the filesystem, but at the same time, we want to avoid
having to do an actual copy of all dirty data for the journal. 

So, we don't do the copy if we can avoid it.  If, during the commit,
another transaction tries to modify the data, we just let it do so, and
we make a copy on the spot.  *But*, if we have already scheduled the old
data for IO at that point, then we can't do this, and we block.

The only way to avoid this is to do the copy in the first place, during
commit, before we know whether or not anybody will need it; and that
will be expensive in CPU time if it turns out that nobody needs the
copy.

mark_inode_dirty() is a special case, though, and the current ext3 dev
snapshots avoid that blocking on buffer-cache operations for the most
part; but we still need to reserve the journal space for the operation,
and that still blocks in the case above.  I wonder if it might be worth
special-casing inodes and superblocks, and always doing the commit copy
for those.

