
Re: freeing, allocating and free blocks in Ext3

Andrew Morton writes:
> Chris Mason wrote:
> > [ freed blocks not immediately available for data blocks on ext3 ]
> > 
> > I think it is better to have the allocation routines do:
> > 
> > if (allocate data block == disk full) {
> >     transaction end
> >     transaction start
> >     if (allocate data block == disk full) {
> >         return disk full
> >     }
> > }
> > 
> > Not pretty, but this way you only return disk full when it really is full.

This is probably a good idea, since it spares the user unnecessary grief.
It may not be easy or safe to do in ext3, though.  While it is OK to restart
a handle in some cases (and flush the existing transaction to disk), I don't
think you can safely do that in all cases: it could potentially cause disk
corruption in case of a crash.
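
To make the retry idea concrete, here is a minimal sketch in plain C.  It is
a toy model, not ext3 or JBD code: every name in it (toy_fs, toy_commit,
toy_alloc_block, the restart_is_safe flag) is made up for illustration.  It
assumes that a commit makes all pending freed blocks allocatable, and it only
retries when the caller says restarting the handle is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in ext3 or JBD. */
struct toy_fs {
    unsigned long free_blocks;    /* allocatable right now */
    unsigned long pending_blocks; /* freed, but held until the commit */
};

/* Stand-in for "transaction end; transaction start": once the running
 * transaction commits, the blocks it freed become allocatable. */
static void toy_commit(struct toy_fs *fs)
{
    assert(fs != NULL);
    fs->free_blocks += fs->pending_blocks;
    fs->pending_blocks = 0;
}

/* Take one block if any is allocatable. */
static bool toy_try_alloc(struct toy_fs *fs)
{
    assert(fs != NULL);
    if (fs->free_blocks == 0)
        return false;
    fs->free_blocks--;
    return true;
}

/* Returns 0 on success, -1 (think ENOSPC) on failure.  The retry is
 * attempted only when the caller asserts that restarting is safe. */
static int toy_alloc_block(struct toy_fs *fs, bool restart_is_safe)
{
    if (toy_try_alloc(fs))
        return 0;
    if (!restart_is_safe)
        return -1;              /* cannot commit here: report full */
    toy_commit(fs);             /* release the pending freed blocks */
    return toy_try_alloc(fs) ? 0 : -1;
}
```

With no allocatable blocks and two pending ones, a call with restart_is_safe
false reports full, while a call with it true succeeds after the commit.
Deciding when the flag may be true is the hard part, and the sketch does not
attempt it.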

> I believe Peter's main concern is that statfs() isn't
> telling the truth.  It's telling us that N blocks are
> available, when they really aren't available yet because
> they're being held up in non-allocatable state until the
> transaction which released them has committed.
> statfs() is always going to be approximate (ie: racy) in
> the presence of other tasks which are using the fs.  But
> I suggest that if it's going to be in error, it should
> under-report free space, not over-report.
> I suspect the fix is fairly simple:
> - when we set a bit in ->b_committed_data, increment
>   transaction->t_pending_free_blocks and increment
>   journal->j_pending_free_blocks.
> - when a transaction's commit record hits disk, subtract
>   its t_pending_free_blocks from journal->j_pending_free_blocks
> - In ext3_count_free_blocks(), grab the BKL and return
>   s_free_blocks_count - j_pending_free_blocks.
> A question is whether the free block counts in the committed
> ext3 superblock and group descriptors are always correct,
> particularly across a large truncate which needed journal
> extension.  It should be OK as long as the superblock and
> changed group descriptor blocks are always included in each
> transaction.
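
The accounting in the three steps above can be sketched in plain C as
follows.  This is a toy model, not kernel code: the struct and function names
with a toy_ prefix are hypothetical, and only the field names
t_pending_free_blocks, j_pending_free_blocks and s_free_blocks_count come
from the suggestion itself.  It assumes s_free_blocks_count is bumped at free
time, and it leaves out all locking (the BKL in the suggestion).

```c
#include <assert.h>

/* Toy stand-ins for the journal, a transaction and the superblock. */
struct toy_journal {
    unsigned long j_pending_free_blocks; /* freed, not yet committed */
};

struct toy_transaction {
    struct toy_journal *t_journal;
    unsigned long t_pending_free_blocks; /* freed by this transaction */
};

struct toy_super {
    unsigned long s_free_blocks_count;   /* includes pending frees */
    struct toy_journal *journal;
};

/* Step 1: a block is freed inside transaction t.  It counts as free in
 * the superblock but is recorded as pending in both counters. */
static void toy_free_block(struct toy_super *sb, struct toy_transaction *t)
{
    sb->s_free_blocks_count++;
    t->t_pending_free_blocks++;
    t->t_journal->j_pending_free_blocks++;
}

/* Step 2: the commit record of t has hit disk, so the blocks it freed
 * are no longer pending. */
static void toy_commit_done(struct toy_transaction *t)
{
    assert(t->t_journal->j_pending_free_blocks >= t->t_pending_free_blocks);
    t->t_journal->j_pending_free_blocks -= t->t_pending_free_blocks;
    t->t_pending_free_blocks = 0;
}

/* Step 3: what statfs() would report, which under-reports rather than
 * over-reports while frees are still pending. */
static unsigned long toy_count_free_blocks(const struct toy_super *sb)
{
    return sb->s_free_blocks_count - sb->journal->j_pending_free_blocks;
}
```

Freeing three blocks on a filesystem that had ten free leaves the reported
count at ten until the transaction commits, after which it becomes thirteen.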

These counts _have_ to be correct (i.e. included in each transaction
that changes them), otherwise you could have corruption after a crash.
If they were not handled correctly, this would eventually lead to
overflows on a long-running filesystem, and I doubt that happens.

Cheers, Andreas
Andreas Dilger                               Turbolinux filesystem development
