
Re: [Cluster-devel] [GFS2 PATCH] GFS2: Delete directory block reservation on failure



Hi,

On Tue, 2013-07-30 at 11:42 -0400, Bob Peterson wrote:
> Hi,
> 
> ----- Original Message -----
> | On Tue, 2013-07-30 at 10:14 -0400, Bob Peterson wrote:
> | > Hi,
> | > 
> | > This patch adds one line of code that deletes a block reservation
> | > structure for the source directory in the event that the inode creation
> | > operation fails. If the inode creation succeeds, the reservation will
> | > be deleted anyway, since directory reservations are now only 1 block.
> | > 
> | Why would we want to do that? If the creation has failed then that gives
> | us no information about whether further allocations are likely to be
> | made for that directory,
> 
> It's hard to explain briefly, but it has to do with keeping the bitmaps as
> defragmented as possible in memory so that we don't slow down file block
> allocations with lots of unnecessary reservation structures to search.
> Directory reservations are only a single block anyway, and when a new inode
> is created successfully, the block reservation is deleted immediately
> afterwards. We do this to keep the bitmaps as tightly packed as possible so
> that file allocations are given priority. Otherwise we spend a huge amount
> of time rejecting otherwise-free blocks because of stale reservations left
> around for directories, since directories are cached rather than closed the
> way files are.
> 
> For details, see:
> http://git.kernel.org/cgit/linux/kernel/git/steve/gfs2-3.0-nmw.git/commit/fs/gfs2?id=af21ca8ed50f01c5278c5ded6dad6f05e8a5d2e4
> 
> However, in the unsuccessful case, today's code leaves the single-block
> reservation structure out there in memory for the directory, also
> fragmenting the bitmap and creating more clutter for the block allocator to
> go through when finding free blocks, just like we had before the
> aforementioned patch.
> 
> It seems pointless to leave the reservation around speculatively in the
> hope of future dinode allocations for that directory, and even more so in
> the failure case, since a second attempt seems likely to fail for the same
> reason this one did.
> 
> Regards,
> 
> Bob Peterson
> Red Hat File Systems

Well I think we need to take a closer look at what is going on. There
are several issues here... one is whether our predictor for how many
blocks will be used is doing a good job. The answer seems to be no,
since otherwise we wouldn't have needed to cut the reservation size to a
single block as a temporary measure.

If there is really no need to use reservations with directories, then
the best solution would be just to not use them in that case at all, and
return to something closer to the old code. It makes no sense to spend a
lot of effort to reserve single blocks, as that defeats the objective of
trying to keep things in extents.

The other issue is whether we can do better with directory allocations
in the first place. I'd very much like to see a scheme for keeping the
blocks which make up the hash table contiguous on disk, with a flag in
the inode that is set when this is the case. That would allow us to read
the entire hash table with a single I/O, whatever its size. This may be
a much better approach for dealing with directory allocations.

We should also look at adding a timeout to directory reservations, so
that we keep them for a short time but drop them if they go unused. I
think we need to find better predictors of when a lot of files are
likely to be created in a particular directory,

Steve.


