The maximum number of files under a folder

Stephen Samuel darkonc at gmail.com
Wed Mar 19 06:35:03 UTC 2008


The OS will have to search the directory to see if the file already exists
before creating it.

Well, if you hash it so that the path splits up something like:
jobid(upper part)/jobid(lower part)[/-]timestamp-process,
you'll find that your access times will be much faster (especially if you
don't use H-Trees).  This also applies if you're just creating a file,
because the OS has to search the entire directory to see if that filename
already exists.
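
As a rough sketch of that layout in Python: the function name, the
two-hex-digit directory names, and the choice of "/" rather than "-" before
the timestamp are all assumptions made for illustration, and it assumes the
job id is an integer.

```python
import os

def hashed_path(root, job_id, timestamp, process, bits=8):
    """Return root/<upper>/<lower>/<timestamp>-<process> for a numeric job id.

    Hypothetical helper: the directory naming is an assumption, not
    something specified in the thread.
    """
    mask = (1 << bits) - 1
    lower = job_id & mask            # low 8 bits pick the second-level directory
    upper = (job_id >> bits) & mask  # next 8 bits pick the first-level directory
    leaf = f"{timestamp}-{process}"
    return os.path.join(root, f"{upper:02x}", f"{lower:02x}", leaf)

if __name__ == "__main__":
    # Example values are made up.
    print(hashed_path("output", 0x1234, 1205908503, 7))
```

Each lookup then only touches directories with at most 256 subdirectory
entries, instead of one directory holding every file.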

With regular directories, the time to search them to see if a file already
exists increases linearly with the number of entries.  If you hash on 3
levels with 8 bits per level, you'll have to open 2 or 3 extra inodes, but
you'll cut your directory search times down by a factor of roughly 20000 to 1
(=2^24/256/3).  You'll also skip having to deal with any sort of
directory-size limit.
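
To spell out that arithmetic, here is a small Python sketch. It assumes the
cost model implied by the formula: a flat directory means scanning up to
2^24 entries, while the hashed layout means scanning 3 directories of 256
entries each. The function name is made up for illustration.

```python
def scan_costs(bits_per_level=8, levels=3):
    """Worst-case entries scanned for a flat directory vs. a hashed tree."""
    entries_per_dir = 2 ** bits_per_level    # 256 entries in each directory
    total_names = entries_per_dir ** levels  # 2^24 names in total
    flat_scan = total_names                  # one linear scan over everything
    hashed_scan = levels * entries_per_dir   # 3 short scans of 256 entries
    return flat_scan, hashed_scan

if __name__ == "__main__":
    flat, hashed = scan_costs()
    print(f"flat: {flat}, hashed: {hashed}, ratio: {flat / hashed:.0f}")
```

The ratio comes out at a little under 22000, which is where the "20000 to 1"
ballpark comes from.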

I did something similar on a Solaris box which had 200000 emails in the
/var/spool/mqueue directory. That many messages were slowing the system to a
crawl.  I hashed it into 100 directories with 2000 entries each, and it sped
things up *enormously.*
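
A minimal Python sketch of that kind of split follows. The message does not
say how the files were actually assigned to directories, so the CRC32-based
bucket choice, the two-digit subdirectory names, and both function names
are assumptions. It is only meant to show the idea, not to be run against a
live mail queue.

```python
import os
import zlib

def bucket_for(name, buckets=100):
    """Map a filename to a bucket number in [0, buckets). CRC32 is an assumed choice."""
    return zlib.crc32(name.encode("utf-8")) % buckets

def spread_into_buckets(src_dir, buckets=100):
    """Move every regular file in src_dir into a numbered subdirectory of src_dir."""
    # Collect the names first so we are not renaming while iterating.
    with os.scandir(src_dir) as it:
        names = [e.name for e in it if e.is_file(follow_symlinks=False)]
    for name in names:
        sub = os.path.join(src_dir, f"{bucket_for(name, buckets):02d}")
        os.makedirs(sub, exist_ok=True)
        os.rename(os.path.join(src_dir, name), os.path.join(sub, name))
    return len(names)
```

With 200000 files and 100 buckets, each subdirectory ends up with roughly
2000 entries, so any single lookup scans about 1% of what it did before.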

On Tue, Mar 18, 2008 at 3:56 PM, Andreas Dilger <adilger at sun.com> wrote:

> On Mar 17, 2008  09:32 -0400, Theodore Ts'o wrote:
> > On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote:
> > > Theodore Tso,
> > >
> > >     In 64bit system, directory size can not be bigger than 2GB?
> >
> > No, because the high 32-bits for i_size are overloaded to store the
> > directory creation acl.
>
> I think we should change the code (kernel and e2fsprogs) to allow
> i_size_high for directories also.
>
> > In practice, you really don't want to have a directory that huge
> > anyway.  Iterating through it all with readdir() gets horribly slow,
> > and applications that try to do anything with really huge directories
> > would be well advised to use a database, because they will get *much*
> > better performance that way....
>
> Actually, for many HPC applications they never do readdir at all.
> The job creates 1 file/process and always uses a predefined filename
> like {job}-{timestamp}-{process} that it will directly look up.
>
> Cheers, Andreas
>



-- 
Stephen Samuel http://www.bcgreen.com
778-861-7641

