Searching directories (to ensure no duplicates, etc.) is going to be
order N^2, so the size of each directory is likely to be the limiting
factor.
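To make the N^2 argument concrete, here is a rough cost model. It assumes that each name lookup scans the directory linearly, which is what the N^2 figure implies. Under that assumption, creating the k-th file costs about k comparisons, and spreading N files evenly over D directories divides the total work by roughly D:

```latex
% Assumption: one lookup in a directory holding k entries costs ~k comparisons.
T_1(N) = \sum_{k=0}^{N-1} k = \frac{N(N-1)}{2} \approx \frac{N^2}{2}

T_D(N) = D \cdot \frac{(N/D)\,(N/D - 1)}{2} \approx \frac{N^2}{2D}

% Example: N = 10^8 files, D = 10^4 directories
% => N/D = 10^4 entries per directory, total work ~ T_1(N) / D
```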
Try increasing to 10000 directories (in two layers of 100 each). I'll
bet the result will be a substantial increase in speed (getting back
to the speeds that you had with 1M files).
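One way to get the two layers of 100 directories is to hash each file name and use two base-100 digits of the hash as the directory path. The Python sketch below is an illustration of that idea, not anything from the original thread: the function name `two_level_path`, the `fanout` parameter and the choice of SHA-256 are all my own assumptions.

```python
import hashlib
import os


def two_level_path(root, name, fanout=100):
    """Hypothetical helper: map a file name to root/AA/BB/name.

    AA and BB each take `fanout` values, so the default gives
    100 x 100 = 10,000 leaf directories. The hash spreads names
    roughly evenly, keeping every leaf directory small.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")
    top = h % fanout                 # first-level directory index
    sub = (h // fanout) % fanout     # second-level directory index
    width = len(str(fanout - 1))     # zero-pad, e.g. "07" for fanout=100
    return os.path.join(root, f"{top:0{width}d}", f"{sub:0{width}d}", name)
```

Usage: compute `p = two_level_path("/data", "file123")`, call `os.makedirs(os.path.dirname(p), exist_ok=True)`, and then create the file at `p`. With 100M files spread this way, each leaf directory holds on the order of 10,000 entries.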
On 8/1/07, Sean McCauliff <smccauliff mail arc nasa gov> wrote:
Hi all,
I plan on having about 100M files totaling about 8.5 TiB. To see
how ext3 would perform with large numbers of files I've written a test
program which creates a configurable number of files into a configurable
number of directories, reads from those files, lists them and then
deletes them. Even up to 1M files ext3 seems to perform well and scale
linearly; the time to execute the program on 1M files is about double
the time it takes on 0.5M files. But past 1M files it
seems to have n^2 scalability. Test details appear below.
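The test program itself isn't included here, so the following is only a hypothetical Python sketch of the same create/read/list/delete cycle, useful for reproducing the scaling curve. The names `run_benchmark`, `base_dir` and `payload`, and the round-robin placement of files, are my own assumptions. Pass `base_dir` pointing at an ext3 mount; otherwise the system temp directory (possibly tmpfs) is measured instead.

```python
import os
import tempfile
import time


def run_benchmark(num_files, num_dirs, base_dir=None, payload=b"x" * 64):
    """Hypothetical benchmark: create, read, list and delete num_files
    files spread round-robin over num_dirs directories.

    Returns wall-clock seconds for each phase.
    """
    timings = {}
    with tempfile.TemporaryDirectory(dir=base_dir) as root:
        dirs = [os.path.join(root, f"d{i}") for i in range(num_dirs)]
        for d in dirs:
            os.mkdir(d)
        paths = [os.path.join(dirs[i % num_dirs], f"f{i}") for i in range(num_files)]

        start = time.perf_counter()
        for p in paths:  # create phase
            with open(p, "wb") as f:
                f.write(payload)
        timings["create"] = time.perf_counter() - start

        start = time.perf_counter()
        for p in paths:  # read phase
            with open(p, "rb") as f:
                f.read()
        timings["read"] = time.perf_counter() - start

        start = time.perf_counter()
        listed = sum(len(os.listdir(d)) for d in dirs)  # list phase
        timings["list"] = time.perf_counter() - start
        if listed != num_files:
            raise RuntimeError(f"listed {listed} files, expected {num_files}")

        start = time.perf_counter()
        for p in paths:  # delete phase
            os.remove(p)
        timings["delete"] = time.perf_counter() - start
    return timings
```

Comparing, say, `run_benchmark(500_000, 1, "/mnt/ext3")` with `run_benchmark(1_000_000, 1, "/mnt/ext3")` (the mount point is a placeholder) shows whether a phase's time roughly doubles (linear) or roughly quadruples (N^2).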
Looking at the various options for ext3, nothing jumps out as the
obvious one to use to improve performance.
Any recommendations?