
RE: Poor Performance When Number of Files > 1M

I am seeing similar problems to Sean McCauliff (2007-08-02) using ext3. I have a simple test that times file creations in a hashed directory structure. File creation time inexorably increases as the number of files in the filesystem increases. Altering variables can change the absolute performance, but I always see the steady performance degradation.
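For anyone who wants to reproduce this, here is a minimal sketch of that kind of test. It is not my actual test program: the names `hashed_path` and `create_files`, the MD5-based directory hash, and the default depth, fanout, file size, and batch size are all illustrative assumptions.

```python
import hashlib
import os
import tempfile
import time


def hashed_path(root, name, depth=3, fanout=16):
    """Map a file name to root/xx/yy/.../name using bytes of its MD5 digest.

    The hash function, depth, and fanout are assumptions for illustration.
    """
    digest = hashlib.md5(name.encode()).digest()
    parts = [format(digest[i] % fanout, "02x") for i in range(depth)]
    return os.path.join(root, *parts, name)


def create_files(root, count, size=4096, batch=5000):
    """Create `count` files under `root`.

    Returns a list of (files_created_so_far, files_per_sec) tuples,
    one entry per completed batch.
    """
    payload = b"\0" * size
    rates = []
    t0 = time.perf_counter()
    for i in range(count):
        path = hashed_path(root, f"f{i:010d}")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(payload)
        if (i + 1) % batch == 0:
            now = time.perf_counter()
            rates.append((i + 1, batch / (now - t0)))
            t0 = now
    return rates


if __name__ == "__main__":
    # A temporary directory keeps the demo self-cleaning; for a real run,
    # point `root` at a directory on the filesystem under test.
    with tempfile.TemporaryDirectory() as root:
        for total, rate in create_files(root, 4000, batch=1000):
            print(f"{total:>10d} files  {rate:8.1f} files/sec")
```

To observe the degradation described here, the file count would need to be in the millions and the root directory would need to live on the ext3 filesystem being measured; the small demo run above only shows the reporting format.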

All of the following have no material effect on the steady drop in performance:

File length (1k, 4k, 16k)
Directory depth (5, 10, 15)
Average & Max files per directory (10, 20, 100)
Single or multi-threaded test
Moving test directory to a new name on same filesystem, restarting test.
Directory hash
RAID10 vs. simple disk
Linux distribution (RHEL, Ubuntu)
System memory (32 GB, 2 GB)
Syncing after each close
Free space
Partition Age (old, perhaps fragmented, a bit dirty, new fs)

Performance seems to always map directly to the number of files in the ext3 filesystem.

After an initial fast period, which perhaps ends once dirty pages begin to be written out aggressively, the file-creation rate tends to drop by about one file/sec for every 5,000 files added. So, depending on the variables, say with 6 RAID10 spindles, I might start at ~700 files/sec, drop quickly at first, then drop more slowly to ~300 files/sec at perhaps 1 million files, then see 299 files/sec for the next 5,000 creations, 298 files/sec for the 5,000 after that, and so on.
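To put rough numbers on that trend, the sketch below extrapolates the slow-decline phase as a straight line: ~300 files/sec at 1 million files, losing one file/sec per 5,000 files added. The linear model, the function names, and the default parameters are assumptions taken from the example figures above; I have not measured whether the decline actually stays linear that far out.

```python
import math


def projected_rate(total_files, base_files=1_000_000, base_rate=300.0,
                   files_per_unit_drop=5_000):
    """Projected files/sec at `total_files` under the assumed linear decline."""
    drop = (total_files - base_files) / files_per_unit_drop
    return max(base_rate - drop, 0.0)


def files_at_stall(base_files=1_000_000, base_rate=300.0,
                   files_per_unit_drop=5_000):
    """File count at which the assumed linear model reaches 0 files/sec."""
    return base_files + base_rate * files_per_unit_drop


def seconds_to_reach(total_files, base_files=1_000_000, base_rate=300.0,
                     files_per_unit_drop=5_000):
    """Time to grow from `base_files` to `total_files` under the linear model.

    Integrating dt = dn / rate(n) with rate(n) = r0 - (n - n0) / k gives
    t = k * ln(r0 / rate(n)).
    """
    rate = projected_rate(total_files, base_files, base_rate, files_per_unit_drop)
    if rate <= 0.0:
        return math.inf
    return files_per_unit_drop * math.log(base_rate / rate)


if __name__ == "__main__":
    print(projected_rate(1_005_000))    # 299.0
    print(projected_rate(1_010_000))    # 298.0
    print(files_at_stall())             # 2500000.0
    print(seconds_to_reach(2_000_000))  # 5000 * ln(3), about 5,500 seconds
```

Taken at face value, the straight-line extrapolation would reach zero throughput at about 2.5 million files, which is why the degradation looks unbounded from where I sit.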

As you'd expect, there isn't much CPU utilization, other than iowait, and some kjournald activity.

Is this a known limitation of ext3? Is it too much to expect a filesystem to write to O(10^6)-O(10^7) files in something approaching constant time per file? What, exactly, am I stressing to cause this unbounded performance degradation?

-John Kalucki
ext3 kalucki com


   Hi all,

   I plan on having about 100M files totaling about 8.5 TiB.  To see
   how ext3 would perform with large numbers of files I've written a test
   program which creates a configurable number of files into a configurable
   number of directories, reads from those files, lists them and then
   deletes them.  Even up to 1M files ext3 seems to perform well and scale
   linearly; the time to execute the program on 1M files is about double
   the time it takes it to execute on 0.5M files.  But past 1M files it
   seems to have n^2 scalability.  Test details appear below.

   Looking at the various options for ext3 nothing jumps out as the obvious
   one to use to improve performance.

   Any recommendations?

