[Linux-cluster] Ext3/ext4 in a clustered environment

Steven Whitehouse swhiteho at redhat.com
Wed Nov 9 15:15:00 UTC 2011


Hi,

On Wed, 2011-11-09 at 14:57 +0000, Alan Brown wrote:
> Nicolas Ross wrote:
> 
> > To be clear, there are millions of files, but no more than a few
> > hundred per directory. They are spread out, split on the database id,
> > 2 characters at a time. So a file named 1234567.jpg would end up in a
> > directory 12/34/5/, or something similar.
> 
> OK, the way you wrote it, it looked like a flat directory layout.
> 
> We see appreciable knee points in GFS directory performance at 512, 4096 
> and 16384 files/directory, with progressively worse performance 
> deterioration between each knee pair. (It's a 2^n type problem)
> 
That is a bit strange. GFS2 directory entries are sized according to
(length of file name + length of fixed-size info), which means that,
generally, the number of blocks required to store a given number of
files is not constant unless the file names are all the same length.
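To make the dependence on name length concrete, here is a rough sketch of the sizing arithmetic. The constants are assumptions rather than figures from this thread: a 40-byte fixed entry header, entries padded to 8 bytes, a 4096-byte block and a 104-byte leaf header. Check them against gfs2_ondisk.h before relying on the numbers.

```python
# Rough sketch of GFS2-style directory entry sizing.
# ASSUMED values (not from the post): 40-byte fixed entry header,
# 8-byte record alignment, 4096-byte blocks, 104-byte leaf header.
DIRENT_HEADER = 40   # assumed fixed-size part of each entry, bytes
LEAF_HEADER = 104    # assumed leaf block header, bytes
BLOCK_SIZE = 4096    # assumed filesystem block size, bytes


def dirent_size(name_len: int) -> int:
    """Bytes used by one entry: fixed header plus name, rounded up to 8."""
    return (DIRENT_HEADER + name_len + 7) & ~7


def entries_per_leaf(name_len: int, block_size: int = BLOCK_SIZE) -> int:
    """How many entries with names of this length fit in one leaf block."""
    return (block_size - LEAF_HEADER) // dirent_size(name_len)


if __name__ == "__main__":
    for n in (8, 11, 32, 64):
        print(n, dirent_size(n), entries_per_leaf(n))
```

Under these assumptions an 8-character name gives 83 entries per leaf and an 11-character name such as 1234567.jpg gives 71, in line with the "around 80" figure.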

Also, once a directory has been unstuffed, the hash table will grow
until it is 128k in size, which is 16k (8-byte) pointers. So with 16384
directory entries, you should be a long way from having a full hash
table, since each leaf block should contain around 80 entries (again
depending on filename length), so a full table covers not too far off
1m entries.
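The back-of-envelope numbers can be written out as follows. The 128k table size and ~80 entries per leaf come from the text; treating each pointer as referring to its own leaf is a best-case assumption, and the function names are illustrative only.

```python
# Back-of-envelope capacity of a fully grown directory hash table.
MAX_TABLE_BYTES = 128 * 1024  # maximum hash table size, from the text
POINTER_BYTES = 8             # 128k / 16k pointers = 8 bytes each
ENTRIES_PER_LEAF = 80         # approximate, depends on filename length


def max_pointers(table_bytes: int = MAX_TABLE_BYTES) -> int:
    """Number of leaf pointers a full hash table holds."""
    return table_bytes // POINTER_BYTES


def approx_full_table_capacity(per_leaf: int = ENTRIES_PER_LEAF) -> int:
    """Best case: every pointer refers to a distinct leaf block."""
    return max_pointers() * per_leaf
```

That works out to 16384 pointers and about 1.3 million entries, far above the 16384-entry knee reported earlier.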

So for all unstuffed directories with fewer than about 1m entries, I'd
expect to see all accesses resulting in the following I/O pattern:
 1. Look up hash table block
 2. Look up dir leaf block
 3. Look up inode (if this is a ->lookup rather than readdir)
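The three-step pattern can be illustrated with a toy extendible-hash directory. Everything here is hypothetical: the class ToyHashedDir, zlib.crc32 standing in for the real GFS2 name hash, and in-memory dicts standing in for blocks. The sketch assumes that the top `depth` bits of the hash select the pointer, as extendible hashing usually does.

```python
import zlib
from typing import List

HASH_BITS = 32


def name_hash(name: str) -> int:
    # Stand-in 32-bit hash; NOT the exact GFS2 on-disk hash function.
    return zlib.crc32(name.encode()) & 0xFFFFFFFF


class ToyHashedDir:
    """Toy directory: a pointer table indexed by the top `depth` bits of
    the name hash, each pointer naming a leaf (a dict of name -> inode)."""

    def __init__(self, depth: int, leaf_of_pointer: List[int]):
        assert len(leaf_of_pointer) == 1 << depth
        self.depth = depth
        self.hash_table = leaf_of_pointer  # pointer index -> leaf number
        self.leaves = {n: {} for n in set(leaf_of_pointer)}
        self.inodes = {}                   # inode number -> metadata

    def _index(self, name: str) -> int:
        return name_hash(name) >> (HASH_BITS - self.depth)

    def add(self, name: str, ino: int, meta: dict) -> None:
        leaf = self.leaves[self.hash_table[self._index(name)]]
        leaf[name] = ino
        self.inodes[ino] = meta

    def lookup(self, name: str):
        """Return (metadata or None, trace of simulated block reads)."""
        trace = []
        index = self._index(name)
        trace.append(("hash_table", index))  # 1. hash table block
        leaf_no = self.hash_table[index]
        trace.append(("leaf", leaf_no))      # 2. dir leaf block
        ino = self.leaves[leaf_no].get(name)
        if ino is None:
            return None, trace
        trace.append(("inode", ino))         # 3. inode (->lookup only)
        return self.inodes[ino], trace
```

With `ToyHashedDir(depth=2, leaf_of_pointer=[0, 0, 1, 1])` two pointers share each leaf, yet a successful lookup still costs exactly three reads and a miss costs two, whatever the number of entries.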

What test are you using to generate the performance figures in this
case?
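For comparison, a minimal sketch of one possible test: fill a fresh directory with N files, then time one os.stat() per name. The function time_lookups and its parameters are hypothetical and are not the test that produced the figures above.

```python
import os
import shutil
import tempfile
import time
from typing import Optional


def time_lookups(n_files: int, directory: Optional[str] = None,
                 rounds: int = 3) -> float:
    """Create n_files empty files in a fresh subdirectory of `directory`,
    then return the best-of-`rounds` mean seconds per os.stat() call."""
    if n_files <= 0:
        raise ValueError("n_files must be positive")
    base = tempfile.mkdtemp(dir=directory)  # fresh, empty test directory
    try:
        # Names mimic the 1234567.jpg style used earlier in the thread.
        names = [f"{i:07d}.jpg" for i in range(n_files)]
        for name in names:
            with open(os.path.join(base, name), "w"):
                pass  # create an empty file
        best = float("inf")
        for _ in range(rounds):
            start = time.perf_counter()
            for name in names:
                os.stat(os.path.join(base, name))  # path lookup + stat
            best = min(best, time.perf_counter() - start)
        return best / n_files
    finally:
        shutil.rmtree(base)  # clean up the test directory
```

To probe the reported knees, point `directory` at a GFS2 mount and try sizes such as 512, 4096 and 16384. On a single node most of these stats will likely be served from the dentry and inode caches, so a fairer test would need cold caches, for example by running the stat pass from a different node.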

Steve.
