Fwd: problems with large directories?

Charles Riley criley at erad.com
Tue Mar 9 14:36:42 UTC 2010


Sorry, I meant to send this to the list, not just Ric.


----- Forwarded Message -----
From: "Charles Riley" <criley at erad.com>
To: "Ric Wheeler" <rwheeler at redhat.com>
Sent: Tuesday, March 9, 2010 9:34:25 AM GMT -05:00 US/Canada Eastern
Subject: Re: problems with large directories?




----- "Ric Wheeler" <rwheeler at redhat.com> wrote:

> On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote:
> > Hi,
> >
> > I have an application that deals with 100,000 to 1,000,000 image files.
> >
> > I initially structured it to use multiple directories, so that file
> > 123456 would be stored in /12/34/123456.  I'm now wondering if that's
> > pointless, as it would simplify things to store the file in /123456.
> >
> > Can anyone indicate whether I'm gaining anything by using smaller
> > directories in ext3/ext4?  Thanks.
> >
> > Mitch
> >
> 
> I think that breaking up your files into subdirectories makes it easier
> to navigate the tree and find files from a human point of view. Even
> better if the path components reflect something like
> year/month/day/hour/min (assuming your pathname has a date-based GUID or
> similar encoding).
> 
> You can have a million files in one large directory, but be careful to
> iterate and copy them in sorted order (sorted by inode number) to avoid
> nasty performance issues that are side effects of the way we hash file
> names in ext3/4.
> 
> Good luck!
> 
> Ric
> 

Hi Ric,

Can you elaborate on the performance issues you mention above?
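
For context, I read Mitch's layout as deriving nested prefix directories from the leading characters of the file name.  A minimal Python sketch of that idea (the helper name and the two-level, two-character split are just my reading of his example; Ric's date-based variant would swap in year/month/day components instead):

import os

def prefix_path(root, name, levels=2, width=2):
    # Split the leading characters of the name into nested directories,
    # e.g. '123456' -> root/12/34/123456.
    parts = [name[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)

print(prefix_path('/data', '123456'))  # -> /data/12/34/123456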

We use RHEL4/ext3 on our PACS (medical imaging) servers.
We ran into ext3's 32,000-subdirectories-per-directory limit a couple of years back, when our first customer hit the 31,999th study; at that point we implemented a directory hashing algorithm.
Now we store images for a given patient's study in a path something like:
aa/ab/ac/1.2.3/

where 1.2.3 is the DICOM Study Instance UID (a worldwide-unique identifier for a medical study)
and aa/ab/ac/ is the directory hash we derived from that Study Instance UID.

The above is a simplified example for illustration purposes only; 1.2.3 does not really hash to aa/ab/ac/.
Within aa/ab/ac/1.2.3/ there can be anywhere from three to a couple of thousand DICOM object files.
Images are initially created in a non-hashed temporary directory and then copied to their permanent home, e.g. aa/ab/ac/1.2.3/.
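
For illustration only, here is a Python sketch of the general shape of such a scheme.  The md5-based bucketing below is purely a made-up stand-in; our production hashing algorithm is different.

import hashlib
import os

def study_path(root, study_uid, levels=3, width=2):
    # Hypothetical bucketing: derive directory names from a digest of
    # the UID so studies spread evenly across subdirectories.
    digest = hashlib.md5(study_uid.encode('ascii')).hexdigest()
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *buckets, study_uid)

# study_path('/images', '1.2.3') -> '/images/xx/yy/zz/1.2.3', where the
# xx/yy/zz buckets come from the digest, not from the UID itself.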

In this context, would we gain filesystem performance by sorting by inode before copying?
Do the performance issues you refer to apply only to the copy process itself, or do they contribute to long-term filesystem performance?
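
Concretely, I imagine the inode-sorted copy looking something like this Python sketch (untested; the scandir/inode() ordering is my guess at what you are describing):

import os
import shutil

def copy_sorted_by_inode(src_dir, dst_dir):
    # Copy regular files in ascending inode order rather than in the
    # hash-ordered sequence that readdir() returns on ext3/4.
    entries = [e for e in os.scandir(src_dir) if e.is_file()]
    entries.sort(key=lambda e: e.inode())
    for entry in entries:
        shutil.copy2(entry.path, os.path.join(dst_dir, entry.name))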

Thanks for any insight you can provide,

Charles



