Filesystem access statistics

Wed Apr 12 15:23:25 UTC 2006

Hi,
I subscribed to the list after checking with Steve that it would not be
an outlandish place to ask my questions.

I need to look at a portion of the filesystem namespace and maintain
aggregate statistics on access patterns. In other words, I have a large
filesystem and would like to find out which are the hot spots. I don't
need to keep track of every single file access: since the file count is
on the order of millions, that would swamp the actual I/O, the
analysis and the people looking at the final data. It would make sense
to just group accesses by looking at the top N levels (anything
accessed at levels N+1, N+2, etc. would be coalesced into the parent
directory at level N).
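
To make the grouping concrete, here is a rough Python sketch of the
coalescing rule. The function name coalesce and the example paths are
made up for illustration; it only shows how a path would be mapped to
the directory it gets counted under, not how the accesses are captured.

```python
from pathlib import PurePosixPath


def coalesce(path, root, depth):
    """Return the directory under which an access to `path` is counted.

    Anything deeper than `depth` levels below `root` collapses into its
    ancestor at level `depth`; shallower paths are returned unchanged.
    Raises ValueError if `path` is not inside `root`.
    """
    rel = PurePosixPath(path).relative_to(root)
    return str(PurePosixPath(root).joinpath(*rel.parts[:depth]))


# With N = 2, an access five levels down is counted at level 2.
print(coalesce("/data/proj/a/b/c/file.txt", "/data", 2))
```

The comparison is done per path component, so a sibling such as
/database would not be mistaken for something under /data.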

I doubt I'm the only one with such a need. In my case, the information
will be used to change how the tree is laid out in the future, and to
determine when parts of it can be made read-only (after a period of
inactivity). I can also see the
information being useful for selective incremental backups - just look
at the hot spots - or for smarter ordering during a disaster recovery
restore (if you're recovering from random access storage, not tape).
Maybe even locate/slocate/rlocate/mlocate could take advantage of it.

What would be the best approach to this? Inotify doesn't seem to cut it,
because it can't handle recursive watches, and I can't afford to place
watches all over the place. Given the sheer number of operations being
tracked, it looks like I'd need some custom code that audits all
file/directory operations, determines if there's a match (I'm only
interested in a specific tree, not everything under /), increments
internal counters and throws the event away. Is there code I could look
at for ideas?
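
Roughly, the user-space half of what I have in mind would look like the
Python sketch below. Everything in it is hypothetical (the class name
HotSpotCounter, the record() and hottest() methods, the example paths),
and it assumes that something else, which is the part I'm missing,
delivers the path of each operation.

```python
from collections import Counter
from pathlib import PurePosixPath


class HotSpotCounter:
    """Aggregate access counts for one tree, grouped at a fixed depth."""

    def __init__(self, root, depth):
        self.root = PurePosixPath(root)
        self.depth = depth
        self.counts = Counter()

    def record(self, path):
        """Count one access. Return False if `path` is outside the tree."""
        try:
            rel = PurePosixPath(path).relative_to(self.root)
        except ValueError:
            return False  # not in the tree of interest: ignore it
        key = str(self.root.joinpath(*rel.parts[:self.depth]))
        self.counts[key] += 1  # only the counter survives, not the event
        return True

    def hottest(self, n):
        """Return the `n` most accessed directories with their counts."""
        return self.counts.most_common(n)


counter = HotSpotCounter("/srv/tree", depth=1)
for p in ["/srv/tree/a/x", "/srv/tree/a/y/z", "/srv/tree/b/q", "/etc/passwd"]:
    counter.record(p)
print(counter.hottest(1))
```

Memory use is bounded by the number of directories in the top N levels
rather than by the number of files, which is the point of the exercise.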

Thanks in advance for any help.

-- 
Rudi