On Wed, 9 Apr 2008, Wendy Cheng wrote:
I have been responding to this thread off the top of my head, based on folks'
descriptions. Please be aware that these are just rough thoughts and may not
fit the general case. The above applies mostly to the original problem
description, where:
1. The system is dedicated to build/compile work - my take is that there are
many temporary and deleted files.
2. The gfs_inode tunable was changed (to 30 instead of the default 15).
I'll take it into account when experimenting with the different settings.
Isn't GFS_GL_HASH_SIZE too small for a large number of glocks? If it is too
small, the result is not only long linked lists: collisions in the same
bucket will also block operations that could otherwise run in parallel.
Wouldn't it help to increase it from 8k to 65k?
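
To put rough numbers on the question above, here is a minimal sketch of the
arithmetic. It assumes the hash spreads glocks evenly across buckets, and the
glock count of 500,000 is a made-up figure for illustration, not a measurement
from this cluster. The function name is hypothetical as well.

```python
# Rough arithmetic only: mean hash chain length for two table sizes.
# Assumes a perfectly even hash distribution, which real workloads
# will not match exactly.

CURRENT_HASH_SIZE = 8192     # the "8k" mentioned in the message
PROPOSED_HASH_SIZE = 65536   # the "65k" mentioned in the message


def average_chain_length(glock_count, bucket_count):
    """Return the mean number of glocks per hash bucket."""
    if bucket_count <= 0:
        raise ValueError("bucket_count must be positive")
    return glock_count / bucket_count


if __name__ == "__main__":
    hypothetical_glocks = 500_000  # assumption, not taken from the thread
    for size in (CURRENT_HASH_SIZE, PROPOSED_HASH_SIZE):
        mean = average_chain_length(hypothetical_glocks, size)
        print(f"{size:6d} buckets: about {mean:.1f} glocks per bucket")
```

Under these assumptions the larger table shortens the average chain by a
factor of eight. Whether that translates into lower latency depends on how
much time is actually spent walking the chains.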
Worth a try.
Now I remember .... we did experiment with different hash sizes when this
latency issue was first reported two years ago. It didn't make much
difference. The cache flushing, on the other hand, was more significant.
What led me to suspect clashing in the hash (or some other lock-creation
issue) was a simple test I ran on our five-node cluster: on one node I
ran
find /gfs -type f -exec cat {} > /dev/null \;
and on another I simply started an editor on a non-existent file.
It took several seconds for the editor to "open" the file. What other
than creating the lock could delay the process for so long?
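
The editor test can be made repeatable with a small timing script. This is
only a sketch: it assumes that what the editor does boils down to a lookup of
the missing name followed by a file creation, which may not match what a given
editor really does. The function names and the probe file name are
hypothetical; pass a directory on the GFS mount as the first argument.

```python
import os
import sys
import tempfile
import time


def time_missing_lookup(path):
    """Time a stat() on a path that is expected not to exist."""
    start = time.perf_counter()
    try:
        os.stat(path)
    except FileNotFoundError:
        pass
    return time.perf_counter() - start


def time_create(path):
    """Time the exclusive creation of a new file, then remove it."""
    start = time.perf_counter()
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    elapsed = time.perf_counter() - start
    os.close(fd)
    os.unlink(path)
    return elapsed


if __name__ == "__main__":
    # Falls back to the system temp directory if no directory is given.
    directory = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    probe = os.path.join(directory, f"latency-probe-{os.getpid()}")
    print(f"lookup of missing file: {time_missing_lookup(probe):.3f} s")
    print(f"create new file:        {time_create(probe):.3f} s")
```

Running it on one node with and without the find loop active on another would
show whether the delay sits in the lookup or in the create.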