[Linux-cluster] More GFS2 tuning...

Mon Feb 16 19:09:09 UTC 2009

> When a large multi-user, multi-file, multi-thread simulation with a
> total file output of 18GB is run, I plot the output of vmstat 1 and
> see a definite pattern which is very periodic. The bo values start at
> around 200MB, then drop down to 0 in most cases for a few seconds,
> then spike to ~700MB/s, then ease back down to 200, 150 and back down
> to 0. It looks very much like a caching issue to me. These numbers
> are almost identical on the FC switches.

How are your test files distributed across directories, and what is your 
ratio of reads to writes? Are you mounting with 
noatime,nodiratime,noquota? What is your clustering network connection?
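
To check the mount options quickly on each node, something along these 
lines might do. It is only a sketch: it parses text in the /proc/mounts 
layout (device, mount point, fs type, comma-separated options) and 
reports which of the wanted options are missing. The function name and 
the /mnt/gfs2 mount point are made up for illustration, and I have left 
the quota option out of the defaults because I am not assuming how it is 
reported in /proc/mounts.

```python
def missing_mount_options(mounts_text, mount_point,
                          wanted=("noatime", "nodiratime")):
    """Return the wanted options absent for mount_point.

    mounts_text is expected to be in /proc/mounts format.
    Returns None if mount_point does not appear at all.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        active = set(fields[3].split(","))
        return [opt for opt in wanted if opt not in active]
    return None
```

On a node you would feed it the contents of /proc/mounts and the path 
where the GFS2 volume is mounted; an empty list means both options are 
active.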

If all your files are in the same directory (or a small number of 
subdirectories) and the access is distributed across all the nodes, then 
I have to say that you may well be out of luck and what you are seeing 
is normal. Bouncing directory locks between the nodes on each access 
will introduce enough latency to kill the performance. Also remember 
that no two nodes can hold an exclusive (write) lock on the same file at 
the same time, and file creation/deletion needs an exclusive lock on the 
directory, which in turn means only one file creation/deletion per 
directory at any one time.
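
One way to keep the directory locks from bouncing is to give each node 
its own output directory, so that file creation on one node never 
contends with another node for the same directory lock. A minimal 
sketch, assuming the hostname is unique per node and that your 
simulation lets you choose the output path (per_node_path is a 
hypothetical helper, not part of GFS2 or any cluster tool):

```python
import os
import socket


def per_node_path(base_dir, filename, node=None):
    """Build an output path under a directory private to this node.

    base_dir is a directory on the shared file system. node defaults
    to the local hostname, which is assumed to be unique in the cluster.
    """
    node = node or socket.gethostname()
    node_dir = os.path.join(base_dir, node)
    # Created once per node; later creates stay inside this directory.
    os.makedirs(node_dir, exist_ok=True)
    return os.path.join(node_dir, filename)
```

This only helps if each node mostly works in its own directory. If 
other nodes routinely read or write files there, the locks will still 
move around.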

I can well believe the 900MB/s figure if you are just reading back one 
big file from multiple nodes. But the performance will fall off a cliff 
on random I/O involving writes.

Gordan