[Linux-cluster] More GFS2 tuning...

Fellow cluster folks,

I am currently trying to get as much throughput as I can for an NFS
cluster I am about to put into production, but the numbers I am getting
for throughput, like others have said, are dismal. My setup consists
of 5 DL360-G5's w/8GB RAM running RHEL5.3 x86_64 with dual 4G FC
QLogic cards, connected via 4G FC switches to an EVA8100, with 48
spindles in the diskgroup. The LUNs are between 500 and 750GB and I am
using device-mapper multipath in round-robin with a rr_min_io of 250
and multibus. I've even adjusted the QLogic drivers to have a queue
depth of 64.

By my reckoning, I should be able to see 400MB/s or more sustained
throughput using this setup. If this is a pipe dream, someone let me
know quick before I go nutz.
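
As a sanity check on that reckoning, here is a minimal sketch of the
link-side arithmetic. It assumes a nominal ~400 MB/s of usable payload
per 4G FC port per direction and that both paths carry traffic under
multibus; the function name and constant are illustrative only, and the
result says nothing about what the array or the filesystem can sustain.

```python
# Rough host-side FC ceiling. The per-port figure is a nominal assumption,
# not a measurement from this cluster.
FC_4G_MB_PER_S = 400  # assumed usable payload of one 4G FC port, one direction


def fc_ceiling_mb_s(ports, per_port_mb_s=FC_4G_MB_PER_S):
    """Return the aggregate nominal bandwidth for `ports` active paths."""
    return ports * per_port_mb_s


if __name__ == "__main__":
    ceiling = fc_ceiling_mb_s(2)   # dual-port HBA setup, both paths active
    target = 400                   # the sustained MB/s hoped for above
    print("nominal ceiling per node: %d MB/s" % ceiling)
    print("target as share of ceiling: %.0f%%" % (100.0 * target / ceiling))
```

Under those assumptions the 400MB/s target is about half of what the
two links could nominally carry per node.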

When a large multi-user, multi-file, multi-threaded simulation with a
total file output of 18GB is run, I plot the output of vmstat 1 and
see a definite pattern which is very periodic. The bo values start at
around 200MB/s, then drop down to 0 in most cases for a few seconds,
then spike to ~700MB/s, then ease back down to 200, 150 and back down
to 0. It looks very much like a caching issue to me. These numbers
are almost identical on the FC switches.
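
To put numbers on that pattern rather than eyeballing a plot, something
like the sketch below could summarize a saved vmstat run. It assumes the
bo column is in 1 KiB blocks per second (as the vmstat man page
describes for Linux); the function names and the 1 MB/s "stall"
threshold are my own choices for illustration.

```python
import sys


def bo_samples(lines):
    """Extract the 'bo' column from vmstat output, skipping header lines."""
    col = None
    samples = []
    for line in lines:
        fields = line.split()
        if "bo" in fields:
            # Column header row; vmstat repeats it periodically.
            col = fields.index("bo")
            continue
        if col is None or len(fields) <= col:
            continue
        if not all(f.isdigit() for f in fields):
            continue  # banner row or anything else that is not pure data
        samples.append(int(fields[col]))
    return samples


def summarize(samples, stall_below=1024):
    """Return mean/peak in MB/s and the fraction of seconds under the stall threshold."""
    if not samples:
        return None
    n = float(len(samples))
    return {
        "mean_mb_s": sum(samples) / 1024.0 / n,
        "peak_mb_s": max(samples) / 1024.0,
        "stalled_fraction": sum(1 for s in samples if s < stall_below) / n,
    }


if __name__ == "__main__":
    print(summarize(bo_samples(sys.stdin)))
```

Usage would be along the lines of saving `vmstat 1 120` to a file during
the simulation and feeding that file to the script on stdin. The first
vmstat sample is an average since boot, so it may be worth discarding.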

I'd like to level it out a bit so that the average climbs, giving the
best profile for general usage. This is going to be, as mentioned
above, an NFS server exporting 1 export per node, serving roughly 250
machines.
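
If the bursts really are page-cache writeback, the kernel's dirty-page
tunables are one place to look. The sketch below only reads the current
values and estimates how much dirty data could pile up before writeback
kicks in. It assumes the usual /proc/sys/vm files are present and treats
the ratios as a percentage of total RAM, which is only an approximation;
the helper names are hypothetical.

```python
import os

# Writeback-related sysctls under /proc/sys/vm.
TUNABLES = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_tunables(base="/proc/sys/vm"):
    """Return {name: int or None} for each tunable; None if unreadable."""
    values = {}
    for name in TUNABLES:
        try:
            with open(os.path.join(base, name)) as f:
                values[name] = int(f.read().strip())
        except (IOError, OSError, ValueError):
            values[name] = None
    return values


def dirty_limits_mb(ram_mb, values):
    """Approximate (background, hard) dirty thresholds in MB.

    Assumption: ratios are applied to total RAM. The kernel actually uses
    a smaller 'dirtyable' figure, so treat these as upper estimates.
    """
    bg = values.get("dirty_background_ratio")
    hard = values.get("dirty_ratio")
    if bg is None or hard is None:
        return None
    return (ram_mb * bg / 100.0, ram_mb * hard / 100.0)


if __name__ == "__main__":
    vals = read_vm_tunables()
    print(vals)
    print(dirty_limits_mb(8192, vals))  # 8GB nodes, as in the setup above
```

Lower thresholds should in principle start writeback earlier and in
smaller chunks, but whether that actually flattens the curve on this
hardware is something I would have to test.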

I've read that GFS2 is supposed to be "self tuning" but I don't think
these are necessarily GFS2 issues.

I was even told by a Red Hat engineer at last year's Summit that I could
expect to see up to 600-900 MB/s. Not sure I believe that one, but 400
seems doable.

Anyone have something similar? What I/O rates are people getting?

Might be useful to have use cases and configs on a wiki somewhere to
let people compare results etc.

Anyway, all help is welcome and I am willing to test near anything as
long as I won't get arrested for it.


