[Linux-cluster] GFS: more simple performance numbers
David Teigland
teigland at redhat.com
Thu Oct 21 12:06:01 UTC 2004
On Tue, Oct 19, 2004 at 01:05:54PM -0500, Derek Anderson wrote:
> I've rerun the simple performance tests originally run by Daniel McNeil with
> the addition of the gulm lock manager on the 2.6.8.1 kernel and GFS 6.0 on
> the 2.4.21-20.EL kernel.
>
> Notes:
> ======
> Storage: RAID Array Tornado- Model: F4 V2.0
> HBA: QLA2310
> Switch: Brocade Silkworm 3200
> Nodes: Dual Intel Xeon 2.40Ghz
> 2GB memory
> 100Mbs Ethernet
> 2.6.8.1 Kernel/2.4.21-20.EL Kernel (with gfs 6)
> GuLM: 3-node cluster, 1 external dedicated lock manager
> DLM: 3-node cluster
> LVM: Not used
>
>
> tar xvf linux-2.6.8.1.tar:
> --------------------------
>                        real        user       sys
> gfs dlm 1 node tar     0m19.480s   0m0.474s   0m8.975s
>
> du -s linux-2.6.8.1 (after untar):
> ----------------------------------
>                        real        user       sys
> gfs dlm 1 node         0m5.149s    0m0.041s   0m1.905s
>
> Second du -s linux-2.6.8.1:
> ---------------------------
>                        real        user       sys
> gfs dlm 1 node         0m0.341s    0m0.027s   0m0.314s
I've found part of the problem by running the following tests. (I have
more modest hardware: 256MB memory, Dual Pentium III 700 MHz)
Here's the test I ran on just a single node:
time tar xf /tmp/linux-2.6.8.1.tar;
time du -s linux-2.6.8.1/;
time du -s linux-2.6.8.1/
1. lock_nolock
tar: real 1m6.859s
du1: real 0m45.952s
du2: real 0m1.934s
2. lock_dlm, this is the only node mounted
tar: real 1m20.130s
du1: real 0m52.483s
du2: real 1m4.533s
Notice that the problem is not the first du, which looks normal compared to
the nolock results; the second du, however, is definitely bad.
3. lock_dlm, this is the only node mounted
* changed lock_dlm.h DROP_LOCKS_COUNT from 10,000 to 100,000
tar: real 1m16.028s
du1: real 0m48.636s
du2: real 0m2.332s
No more problem.
Commentary:
When gfs is holding over DROP_LOCKS_COUNT locks (locally), lock_dlm tells
gfs to "drop locks". When gfs drops locks, it invalidates the cached data
they protect. du in the linux src tree requires gfs to acquire some
16,000 locks. Since this exceeded 10,000, lock_dlm was having gfs toss
the cached data from the previous du. If we raise the limit to 100,000,
there's no "drop locks" callback and everything remains cached.
This "drop locks" callback is a way for the lock manager to throttle
things when it begins reaching its own limitations. 10,000 was picked
pretty arbitrarily because there's no good way for the dlm to know when
it's reaching its limitations. This is because the main limitation is
free memory on remote nodes.
The dlm can get into a real problem if gfs holds "too many" locks.  If a
gfs node fails, it's likely that some of the locks the dlm mastered on
that node need to be remastered on remaining nodes. Those remaining nodes
may not have enough memory to remaster all the locks -- the dlm recovery
process eats up all the memory and hangs.
Part of a solution would be to have gfs free a bunch of locks at this
point, but that's not a near-term option.  So, we're left with a
tradeoff: favor performance and increase the risk of having too little
memory for recovery, or vice versa.
Given my machines and the test I was running, 10,000 solved the recovery
problem.  But 256MB is obviously behind the times, which makes a default
of 10,000 probably too low.  I'll increase the constant and make it
configurable through /proc.
--
Dave Teigland <teigland at redhat.com>