[Linux-cluster] GFS: more simple performance numbers

Thu Oct 21 12:06:01 UTC 2004

On Tue, Oct 19, 2004 at 01:05:54PM -0500, Derek Anderson wrote:
> I've rerun the simple performance tests originally run by Daniel McNeil with 
> the addition of the gulm lock manager on the 2.6.8.1 kernel and GFS 6.0 on 
> the 2.4.21-20.EL kernel.
> 
> Notes:
> ======
> Storage:        RAID Array Tornado- Model: F4 V2.0
> HBA:            QLA2310
> Switch:         Brocade Silkworm 3200
> Nodes:          Dual Intel Xeon 2.40GHz
>                 2GB memory
>                 100Mbps Ethernet
>                 2.6.8.1 Kernel/2.4.21-20.EL Kernel (with gfs 6)
> GuLM:           3-node cluster, 1 external dedicated lock manager
> DLM:            3-node cluster
> LVM:            Not used
> 
> 
> tar xvf linux-2.6.8.1.tar:
> --------------------------
>                         real            user            sys
> gfs dlm 1 node tar      0m19.480s       0m0.474s        0m8.975s

> du -s linux-2.6.8.1 (after untar):
> ----------------------------------
>                         real            user            sys
> gfs dlm 1 node          0m5.149s        0m0.041s        0m1.905s

> Second du -s linux-2.6.8.1:
> ---------------------------
>                         real            user            sys
> gfs dlm 1 node          0m0.341s        0m0.027s        0m0.314s

I've found part of the problem by running the following tests.  (I have
more modest hardware: 256MB memory, Dual Pentium III 700 MHz)

Here's the test I ran on just a single node:

$ time tar xf /tmp/linux-2.6.8.1.tar; \
  time du -s linux-2.6.8.1/; \
  time du -s linux-2.6.8.1/

1. lock_nolock

tar: real    1m6.859s
du1: real    0m45.952s
du2: real    0m1.934s

2. lock_dlm, this is the only node mounted

tar: real    1m20.130s
du1: real    0m52.483s
du2: real    1m4.533s

Notice that the problem is not the first du, which looks normal compared to
the nolock results; it is the second du that is definitely bad.

3. lock_dlm, this is the only node mounted
   * changed lock_dlm.h DROP_LOCKS_COUNT from 10,000 to 100,000

tar: real    1m16.028s
du1: real    0m48.636s
du2: real    0m2.332s

The problem is gone.

Commentary:

When gfs holds more than DROP_LOCKS_COUNT locks locally, lock_dlm tells
gfs to "drop locks".  When gfs drops locks, it invalidates the cached data
they protect.  Running du on the Linux source tree requires gfs to acquire
some 16,000 locks.  Since this exceeded 10,000, lock_dlm was having gfs toss
the cached data from the first du.  If we raise the limit to 100,000,
there's no "drop locks" callback and everything remains cached.

This "drop locks" callback is a way for the lock manager to throttle
things when it begins reaching its own limitations.  10,000 was picked
pretty arbitrarily because there's no good way for the dlm to know when
it's reaching its limitations.  This is because the main limitation is
free memory on remote nodes.

The dlm can get into a real problem if gfs holds "too many" locks.  If a
gfs node fails, it's likely that some of the locks the dlm mastered on
that node need to be remastered on remaining nodes.  Those remaining nodes
may not have enough memory to remaster all the locks -- the dlm recovery
process eats up all the memory and hangs.

Part of a solution would be to have gfs free a bunch of locks at this
point, but that's not a near-term option.  So we're left with a
tradeoff: favoring performance at the cost of a greater risk of too
little memory for recovery, or vice versa.

On my machines, with the test I was running, a limit of 10,000 avoided
the recovery problem.  But 256MB is obviously behind the times, which
makes a default of 10,000 probably too low.  I'll increase the constant
and make it configurable through /proc.

-- 
Dave Teigland  <teigland at redhat.com>