[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Linux-cluster] DLM tuning



Hi,

On Wed, 2012-01-25 at 16:00 -0500, Digimer wrote:
> Hi all,
> 
>   EL6 DLM question. Beginning adventures in DLM tuning... :)
> 
>   I am trying to optimize DLM for use on nodes which can hit the disks
> hard and fast. Specifically, clustered LVM with many LVs hosting many
> VMs where the VMs are simultaneously disk i/o intensive. For now,
> though, I am testing with bonnie++ to induce a high load on a GFS2
> partition.
> 
>   The problem I am seeing is that when one process (bonnie++ running a
> test on one node) hammers DLM, it causes long delays on another node
> trying to, for example, run 'ls -lah'. Is there a way to tweak DLM to
> allow better response from other nodes trying to access the same lock-space?
> 
> Thanks!
> 

The number of locks which GFS2 uses scales with the number of inodes in
cache at any one time. The time taken to acquire a DLM lock should not
grow greatly as the number of locks increases, since the main cost is
the network round-trip time, and the DLM is designed to minimise the
number of messages. For locally mastered locks the acquisition time
should be very small indeed, and in an N-node cluster roughly 1/N of
the DLM locks will, on average, be mastered locally on each node.
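As a back-of-envelope illustration (the 0.2 ms round-trip time and the
4-node cluster below are made-up numbers, not measurements): if ~1/N of
locks are mastered locally (no round trip) and the rest cost one round
trip each, the mean acquisition cost works out as:

```shell
# Toy model only: rtt and n are hypothetical values, not from any real cluster.
awk 'BEGIN { rtt=0.2; n=4; printf "remote fraction: %.2f, mean acquire cost: %.3f ms\n", (n-1)/n, (n-1)/n*rtt }'
```

which is why, once the cache is warm, lock count matters much less than
round-trip time.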

You can use the GFS2 tracepoints to see how much locking activity there
is, and how long some internal operations take. In the latest -nmw tree
code, you can also get stats on how long the DLM locks are taking.
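For example, assuming debugfs is mounted in the usual place and the
kernel was built with tracepoints, something like this should show the
GFS2 tracepoint traffic (exact paths can differ by distro/kernel, and
you need root):

```shell
# Mount debugfs if it isn't already (assumes root):
#   mount -t debugfs none /sys/kernel/debug
grep gfs2 /sys/kernel/debug/tracing/available_events   # list the GFS2 tracepoints
echo 1 > /sys/kernel/debug/tracing/events/gfs2/enable  # enable all of them
head -n 20 /sys/kernel/debug/tracing/trace_pipe        # watch live events
echo 0 > /sys/kernel/debug/tracing/events/gfs2/enable  # turn them off again
```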

I'd be surprised, though, if the issue you are describing is related to
the DLM; it sounds more like a plain block I/O issue to me. Once a lock
has been granted it is not dropped unless there is memory pressure or
another node requires it, so for streaming-data applications GFS2's
locking is usually a very small part of the overall time taken. It
normally only shows up in "lots of small files" type workloads, or
where there is contention between nodes accessing the same objects.
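One quick way to tell the two apart is to watch the storage and the
glock state while the 'ls' on the other node is stalled. A rough
sketch ("myfs" is a placeholder for your filesystem's name under
debugfs, and the grep is only a crude count of waiters, not a precise
metric; iostat needs the sysstat package):

```shell
# If it is block I/O, high %util / await on the shared device will show here:
iostat -x 1 5
# If it is lock contention, glocks with waiting holders will show here:
grep -c 'f:.*W' /sys/kernel/debug/gfs2/myfs/glocks
```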

Steve.


