[Linux-cluster] optimising DLM speed?

Scooter Morris scooter at cgl.ucsf.edu
Thu Feb 24 22:40:25 UTC 2011


On 02/17/2011 01:29 PM, David Teigland wrote:
> On Thu, Feb 17, 2011 at 09:24:41PM +0000, Alan Brown wrote:
>> David Teigland wrote:
>>> Don't change the buffer size, but I'd increase all the hash table sizes to
>>> 4096 and see if anything changes.
>>>
>>> echo "4096">  /sys/kernel/config/dlm/cluster/rsbtbl_size
>>> echo "4096">  /sys/kernel/config/dlm/cluster/lkbtbl_size
>>> echo "4096">  /sys/kernel/config/dlm/cluster/dirtbl_size
>> Increasing rsbtbl_size to 4096 or higher results in FSes refusing to
>> mount and clvm refusing to start - both with "cannot allocate
>> memory"
>>
>> At 2048, it works, but gfs_controld and dlm_controld exited when I
>> tried to mount all FSes on one node as a test.
>>
>> At 1024 it seems stable.
>>
>> The other settings seemed to have applied OK. So far, reports are
>> positive (but it's quiet at the moment)
>>
>> I've got a strace of clvmd trying to start with rsbtbl_size set to
>> 4096. Should I post it here or would you prefer it mailed direct?
> Thanks for testing, you can post here.
Hi all.  After two tries, we've modified our cluster so that all nodes 
now have their DLM hash table sizes increased to 1024.  Initially, I put 
the echo commands in /etc/init.d/gfs2, but it turns out that 
/etc/init.d/gfs2 is effectively a no-op for this purpose: 
/etc/init.d/netfs mounts the gfs2 filesystems before /etc/init.d/gfs2 
is ever called, so the echo commands need to run before netfs does.
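For anyone wanting to do the same, here's a rough sketch of the kind of 
helper we run before netfs.  The function name and the fallback size are 
my own choices, not anything official; the configfs paths are the ones 
from David's earlier message.  It skips any table file that isn't 
present or writable, since the dlm configfs entries only exist once the 
dlm module is loaded:

```shell
#!/bin/sh
# Sketch: bump the dlm hash table sizes before netfs mounts gfs2.
# set_dlm_table_sizes [base_dir] [size]
#   base_dir defaults to the real configfs path; it's a parameter
#   here only so the function can be exercised outside a cluster.
set_dlm_table_sizes() {
    base="${1:-/sys/kernel/config/dlm/cluster}"
    size="${2:-1024}"
    for tbl in rsbtbl_size lkbtbl_size dirtbl_size; do
        # Only write if the entry exists and is writable
        # (the dlm module may not be loaded yet).
        if [ -w "$base/$tbl" ]; then
            echo "$size" > "$base/$tbl"
        fi
    done
}
```

On our RHEL-style setup this would have to be invoked from an init 
script ordered before netfs (or hooked into netfs itself), since that's 
where the gfs2 mounts actually happen.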

At any rate, we have noticed a significant perceived improvement in 
overall system performance.  Where before it was common to see imap 
processes stuck in D state (uninterruptible sleep) -- sometimes hanging 
for long periods of time -- we have not seen that at all since 
increasing the hash table sizes.  So far, so good!

-- scooter



