
Re: [Linux-cluster] dlm and IO speed problem <er, might wanna get a coffee first ; )>



On Thu, 10 Apr 2008, Kadlecsik Jozsef wrote:

> But this is a good clue to what might bite us most! Our GFS cluster is an 
> almost mail-only cluster for users with Maildir. When the users experience 
> temporary hangups for several seconds (even when writing a new mail), it 
> might be due to the concurrent scanning for new mail on one node by the 
> MUA and the delivery to the Maildir on another node by the MTA.
> 
> What is really strange (and disturbing) is that such "hangups" can take 
> 10-20 seconds, which is just too much for the users.

Yesterday we started to monitor the number of locks/held locks on two of 
the machines. The results from the first day can be found at 
http://www.kfki.hu/~kadlec/gfs/.
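
For anyone who wants to collect similar numbers, here is a rough sketch in
Python. It assumes the counts are read from the output of
"gfs_tool counters <mountpoint>" and that this output is one "name value"
pair per line (for example "locks 165" and "locks held 73"). Both points
are assumptions for illustration, not a description of the script that
produced the graphs above, and parse_counters is a made-up helper name.

```python
def parse_counters(text):
    """Parse 'name value' lines into a dict of integer counters.

    Assumed input format (not verified against every gfs_tool version):
    each line is a free-text counter name followed by whitespace and an
    integer. Lines that do not match are skipped.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split off the last whitespace-separated field
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            counters[name.strip()] = int(value)
        except ValueError:
            continue  # value is not a plain integer, ignore the line
    return counters


if __name__ == "__main__":
    # Hypothetical sample text, not real measurements.
    sample = "        locks 165\n   locks held 73\n"
    stats = parse_counters(sample)
    print(stats["locks"], stats["locks held"])
```

Running this periodically (from cron, say) and logging the two values with a
timestamp would be enough to plot them.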

It looks as if Maildir is definitely the wrong choice for GFS and we should 
consider converting to mailbox format: at least I cannot explain the 
spikes in any other way.
 
> In order to look at the possible tuning options and the side effects, I 
> list what I have learned so far:
> 
> - Increasing glock_purge (percent, default 0) helps to trim back the 
>   unused glocks by gfs_scand itself. Otherwise glocks can accumulate and 
>   gfs_scand spends more and more time scanning an ever larger 
>   table of glocks.
> - gfs_scand wakes up every scand_secs (default 5s) to scan the glocks,  
>   looking for work to do. By increasing scand_secs one can lessen the load 
>   produced by gfs_scand, but it'll hurt because flushing data can be 
>   delayed.
> - Decreasing demote_secs (seconds, default 300) helps to flush cached data
>   more often by moving write locks into less restricted states. Flushing 
>   often helps to avoid burstiness *and* to prolong other nodes' 
>   lock access. The question is, what are the side effects of small
>   demote_secs values? (Probably there is not much point in choosing
>   a demote_secs value smaller than scand_secs.)
> 
> Currently we are running with 'glock_purge = 20' and 'demote_secs = 30'.
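
As a minimal sketch of how the two settings quoted above could be applied,
the Python snippet below builds the corresponding "gfs_tool settune"
command lines. The mount point /mnt/gfs is a hypothetical placeholder, and
the helper names are made up for illustration. By default the commands are
only printed, not executed.

```python
import subprocess

# Values quoted above; the mount point used below is a hypothetical placeholder.
TUNABLES = {"glock_purge": 20, "demote_secs": 30}


def settune_commands(mountpoint, tunables):
    """Build one `gfs_tool settune` argv list per tunable (nothing is executed)."""
    return [
        ["gfs_tool", "settune", mountpoint, name, str(value)]
        for name, value in sorted(tunables.items())
    ]


def apply_tunables(mountpoint, tunables, dry_run=True):
    """Print each command; run it only when dry_run is False (needs root)."""
    for argv in settune_commands(mountpoint, tunables):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply_tunables("/mnt/gfs", TUNABLES)
```

As far as I know, values set this way do not survive a remount, so they
would have to be re-applied on every node after each mount.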

Best regards,
Jozsef
--
E-mail : kadlec mail kfki hu, kadlec blackhole kfki hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary

