[Linux-cluster] dlm and IO speed problem <er, might wanna get a coffee first ; )>

Kadlecsik Jozsef kadlec at sunserv.kfki.hu
Wed Apr 9 19:42:33 UTC 2008


On Wed, 9 Apr 2008, Wendy Cheng wrote:

> I have been responding to this email off the top of my head, based on folks'
> descriptions. Please be aware that these are just rough thoughts and the
> responses may not fit the general case. The above is mostly for the original
> problem description where:
> 
> 1. The system is designated for build-compile - my take is that there are many
> temporary and deleted files.
> 2. The gfs_inode tunable was changed (to 30, instead of default, 15).

I'll take it into account when experimenting with the different settings.

> > > Isn't GFS_GL_HASH_SIZE too small for a large number of glocks? Being too
> > > small, it results not only in long linked lists; clashes in the same
> > > bucket will also block otherwise parallel operations. Wouldn't it help
> > > to increase it from 8k to 65k?
> >
> > Worth a try.
> 
> Now I remember .... we did experiment with different hash sizes when this
> latency issue was first reported two years ago. It didn't make much
> difference. The cache flushing, on the other hand, was more significant.
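
For reference, the arithmetic behind the hash-size question can be sketched
as below. The glock count of one million is purely an assumed figure for
illustration, not a measurement, and an evenly spreading hash is assumed;
only the two table sizes (8k and 65k buckets) come from the discussion.

```shell
# Rough average hash chain length for a given number of cached glocks,
# assuming the hash spreads them evenly (integer division, rounded down).
avg_chain_len() {
    # $1 = number of glocks, $2 = number of hash buckets
    echo $(( $1 / $2 ))
}

glocks=1000000   # assumed glock count, for illustration only
for buckets in 8192 65536; do
    echo "$buckets buckets: about $(avg_chain_len "$glocks" "$buckets") glocks per chain"
done
```

Under these assumptions the larger table shortens the chains by a factor of
eight, which is why the increase looked worth trying.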

What led me to suspect clashing in the hash (or some other lock-creation 
issue) was a simple test I made on our five-node cluster: on one node I 
ran

find /gfs -type f -exec cat {} > /dev/null \;

and on another I simply started an editor on a non-existent file.
It took several seconds for the editor to "open" the file. What else 
but creating the lock could delay the process for so long?
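
To put a number on that delay, something along these lines could be run on
the second node while the find is running on the first. It is only a sketch:
the function name and the probe file name are made up for illustration, and
it relies on GNU date for nanosecond timestamps.

```shell
# Time a lookup of a name that does not exist in the given directory,
# roughly what an editor does when started on a new file.
# Prints the elapsed time in milliseconds. Requires GNU date (%N).
time_lookup() {
    name="$1/no-such-file.$$"      # hypothetical name, assumed not to exist
    start=$(date +%s%N)
    [ -e "$name" ] || true         # the lookup being timed; expected to fail
    end=$(date +%s%N)
    echo "$(( (end - start) / 1000000 )) ms"
}
```

For example, `time_lookup /gfs` could be run once on an otherwise idle
cluster and once while the find is running, to compare the two figures.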

> > However, the issues involved here are more than lock searching time. It also
> > has to do with cache flushing. GFS currently accumulates too much dirty
> > cache. When it starts to flush, it pauses the system for too long.
> > Glock trimming helps, since cache flushing is part of the glock release
> > operation.

But 'flushing when releasing a glock' looks like a side effect. I mean, isn't 
there a more direct way to control the flushing?

I could easily be totally wrong, but on the one hand it's good to keep as 
many locks cached as possible, because lock creation is expensive. On the 
other hand, trimming locks triggers flushing, which helps keep the system 
running more smoothly. So a tunable that controls flushing directly would 
be better than just trimming the locks, wouldn't it? But not knowing the 
deep internals of GFS, my reasoning may of course be bogus.

Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary



