
Re: [Linux-cluster] dlm and IO speed problem <er, might wanna get a coffee first ; )>

Kadlecsik Jozsef wrote:
On Wed, 9 Apr 2008, Wendy Cheng wrote:

I have been responding to this email off the top of my head, based on folks'
descriptions. Please be aware that these are just rough thoughts and the
responses may not fit the general case. The above is mostly for the original
problem description where:

1. The system is designated for build-compile - my take is that there are many
temporary and deleted files.
2. The gfs_inode tunable was changed (to 30, instead of the default, 15).

I'll take it into account when experimenting with the different settings.

Isn't GFS_GL_HASH_SIZE too small for a large number of glocks? Being too
small, it results not only in long linked lists; clashes on the same
bucket will also block otherwise parallel operations. Wouldn't it help
to increase it from 8k to 65k?
Worth a try.
Now I remember .... we did experiment with different hash sizes when this
latency issue was first reported two years ago. It didn't make much
difference. The cache flushing, on the other hand, was more significant.
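
To put rough numbers on the hash-size question, here is a minimal sketch of how chain length scales with bucket count in a chained hash table. It assumes lock numbers hash uniformly and takes "8k" and "65k" to mean 8192 and 65536 buckets; the function name and the glock count are made up for illustration, and the real GFS hash function is not modelled.

```python
import random
from collections import Counter

def chain_stats(num_glocks, num_buckets, seed=0):
    """Return (mean, longest) chain length for a chained hash table."""
    rng = random.Random(seed)
    # Assumption: keys spread uniformly over the buckets.
    counts = Counter(rng.getrandbits(64) % num_buckets
                     for _ in range(num_glocks))
    mean = num_glocks / num_buckets
    longest = max(counts.values())
    return mean, longest

if __name__ == "__main__":
    for buckets in (8192, 65536):
        mean, longest = chain_stats(500_000, buckets)
        print(f"{buckets:6d} buckets: mean chain {mean:.1f}, longest {longest}")
```

Going from 8192 to 65536 buckets cuts the mean chain by a factor of eight, but a walk over a few dozen list entries is still cheap next to disk I/O, which would fit the observation that the hash size made little difference in practice.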

What led me to suspect clashing in the hash (or some other lock-creating issue) was the simple test I made on our five node cluster: on one node I ran

find /gfs -type f -exec cat {} > /dev/null \;

and on another one I just started an editor, naming a non-existent file.
It took multiple seconds for the editor to "open" the file. What else but creating the lock could delay the process for so long?

Not knowing how "find" is implemented, I would guess this is caused by directory locks. Creating a file needs a directory lock. Your exclusive write lock (file create) can't be granted until the "find" releases the directory lock. It doesn't look like a lock query performance issue to me.
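
The directory-lock explanation can be illustrated with a toy shared/exclusive lock. This is only a sketch of the compatibility rule (many shared holders or one exclusive holder); the class and holder names are hypothetical, and it does not model how GFS or the DLM actually queue, cache, or demote locks across nodes.

```python
class DirLock:
    """Toy shared/exclusive lock: many readers or one writer."""

    def __init__(self):
        self.shared_holders = set()
        self.exclusive_holder = None

    def try_shared(self, who):
        # Shared access is refused while a writer holds the lock.
        if self.exclusive_holder is not None:
            return False
        self.shared_holders.add(who)
        return True

    def try_exclusive(self, who):
        # A writer has to wait for every other holder to let go.
        if self.exclusive_holder is not None or self.shared_holders - {who}:
            return False
        self.shared_holders.discard(who)
        self.exclusive_holder = who
        return True

    def release(self, who):
        self.shared_holders.discard(who)
        if self.exclusive_holder == who:
            self.exclusive_holder = None

if __name__ == "__main__":
    lock = DirLock()
    lock.try_shared("find on node A")              # directory scan holds it shared
    print(lock.try_exclusive("editor on node B"))  # False: the create must wait
    lock.release("find on node A")
    print(lock.try_exclusive("editor on node B"))  # True: the create proceeds
```

In this picture the delay comes from waiting for the shared holder to release, not from looking the lock up.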

However, the issues involved here are more than lock searching time. It also
has to do with cache flushing. GFS currently accumulates too much dirty
cache. When it starts to flush, it pauses the system for too long.
Glock trimming helps, since cache flushing is part of glock releasing.
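
A back-of-the-envelope model of why trimming shortens the pauses, assuming that releasing a glock flushes exactly the dirty pages it covers. All names and parameters here (`pages_per_file`, `trim_every`, `keep`) are hypothetical knobs for the sketch, not GFS tunables.

```python
def release_glocks(dirty_pages, keep):
    """Release all but the newest `keep` glocks.

    Returns (pages flushed, dirty page counts of the glocks still held).
    """
    # Assumption: releasing a glock flushes the dirty pages it covers.
    cut = max(len(dirty_pages) - keep, 0)
    flushed = sum(dirty_pages[:cut])
    return flushed, dirty_pages[cut:]

def largest_flush(pages_per_file, num_files, trim_every, keep):
    """Largest single flush when trimming runs after every `trim_every` files."""
    held, worst = [], 0
    for i in range(1, num_files + 1):
        held.append(pages_per_file)  # each new file dirties some pages
        if i % trim_every == 0:
            flushed, held = release_glocks(held, keep)
            worst = max(worst, flushed)
    # Whatever is still held gets flushed in one go at the end.
    return max(worst, sum(held))

if __name__ == "__main__":
    print(largest_flush(4, 1000, trim_every=1000, keep=0))  # 4000
    print(largest_flush(4, 1000, trim_every=100, keep=10))  # 400
```

The total amount written is the same in both runs; frequent trimming just splits it into smaller flushes, so no single pause is as long.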

But 'flushing when releasing glock' looks like a side effect. I mean, isn't there a more direct way to control the flushing? I could easily be totally wrong, but on the one hand, it's good to keep as many locks cached as possible, because lock creation is expensive. On the other hand, trimming locks triggers flushing, which helps to keep the systems running more smoothly. So a tunable to control flushing directly would be better than just trimming the locks, wouldn't it?

To make a long story short, I did submit a direct cache flush patch first, instead of this final version of the lock trimming patch. Unfortunately, it was *rejected*.

-- Wendy
