
Re: [Linux-cluster] I/O scheduler and performance

Wendy Cheng wrote:

On Wed, 2006-07-05 at 10:06 +0200, Ramon van Alteren wrote:
The annoying problem is that I can't find a way to switch schedulers at
runtime for the gfs-based storage (Coraids connected with ATA over
Ethernet). So I suspect that I need to change the default scheduler
compiled into the kernel and reboot, or build all schedulers as modules
and load/unload the modules and retest.

Which kernel version are you running? For RHEL 4 (2.6.9 based),
it is just a matter of specifying a boot-time parameter and rebooting - no
need to recompile the kernel and/or modules. Newer versions of the community
kernel on kernel.org (say 2.6.17) may have even more flexible methods.
I'm running a 2.6.16 kernel with gentoo patches and the latest stable cluster sources built as modules. I have reconfigured my grub setup so it just takes a reboot to change the I/O scheduler.

For local (fixed) disks I can change the scheduler without a reboot by writing to /sys/block/sda/queue/scheduler.
Sadly, this isn't possible for the shared storage, so I need a reboot.
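
For reference, a minimal sketch of the runtime switch on a local disk. It assumes the usual sysfs layout, where the scheduler file lists the available schedulers and marks the active one in square brackets (e.g. "noop anticipatory deadline [cfq]"). The helper names are made up for illustration, and writing to the file needs root:

```python
from pathlib import Path


def parse_schedulers(text):
    """Split the sysfs scheduler line into (available, active).

    The active scheduler is the token wrapped in square brackets.
    """
    available = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def scheduler_path(device):
    """Path of the scheduler file for a block device such as 'sda'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def get_scheduler(device):
    """Read and parse the current scheduler setting for a device."""
    return parse_schedulers(scheduler_path(device).read_text())


def set_scheduler(device, name):
    """Switch the scheduler at runtime (requires root)."""
    available, _ = get_scheduler(device)
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    scheduler_path(device).write_text(name)
```

Something like set_scheduler("sda", "deadline") would then switch sda without a reboot, provided the device exposes a writable scheduler file at all, which is exactly what is missing for the shared storage here.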

This brings up a second question. While researching last night I found
some documents on the net that seem to indicate that gfs uses (or used
to use) directory based locking for writing between the nodes.
E.g. in order to write a file the nodes pass around a directory lock.
However, much of the documentation floating around on the internet is
outdated and seems to refer to older versions of gfs.

If your write has something to do with the directory (say "create"), then
a directory lock is required. Otherwise, the lock obtained is only
associated with the file itself.
OK, thanks.
We write lots of files in the same directory, so such locking would have been quite a disaster.

I haven't found any docs describing the locking process with the latest
gfs code and the dlm.

I'm currently seeing a significant drop in throughput between an xfs
filesystem on the shared storage mounted on a single host and a gfs
filesystem on the shared storage mounted on a single host.

I'm getting roughly 75Mb/s throughput on the "normal" fs and 27Mb/s on a
gfs fs.

GFS in general doesn't perform well under bonnie++ due to the extensive
usage of "stat()" system call. This is because bonnie++ doesn't know the
exact file size during the runs so it has to do a "stat()" to retrieve
the size to decide how to allocate its read/write buffer before *each*
read/write. The "stat()" system call happens to be very expensive in GFS.

So check out your IO calls - do you really need to do lots of "stat()"
system calls? If not, switch to another benchmark (such as IOZONE)
and you may find the numbers differ greatly.
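
To make the access pattern concrete, here is a rough sketch of a reader that calls stat() before every read, next to one that does not. This is not bonnie++'s actual code, only an illustration of the pattern described above; the function name and the chunk size are arbitrary:

```python
import os


def read_in_chunks(path, chunk_size=64 * 1024, stat_each_read=False):
    """Read a whole file in chunks and return (bytes_read, stat_calls).

    With stat_each_read=True the file is stat()ed before every read,
    including the final read that hits end-of-file.
    """
    stat_calls = 0
    total = 0
    with open(path, "rb") as f:
        while True:
            if stat_each_read:
                os.stat(path)  # extra metadata round trip on every iteration
                stat_calls += 1
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, stat_calls
```

Timing both variants against the same file on gfs and on the local fs should show how much of the gap comes from stat() rather than from the data path itself.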
OK, I reran the tests with iozone and it shows a difference, but not much:
roughly 75Mb/s throughput with a "local" fs and 30Mb/s throughput on gfs.

I still need to do the concurrent write test (writing over gfs from multiple hosts in the cluster).
I'm also running tests with different schedulers.

Grtz Ramon

To be stupid and selfish and to have good health are the three requirements for happiness, though if stupidity is lacking, the others are useless.

Gustave Flaubert
