Re: [Linux-cluster] GFS (1 & partially 2) performance problems
- From: Michael Lackner <michael lackner mu-leoben at>
- To: linux clustering <linux-cluster redhat com>
- Subject: Re: [Linux-cluster] GFS (1 & partially 2) performance problems
- Date: Tue, 15 Jun 2010 14:04:09 +0200
I have now run R/W tests comparing a 4kB blocksize to a 1MB blocksize; the
difference in performance was negligible. Also, GFS2 was almost on the same
speed level as GFS1 for Reads (see below why..). The I/O scheduler
is "cfq", by the way. I never really cared about the I/O scheduler since
I do not yet understand the differences between the available ones anyway.
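For anyone wanting to double-check the scheduler: on 2.6 kernels the active one is shown in brackets in /sys/block/<dev>/queue/scheduler. A minimal sketch for reading it follows. The helper name active_scheduler and the device name sdb are made up for illustration and are not from my setup notes.

```shell
#!/bin/sh
# Print the active I/O scheduler from the contents of
# /sys/block/<dev>/queue/scheduler, read on stdin.
# The kernel marks the active one with brackets, e.g.
#   noop anticipatory deadline [cfq]
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage (sdb is a placeholder device name):
#   active_scheduler < /sys/block/sdb/queue/scheduler
# Switching the scheduler at runtime for a comparison run (as root):
#   echo deadline > /sys/block/sdb/queue/scheduler
```

Switching via sysfs only lasts until the next reboot, which should be convenient for a quick A/B comparison between schedulers.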
But I found out something else. As suggested by Steven in his reply, I ran the
dd tests both on the GFS1/2 filesystems and on the raw blockdevice, and
the results were almost the same!
So: GFS1 as well as GFS2 3-Node concurrent, sequential Reads showed a total
of 40MB/s (GFS1) and 45MB/s (GFS2) using a blocksize of 1MB. For single-node
sequential read the performance went up to a nice 180-190MB/s for both FS
Now, the surprising part: Doing a dd read on the raw blockdevice with 3 nodes
showed a total of only ~60MB/s!! Almost as low as reading from GFS1/2 with
multiple nodes at the same time!! When reading the raw blockdevice on a single
node, I got slightly over 190MB/s again.
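A raw-device read test of this kind could be sketched roughly as below. This is not the exact command line used for the numbers above: the per-node region layout, the use of iflag=direct to bypass the page cache, the helper name raw_read_cmd and the device name /dev/sdb are all assumptions for illustration. The helper only prints the dd command so it can be inspected before running it.

```shell
#!/bin/sh
# Print (not run) a dd command that reads one node's region of a shared
# block device. Assumptions, not taken from the original tests: each node
# gets its own contiguous region, and iflag=direct bypasses the page cache.
#   $1 = block device, $2 = node index (0, 1, 2, ...), $3 = region size in GiB
raw_read_cmd() {
    dev=$1; node=$2; region_gib=$3
    blocks=$((region_gib * 1024))          # number of 1 MiB blocks per region
    skip=$((node * blocks))                # start offset in 1 MiB blocks
    echo "dd if=$dev of=/dev/null bs=1M skip=$skip count=$blocks iflag=direct"
}

# Usage on the third node (index 2), with /dev/sdb as a placeholder device:
#   eval "$(raw_read_cmd /dev/sdb 2 160)"
```

Starting the command on all nodes at roughly the same time, each with its own node index, gives concurrent sequential reads from different disk locations.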
So, this concurrent read issue seems not to be a GFS1 or GFS2 problem, but
more a problem of the underlying storage. This is extremely surprising and a
bit shocking, I must say.
I guess for the Reads I will need to check the SAN itself, see if I can do some
optimization on it.. That thing can't possibly be that bad when it
comes to reading..
Thanks a lot for your ideas so far!
Jankowski, Chris wrote:
For comparison, could you do your dd(1) tests with a very large block size (1 MB) and tell us the results, please?
I have a vague hunch that the problem may have something to do with coalescing or not of IO operations.
Also, which IO scheduler are you using?
Thanks and regards,
From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Michael Lackner
Sent: Tuesday, 15 June 2010 00:22
To: linux clustering
Subject: Re: [Linux-cluster] GFS (1 & partially 2) performance problems
Thanks for your reply. I unfortunately forgot to mention, HOW I was actually testing, stupid.
I tested with dd, doing 4kB blocksize reads and writes, 160GB total testfile size per node.
I read from /dev/zero for writing tests and wrote to /dev/null for reading tests. So, totally sequential, somewhat small blocksize (equal to filesystem BS).
The performance was measured directly on the Fibrechannel Switch, which offers nice per-port monitoring for that purpose.
I have yet to do some serious read testing on GFS2. I have aborted my GFS2 tests as write performance was not up to GFS1 to begin with. My older GFS2 benchmarks (I did this with a 2-node configuration before) are lost, I will need to re-do them to give you some numbers.
After each write test I did a "sync" to flush everything to disks. I did not do this before or after read tests though..
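The dd procedure described above (sequential write from /dev/zero followed by sync, then sequential read to /dev/null) might look roughly like the sketch below. The helper name seq_test_cmds and the file path are hypothetical, and "160GB" is treated as 160 GiB here, which is an assumption. The helper only prints the commands.

```shell
#!/bin/sh
# Print (not run) the write and read commands for one node's test file.
#   $1 = test file path, $2 = file size in GiB, $3 = block size in KiB
seq_test_cmds() {
    file=$1; size_gib=$2; bs_kib=$3
    count=$((size_gib * 1024 * 1024 / bs_kib))   # blocks needed for the file
    echo "dd if=/dev/zero of=$file bs=${bs_kib}k count=$count && sync"
    echo "dd if=$file of=/dev/null bs=${bs_kib}k"
}

# Usage (placeholder path, 160 GiB file, 4 KiB blocks):
#   seq_test_cmds /mnt/gfs/node1/testfile 160 4
```

Passing 1024 as the block size gives the 1MB variant of the same test.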
As you mentioned Journal Size, "gfs_tool counters <mountpoint>" said, that only 2-3% logspace were in use after the tests (I guess this is the per-node fs journal?).
As for the direct I/O tests, by that you mean testing without ANY caching going on, a synchronous write? What I did before was test EXT3 (~190MB/s) and XFS
on the Storage Array. I think what I'm getting here is raw throughput, since I am not monitoring in the OS, but at the Fibrechannel Switch itself..
I will do GFS2 read tests similar to those conducted for GFS1. I'll be able to do that tomorrow morning, then I can post the numbers here.
Steven Whitehouse wrote:
On Mon, 2010-06-14 at 14:00 +0200, Michael Lackner wrote:
What tests are you running? GFS2 is generally faster than GFS1 except
for streaming writes, which is an area that we are putting some effort
into solving currently. Small writes (one fs block (4k default) or
less) on GFS2 are much faster than on GFS1.
I am currently building a Cluster sitting on CentOS 5 for GFS usage.
At the moment, the storage subsystem consists of an HP MSA2312
Fibrechannel SAN linked to an FC 8gbit switch. Three client machines
are connected to that switch over 8gbit FC. The disks themselves are
12 * 15.000rpm SAS configured in RAID-5 with two hotspares.
Now, the whole storage shall be shared (single filesystem), here GFS
The Cluster is only 3 nodes large at the moment, more nodes will be
added later on. I am currently testing GFS1 and GFS2 for performance.
Lock Management is done over single 1Gbit Ethernet Links (1 per
Thing is, with GFS1 I get far better performance than with the newer
GFS2 across the board, with a few tunable parameters set, for writes
GFS1 is roughly twice as fast.
But, concurrent reads are totally abysmal. The total write
performance (all nodes combined) sits around 280-330Mbyte/sec,
whereas the READ performance is as low as 30-40Mbyte/sec when doing
concurrent reads. Surprisingly, single-node read is somewhat ok at
180Mbyte/sec, but as soon as several nodes are reading from GFS
(version 1 at the moment) at the same time, things turn ugly.
Reads on GFS2 should be much faster than GFS1, so it sounds as if
something isn't working correctly for some reason. For cached data,
reads on GFS2 should be as fast as ext2/3 since the code path is
identical (to the page cache) and only changes if pages are not cached.
GFS1 does its locking at a higher level, so there will be more
overhead for cached reads in general.
Do make sure that, if you are preparing the test files for reading all
from one node (or even just a different node to that on which you are
running the read tests), you sync them to disk on that
node before starting the tests to avoid issues with caching.
This is strange, because for writes, global performance across the
cluster increases slightly when adding more nodes. But for reads, the
opposite seems to be true.
For read and write tests, separate testfiles were created and read
for each node, with each testfile sitting in its own subdirectory, so
no node would access another node's file.
That sounds like a good test set up to me.
You shouldn't normally need to set the glock_purge and demote_secs to
anything other than the default. These settings no longer exist in
GFS2 since it makes use of the shrinker subsystem provided by the VM
and is auto-tuning. If your workload is metadata heavy, you could try
boosting the journal size and/or the incore_log_blocks tunable.
GFS1 created with the following mkfs.gfs parameters:
"-b 4096 -J 128 -j 16 -r 2048 -p lock_dlm"
(4kB blocksize, 16 * 128MB journals, 2GB resource groups, Distributed
Mount Options set: "noatime,nodiratime,noquota"
Tunables set: "glock_purge 50, statfs_slots 128, statfs_fast 1,
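Applied by hand, the mount options and the tunables that are legible above could be set roughly as in the sketch below. The tunables list is cut off in this post, so only the three visible entries are included. The helper name gfs1_setup_cmds, the device path and the mount point are placeholders, not values from the actual cluster. The helper only prints the commands.

```shell
#!/bin/sh
# Print (not run) the mount and tuning commands for a GFS1 mount point.
# Only the three tunables that are legible in the post are included.
#   $1 = block device, $2 = mount point (both placeholders)
gfs1_setup_cmds() {
    dev=$1; mnt=$2
    echo "mount -t gfs -o noatime,nodiratime,noquota $dev $mnt"
    for kv in "glock_purge 50" "statfs_slots 128" "statfs_fast 1"; do
        echo "gfs_tool settune $mnt $kv"
    done
}

# Usage (placeholder device and mount point):
#   gfs1_setup_cmds /dev/vg/lv /mnt/gfs
```

Values set with gfs_tool settune do not persist across a remount, so they would need to be reapplied after each mount.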
Can you try doing some I/O direct to the block device so that we can
get an idea of what the raw device can manage? Using dd both read and
write, across the nodes (different disk locations on each node to
simulate different files).
Also, in /etc/cluster/cluster.conf, I added this:
<dlm plock_ownership="1" plock_rate_limit="0"/> <gfs_controld
Any ideas on how to figure out what's going wrong, and how to tune
GFS1 for better concurrent read performance, or tune GFS2 in general
to be competitive/better than GFS1?
I'm dreaming about 300MB/sec read, 300MB/sec write sequentially and
somewhat good reaction times while under heavy sequential and/or
random load. But for now, I just wanna get the seq reading to work
Thanks a lot for your help!
I'm wondering if the problem might be due to the seek pattern
generated by the multiple read locations,
Chair of Information Technology, University of Leoben
michael lackner mu-leoben at | +43 (0)3842/402-1505