
Re: [Linux-cluster] GFS or Web Server Performance issues?



On Wed, 28 Nov 2007, isplist logicore net wrote:

Come on folks, you're making me feel like I should give up or something  :)

From Gordan;

I think a part of the problem is perception.

Perception can only be what marketing says it will do.

And marketing rarely (if ever) reflects what things will really do.

I can't say I have once
seen anything that says it won't scale performance wise by virtue of what it
is, a cluster. I looked at SSI and other types of clusters, this seemed to be
the key for my LAMP based services.

It depends on where the bottleneck in your system is. It stands to reason that if the physical performance of the disk (e.g. a SAN appliance) is fixed, then piling more machines onto it, while also having to keep locks in sync, cannot possibly result in performance magically increasing. Anybody who tells you otherwise is either trying to sell you something, or doesn't understand what they are talking about.

leads to _LOWER_ performance on I/O bound processes. If it's CPU bound,

Sure, there is a performance cost from each node but I would guess it's an
acceptable cost so long as I can work out the I/O side of things. I'm guessing
a lot of folks have come up with all sorts of good ways of handling this
otherwise, no one would be using these tools.

It improves _some_ types of performance, not all. It also depends on how higher levels handle things. If you have partitioned data so that each node handles a subset of it, then you will get improved performance. If all the data is in one place, then the chances are that clustering will add overhead, and thus cause a slowdown if the system is I/O bound in the first place.
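
To illustrate what "partitioned" means here, a minimal sketch (an assumption for illustration, not something from this setup): each key, such as a user or site name, is hashed to exactly one node, so no two nodes ever contend for the same data. The function name node_for_key and the node names are hypothetical.

```python
import hashlib

def node_for_key(key, nodes):
    """Map a key to exactly one node so each node owns a disjoint subset.

    `nodes` is a non-empty list of node names (hypothetical examples below).
    """
    # Use a stable hash (not Python's per-process randomized hash()) so that
    # every front end routes the same key to the same node.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

if __name__ == "__main__":
    cluster = ["node1", "node2", "node3"]  # hypothetical node names
    for site in ["example.org", "example.net", "example.com"]:
        print(site, "->", node_for_key(site, cluster))
```

Note that plain modulo hashing reshuffles most keys when a node is added or removed; it is only meant to show the idea of each node handling its own subset.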

then sure, it'll help. But on I/O it'll likely do you harm. It's more
about redundancy and graceful degradation than performance. There's no way
of getting away from the fact that a cluster has to do more work than a
single node, just because it has to keep itself in sync.

When I started learning about the RH cluster suite and GFS, it was because the
hype was that I could build a highly scalable, highly available environment
where I could share data in a way I had not been able to before.

It is scalable right up to the point where you are I/O (and lock) bound. If you are serving only static web pages, then your performance will likely degrade. If you are using lots of CPU-intensive CGI processes, then the performance will most likely increase.
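
A toy model of that argument, as a rough sketch only: it assumes one shared disk whose per-request cost is fixed, CPU time that spreads evenly across nodes, and a made-up lock-sync cost that grows with each extra node. The function name and all the numbers are hypothetical, not measurements of GFS.

```python
def cluster_throughput(nodes, cpu_ms, io_ms, lock_ms_per_peer=0.0):
    """Requests per second for a toy cluster sharing a single disk.

    cpu_ms            CPU time per request (scales with the number of nodes)
    io_ms             shared-disk time per request (does not scale)
    lock_ms_per_peer  assumed extra shared-storage time per request for
                      every additional node that must be kept in sync
    """
    cpu_limit = nodes * 1000.0 / cpu_ms
    io_cost = io_ms + lock_ms_per_peer * (nodes - 1)
    io_limit = 1000.0 / io_cost
    # The slower of the two resources is the bottleneck.
    return min(cpu_limit, io_limit)

if __name__ == "__main__":
    for n in (1, 2, 4):
        static = cluster_throughput(n, cpu_ms=1, io_ms=5, lock_ms_per_peer=0.5)
        cgi = cluster_throughput(n, cpu_ms=50, io_ms=2, lock_ms_per_peer=0.5)
        print(n, "nodes: static", round(static, 1), "req/s, CGI", round(cgi, 1), "req/s")
```

With these made-up numbers the static (I/O bound) workload gets slower as nodes are added, while the CGI (CPU bound) workload gets faster until it eventually hits the same shared-disk limit.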

For example, you (surely?) wouldn't expect to dump a MySQL DB on a GFS file system on a cluster, get external locking going, and expect the read/write performance to increase, would you??

SSI and GFS are technologies that have their place, but they are not the right tool for every job. For example, I have an SSI root file system, but I have /var/lib/mysql mounted off local disks, with round-robin replication set up on the nodes, so each is a master and a slave. I wouldn't dream of expecting similar performance if it was running off GFS with external locking.

The only way clustering will give you a scalable performance benefit is
with partitioned (as opposed to shared) data. Shared data clustering is
about convenience and redundancy, not about performance.

I agree but this is a very general statement. In my case, I have a LAMP
application which benefits more from having shared GFS space. I might move to
purely distributed at some point but for now, I'd prefer to find out what I
can do with what I've built so far.

For static data, you could set up an unshared cache space on local storage, and set up squid in accelerator mode (basically an outbound rather than an inbound cache). This will mean that most of your access hits for static data will never hit Apache or the GFS storage.
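
The principle behind the local cache can be sketched in a few lines. This is not Squid and not its configuration, just a hypothetical read-through cache showing why repeat hits stop touching the shared storage; the function name and directory arguments are assumptions.

```python
import os
import shutil

def read_static(rel_path, shared_root, cache_root):
    """Return file contents, copying from shared storage only on a cache miss.

    shared_root stands in for the GFS mount and cache_root for unshared local
    disk; both are hypothetical paths supplied by the caller.
    No expiry or invalidation is done, and rel_path is trusted as-is.
    """
    cached = os.path.join(cache_root, rel_path)
    if not os.path.exists(cached):
        # Miss: one read from shared storage, then keep a local copy.
        os.makedirs(os.path.dirname(cached), exist_ok=True)
        shutil.copyfile(os.path.join(shared_root, rel_path), cached)
    # Hit (or freshly filled): served from local disk only.
    with open(cached, "rb") as f:
        return f.read()
```

After the first request for a file, later requests are served without any access to the shared file system, which is the effect the accelerator gives you for static content.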

Gordan

