
Re: [Linux-cluster] GFS limits?




Ken Preslan wrote:





Our current allocation methods try to allocate from areas of the disk
where there isn't much contention for the allocation bitmap locks.  They
don't know anything about spreading load on the basis of disk load.
(That would be an interesting thing to add, but we don't have any plans
to do so in the short term.)



My use case isn't very standard. Rather than needing tons of read/write random access all over the disk, we're almost completely linear write-once-per-file, read-many operations.


We do photo sharing and storage. So lots and lots of photos get uploaded, and they're serially stored on disk. Once they're on disk, though, they're rarely modified. Just read.

It's foreseeable, though, that at some point we won't be able to push these linear writes to disk fast enough as people upload photos. Either the interface (GigE, iSCSI, Fibre Channel) isn't fast enough, or whatever. It's way out in the future, but it'll come faster than I like to think about.

In that case, we need a nice way to spread those writes across multiple disks/servers/whatever. GigE bonding might solve it temporarily, but that can only last so far.

Ideally, I want to scale horizontally (tons of cheap Linux boxes attached to big disks) and have the writes "passed out" among those boxes. If I have to write my own stuff to do that, fine. But if GFS can potentially provide something along those lines down the road, great.
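(Absent filesystem support, the "pass out the writes" layer could be approximated in the application: hash each upload's ID to pick one of N storage boxes, so every front end agrees on placement without any coordination. A minimal sketch; the node names below are hypothetical placeholders, not anything GFS provides:)

```python
import hashlib

# Hypothetical storage nodes; names are placeholders, not real hosts.
NODES = ["store01", "store02", "store03", "store04"]

def node_for(photo_id: str, nodes=NODES) -> str:
    """Pick a storage node by hashing the photo ID.

    A stable hash spreads write-once uploads evenly across boxes,
    and reads recompute the same hash, so no central index is needed.
    """
    digest = hashlib.md5(photo_id.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]
```

(The catch, of course, is that a simple modulo reshuffles almost everything when a node is added, which is why schemes like consistent hashing exist; this is only to illustrate the idea.)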


In the event of some multiple-catastrophe failure (where some data isn't online at all, let alone redundant), how graceful is GFS? Does it "rope off" the data that's not available and still allow full access to the data that is? Or does the whole cluster go down?


Right now, a malfunctioning or non-present disk can cause the whole
cluster to go down.  That's assuming the error isn't masked by hardware
RAID or CLVM mirroring (when we get there).

One of the next projects on my plate is fixing the filesystem so that a
node will gracefully withdraw itself from the cluster when it sees a
malfunctioning storage device.  Each node will stay up and could
potentially be able to continue accessing other GFS filesystems on
other storage devices.

I/We haven't thought much about trying to get GFS to continue to function
when only part of a filesystem is present.


When I'm talking about petabytes, this weighs on my mind heavily. I can't have a power outage that takes out a couple of nodes (nodes which might hold both copies of the "redundant" data for, say, 10TB) bring down a 20PB cluster.


I realize 20PB sounds fairly ridiculous at the moment, but I can see it coming. And it's a management nightmare when it's spread across small 1TB block devices all over the place instead of an aggregate volume. I'm sure it's a software nightmare to implement the aggregate volume, but that's not my problem. :)



I notice the pricing for GFS is $2200. Is that per seat? And if so, what's a "seat"? Each client? Each server with storage participating in the cluster? Both? Some other distinction?


I'm not a marketing/sales person, just a code monkey, so take this with
a grain of salt:  It's per node running the filesystem.  I don't think
machines running GULM lock servers or GNBD block servers count as machines
that need to be paid for.


Looks like I have more reading to do, since apparently I don't totally get what a GNBD block server is. Or a GULM lock server, for that matter.




Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be able to participate as well?


According to the web page, you should be able to add a GFS entitlement to
all RHEL product lines (WS, ES, and AS).

http://www.redhat.com/apps/commerce/rha/gfs/


Thanks!


Don

begin:vcard
fn:Don MacAskill
n:MacAskill;Don
org:smugmug.com
adr:;;3347 Shady Spring Lane;Mountain View;CA;94043;USA
email;internet:don smugmug com
title:CEO
tel;fax:(650) 641-3125
x-mozilla-html:FALSE
url:http://www.smugmug.com/
version:2.1
end:vcard

