Re: [Linux-cluster] GFS2 performance on large files

Gordan Bobic wrote:
On Thu, 23 Apr 2009 15:41:45 +0200, Christopher Smith

Don't get me wrong, I'm 100% behind using COTS stuff wherever possible, and setups with DRBD, et al, have worked very well for us in several locations. But there are some situations where it just doesn't (e.g. SAN LUNs shared between multiple servers - unless you want to forgo the performance benefits of write caching and DIY with multiple machines, DRBD and iscsi-target).

I'm not sure write-caching is that big a deal - your SAN will be caching
all the writes anyway. Granted, the cache will be about 0.05ms further
away than it would be on a local controller, but then again, the
clustering overheads will relegate that into the realm of irrelevance.
I have yet to see a shared-SAN file system that doesn't introduce
performance penalties big enough to make the ping time to SAN a drop
in the ocean.

I was actually thinking of the DIY-SAN scenario. E.g. you get a couple of 2U machines with a few TB of internal disk, mirror them with DRBD, then export the disk as an iSCSI target. Set up something (we used heartbeat) to fail over between the two and voila, you have your own redundant iSCSI SAN.

Unfortunately you then can't get the best benefit from the dirt cheap gigabytes of RAM you can stuff into those machines for write caching, since there's no way to synchronise the cache contents between the two - so if one machine dies, there's data loss.

The same applies if you want to DIY an SMB or NFS NAS - either no write caching, or a high risk of data corruption.
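
To make the trade-off concrete, here is a toy model of a two-node mirror where disk writes are replicated but the primary's RAM cache is not. It is only a sketch of the failure mode described above, not DRBD or iSCSI code: the class and function names (Node, MirroredTarget, lost_after_crash) are made up for illustration, and real write-back caches destage in the background rather than on an explicit flush.

```python
# Toy model only: every name here is hypothetical, nothing is a DRBD/iSCSI API.

class Node:
    def __init__(self):
        self.disk = {}       # blocks that have reached stable storage
        self.ram_cache = {}  # dirty blocks held only in this node's RAM


class MirroredTarget:
    """Two-node mirror: disk writes are replicated, RAM caches are not."""

    def __init__(self, write_back):
        self.primary = Node()
        self.secondary = Node()
        self.write_back = write_back

    def write(self, block, data):
        if self.write_back:
            # Acknowledge as soon as the data sits in the primary's RAM.
            self.primary.ram_cache[block] = data
        else:
            # Write-through: acknowledge only once both disks hold the data.
            self.primary.disk[block] = data
            self.secondary.disk[block] = data
        return True  # the acknowledgement the initiator sees

    def flush(self):
        # Destage dirty blocks to both disks, then empty the cache.
        for block, data in self.primary.ram_cache.items():
            self.primary.disk[block] = data
            self.secondary.disk[block] = data
        self.primary.ram_cache.clear()

    def fail_over(self):
        # The primary dies and its RAM is gone; the secondary takes over.
        self.primary, self.secondary = self.secondary, Node()


def lost_after_crash(write_back):
    """Return acknowledged blocks missing on the survivor after a crash."""
    target = MirroredTarget(write_back)
    acked = {}
    for block in range(4):
        data = f"data-{block}"
        if target.write(block, data):
            acked[block] = data
    target.fail_over()
    return sorted(b for b in acked if target.primary.disk.get(b) != acked[b])


print(lost_after_crash(write_back=True))   # [0, 1, 2, 3]
print(lost_after_crash(write_back=False))  # []
```

With write-back enabled, every write that was acknowledged but not yet destaged disappears with the failed node; with write-through nothing is lost, at the cost of waiting for both disks on every write.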

(Unless I'm missing something?)

Just being a devil's advocate. ;)

Me too, to a degree. We have a couple of SANs, primarily to keep higher-ups feeling warm and fuzzy, and I'm not convinced any of them have delivered anything close to proportionally better performance and reliability than something we could have built ourselves.

With that said, I wouldn't want to be the guy who (for example) DIYed an NFS NAS to run critical Oracle DBs on, when Oracle support comes back with "until you're running on supported storage, we won't help you with your Oracle problems".

