On Wed, 03 Mar 2010 11:16:07 -0800, Michael @ Professional Edge LLC wrote
Hail Linux Cluster gurus,
I have researched myself into a corner and am looking for advice. I've
never been a "clustered storage guy", so I apologize for the potentially
naive set of questions. ( I am savvy on most other aspects of networks,
hardware, OS's etc... but not storage systems).
I've been handed ( 2 ) x86-64 boxes w/2 local disks each; and ( 2 )
FC-AL disk shelves w/14 disks each; and told to make a mini NAS/SAN (NFS
required, GFS optional). If I can get this working reliably then there
appear to be about another ( 10 ) FC-AL shelves and a couple of Fiber
Switches lying around that will be handed to me.
NFS filesystems will be mounted by several (less than 6) linux machines,
and a few (less than 4) windows machines [[ microsoft nfs client ]] -
all more or less doing web server type activities (so lots of reads from
a shared filesystem - log files not on NFS so no issue with high IO
writes). I'm locked into NFS v3 for various reasons. Optionally the
linux machines can be clustered and GFS'd instead - but I would still
need to come up with a solution for the windows machines - so a NAS
solution is still required even if I do GFS to the linux boxes.
Active / Passive on the NFS is fine.
Why not start NFS/Samba on both machines with only the IP floating between
* Each of the ( 2 ) x86-64 machines has a QLogic dual HBA, with 1 fiber
direct connected to each shelf (no fiber switches yet - but will have
them later if I can make this all work); I've loaded RHEL 5.4 x86-64.
* Each of the ( 2 ) RHEL 5.4 boxes - used the 2 local disks w/onboard
fake raid1 = /dev/sda - basic install so /boot and LVM for the rest -
nothing special here (didn't do mdadm basically for simplicity of /dev/sda)
* Each of the ( 2 ) RHEL 5.4 boxes can see all the disks on both shelves
- and since I don't have Fiber Switches yet - at the moment there is
only 1 path to each disk; however as I assume I will figure out a method
to make this work - I have enabled multipath - and therefore I have
consistent names to 28 disks.
Here's my dilemma. How do I best add redundancy to the disks, removing
as many single points of failure and preserving as much diskspace as possible?
My initial thought was - to take "shelf1:disk1 and shelf2:disk1" and put
them into a software raid1 - mdadm; then put the resulting /dev/md0 into
a LVM. When I need more diskspace, I just then create "shelf1:disk2 and
shelf2:disk2" as another software raid1 then just add the new "/dev/md1"
into the LVM and expand the FS. This handles a couple things in my mind:
1. Each shelf is really a FC-AL so it's possible that a single disk
going nuts could flood the FC-AL and all the disks in that shelf go poof
until the controller can figure itself out and/or the bad disk is removed.
2. Efficient: I am retaining 50% storage capacity after redundancy - if I
can do the "shelf1:disk1 + shelf2:disk2" mirrors; plus all bandwidth
used is spread across the 2 HBA fibers and nothing goes over the TCP
network. Conversely DRBD doesn't excite me much - as I then have to do
both raid in the shelf (probably still with MDADM) and then I add TCP
(ethernet) based RAID1 between the nodes - and when all is said and done
- I only have 25% of storage capacity still available after redundancy.
3. Easy to add more diskspace - as each new mirror (software raid1)
can just be added to an existing LVM.
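
For illustration, the mirror-per-disk-pair scheme in points 1-3 could be sketched roughly as below. This is only a sketch under assumptions: it prints the commands rather than running them, and the multipath aliases (/dev/mapper/shelf1_diskN and /dev/mapper/shelf2_diskN), the volume group name (vg_nas) and the helper name mirror_commands are all hypothetical, not taken from the actual setup.

```python
def mirror_commands(index, vg="vg_nas"):
    """Return shell commands that build cross-shelf mirror number `index` (0-based).

    The device aliases and the volume group name are hypothetical placeholders.
    """
    n = index + 1
    shelf1 = f"/dev/mapper/shelf1_disk{n}"  # assumed multipath alias on shelf 1
    shelf2 = f"/dev/mapper/shelf2_disk{n}"  # assumed multipath alias on shelf 2
    md = f"/dev/md{index}"
    # The first mirror creates the volume group; later mirrors extend it.
    vg_step = f"vgcreate {vg} {md}" if index == 0 else f"vgextend {vg} {md}"
    return [
        f"mdadm --create {md} --level=1 --raid-devices=2 {shelf1} {shelf2}",
        f"pvcreate {md}",
        vg_step,
    ]


# Print the commands for the first two mirrors (md0 and md1).
for i in range(2):
    for cmd in mirror_commands(i):
        print(cmd)
```

Growing the logical volume and the filesystem after each vgextend (for example with lvextend and resize2fs on ext3) is left out of the sketch.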
You may create RAID1 (between the two shelves) over RAID6 (on the disks from
the same shelf), so you will lose only 2 more disks per shelf and keep about
40% of the storage space, but it is more stable and faster. Or several RAID6
arrays with 2+2 disks from each shelf - again 50% storage space, but better
performance with the same chance of data loss as with several RAID1 ... the
resulting mdX you may add to LVM and use the logical volumes
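
The capacity figures mentioned in this thread can be checked with a little arithmetic. The sketch below assumes 2 shelves of 14 equal-sized disks, and it reads "2+2 disks from each shelf" as 4-disk RAID6 groups built from 2 disks on each shelf; that reading and all function names are assumptions made for illustration.

```python
from fractions import Fraction

DISKS_PER_SHELF = 14
SHELVES = 2
TOTAL = DISKS_PER_SHELF * SHELVES  # 28 raw disks


def cross_shelf_raid1():
    # 14 two-disk mirrors, each keeping one disk's worth of data.
    return Fraction(DISKS_PER_SHELF, TOTAL)


def raid1_over_raid6():
    # One RAID6 per shelf gives up 2 disks to parity; the two arrays are then mirrored.
    return Fraction(DISKS_PER_SHELF - 2, TOTAL)


def raid6_two_plus_two():
    # Assumed reading: 4-disk RAID6 groups (2 disks per shelf), 2 usable disks per group.
    groups = DISKS_PER_SHELF // 2
    return Fraction(groups * 2, TOTAL)


def drbd_over_in_shelf_raid1():
    # RAID1 inside each shelf (7 usable disks), then mirrored again between the nodes.
    return Fraction(DISKS_PER_SHELF // 2, TOTAL)


layouts = [
    ("RAID1 pairs across shelves", cross_shelf_raid1()),
    ("RAID1 over per-shelf RAID6", raid1_over_raid6()),
    ("4-disk RAID6 groups (2+2)", raid6_two_plus_two()),
    ("DRBD over in-shelf RAID1", drbd_over_in_shelf_raid1()),
]
for name, usable in layouts:
    print(f"{name}: {usable} of raw capacity ({float(usable):.0%})")
```

Under these assumptions RAID1 over RAID6 works out to 3/7 of the raw capacity, a little over the "about 40%" quoted above, while the plain mirrors and the 2+2 RAID6 groups both keep one half and the DRBD variant keeps one quarter.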
From what I can find messing with Luci (Conga) though... is - I don't
see any resource scripts listed for - "mdadm" (on RHEL 5.4) - so would
my idea even work (I have found some posts asking for a mdadm resource
script but I've seen no response)? I also see with RHEL 5.3 LVM has
mirrors that can be clustered now - is this the right answer? I've done
a ton of reading but everything I've dug up so far assumes that the
fiber devices are being presented by a SAN that is doing the redundancy
before the RHEL box sees the disk... or... there are a ton of examples
of where fiber is not in the picture and there are a bunch of locally
attached hosts presenting storage onto the TCP (ethernet) - but I've not
found hardly anything on my situation...
So... here I am... :-) I really just have 2 nodes - who can both see -
a bunch of disks (JBOD) and I want to present them to multiple hosts via
NFS (required) or GFS (to linux boxes only).
If the Windows and Linux data are different volumes it is better to leave the
GFS partition(s) available only via iSCSI to the linux nodes participating in
the cluster and not to mount it/them locally for the NFS/Samba shares, but if
the data should be the same you may go even Active/Active with GFS over iSCSI
[over CLVM and/or] [over DRBD] over RAID and use NFS/Samba over GFS as a
service in the cluster. It all depends on how the data will be used from the
All ideas - are greatly appreciated!
Linux-cluster mailing list
Linux-cluster redhat com