[Linux-cluster] Problems building a storage server using h/w RAID and GNBD

Cliff Hones cliff-lc at cliff.hones.org.uk
Mon Nov 19 16:31:58 UTC 2007


I have been experimenting with clustering prior to setting up a
datacentre system providing high-availability storage with automatic
failover in the event of an individual disk failure or a complete
disk server failure.  I am new to Linux clustering, so I am not sure
whether the problems I am hitting are due to my own misuse or to bugs.

The intended final setup is two disk servers using h/w RAID6, each
exporting a single large block device using GNBD, with clients
accessing filesystems on the imported storage via CLVM/GFS.
To provide extra redundancy I am hoping to configure CLVM mirroring
to duplicate the logical volumes across the two servers.
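
In outline, the stack I have in mind on the client side looks
something like this (device and volume group names are just
illustrative):

  # import the block device exported by each disk server
  gnbd_import -i diskserver1
  gnbd_import -i diskserver2

  # put both imported devices into one clustered volume group
  pvcreate /dev/gnbd/store1 /dev/gnbd/store2
  vgcreate -c y vgstore /dev/gnbd/store1 /dev/gnbd/store2

Logical volumes would then be created mirrored across the two PVs and
formatted with GFS (see below for the problems with that).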

Systems are all CentOS-5.

The first problem was trying to use Conga.  It seems that ricci doesn't
work on CentOS-5.  I tried the workaround (http://bugs.centos.org/view.php?id=1931)
of replacing the CentOS version of /etc/redhat-release with the RHEL5
version, which did let me set up a two-node cluster, but it failed
when I tried to configure storage.  So I am using a combination of manual
configuration, system-config-cluster and system-config-lvm.
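
For reference, the workaround I applied was along these lines (keeping
a copy of the original file so it can be restored later):

  cp -p /etc/redhat-release /etc/redhat-release.centos
  echo "Red Hat Enterprise Linux Server release 5 (Tikanga)" \
      > /etc/redhat-release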

The second problem was the lack of startup scripts for GNBD.  I have
rolled my own, using an /etc/gnbd.conf file to specify the exports
and imports.  This seems to work fine, but it leaves me worried that
GNBD may not be popular enough to be fully supported.
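
Stripped of the config-file parsing, my script essentially just runs
commands like the following, driven by the entries in /etc/gnbd.conf
(device and export names are illustrative):

  # on a disk server: start the server daemon and export the array
  gnbd_serv
  gnbd_export -d /dev/sdb -e store1

  # on a client: load the module and import from each server
  modprobe gnbd
  gnbd_import -i diskserver1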

The third problem is struggling with correct startup of a two-node
cluster.  For initial testing I have one disk server and one client
in the cluster.  I accept that the quorum arrangements are awkward
for a two-node cluster, but I was concerned to find that rebooting
one node while the other stayed up would not work reliably: shutdown
often hung permanently (I think it was clurgmgrd hanging, which is
odd as I have no cluster resources/services configured), and startup
frequently hung for five minutes in CLVM on the disk server (where
there are actually no logical volumes).  I solved this by removing the
two-node option in the cluster config and giving the disk server a
higher vote count.  This means the client can never be quorate on its
own, but that doesn't matter, as its only use of clustering is to
import the shared disks.  If this is a sensible solution, I think it
would be worth documenting somewhere.
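
For the record, the relevant fragment of my cluster.conf now looks
roughly like this (node names illustrative, and no two_node setting
on the cman element).  With votes of 2 and 1 the quorum is 2, so the
disk server is quorate on its own but the client is not:

  <clusternodes>
    <clusternode name="diskserver1" nodeid="1" votes="2">
      ...
    </clusternode>
    <clusternode name="client1" nodeid="2" votes="1">
      ...
    </clusternode>
  </clusternodes>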

The fourth problem (and my main concern) is setting up mirroring.  I am
wondering whether this is actually possible in a clustered environment.
The idea is that all filestore partitions will be mirrored across the
two file servers, so that if one of the servers fails completely, LVM
will seamlessly switch to using the unmirrored partitions on the
remaining server.  It seems, however, that LVM mirroring either needs
three physical devices (the third for keeping the mirror log) or has
to run with a core log.  If I use the former, I have to find another
block device to export, which is another point of failure; if I use
core logging, I don't see how the mirror log can be maintained
cluster-wide.
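
As I understand it, the two alternatives correspond roughly to the
following lvcreate invocations (sizes and names illustrative):

  # disk log: needs a third PV to hold the mirror log
  lvcreate -m 1 -L 100G -n lvstore vgstore

  # core log: log kept in memory, resynced on every activation
  lvcreate -m 1 -L 100G -n lvstore --corelog vgstore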

It does not seem possible to create a core-log mirror using
system-config-lvm.  I have tried making a disk-log mirror, both with
system-config-lvm and manually with lvcreate, with no luck.  With
lvcreate I get locking errors because the LV UUID is not recognised -
the UUID reported appears to be the concatenation of two UUIDs.
Rebooting the client seems to clear this, but then I find that
system-config-lvm crashes on startup, and if I try to manually make a
GFS filesystem on the mirror it always reports that the device is too
small for the journals.  When I try to make a mirror using
system-config-lvm it fails, leaving just the disk-log LV created.
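
For completeness, the mkfs command I am trying on the mirror is along
the lines of (cluster, filesystem and volume names illustrative):

  gfs_mkfs -p lock_dlm -t mycluster:store1 -j 2 /dev/vgstore/lvstore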

I'd appreciate any help here - is what I'm trying possible, or is there
a better way to achieve failover in the event of a complete disk server
failure?  Also, are any of my problems (excluding ricci) currently known
bugs, and is it worth trying a build from cvs/svn, or should I wait for
CentOS-5 updates?

I have deliberately omitted the gory details of the various problems, but I
am happy to provide more detail on request.

-- Cliff



