
Re: [Linux-cluster] RAIDing a CLVM?



Patton, Matthew F, CTR, OSD-PA&E wrote:
I can't think of a way to combine (C)LVM, GFS, GNBD, and MD (software RAID) and make it work unless just one of the nodes becomes the MD master and then just exports it via NFS. Can it be done? Do commercial options exist to pull off this trick?

Hi,

We're working on the same problem. We have tried two approaches, both with their own fairly serious drawbacks.

Our goal was a 2-node all-in-one HA mega server, providing all office services from one cluster, and with no single point of failure.

The first uses a RAID master for each pair. Each member of the pair exports a disk using GNBD. The pair negotiate a master using CMAN, and that master assembles a RAID device from one GNBD import plus one local disk. It then exports the assembled device using NFS or, where GFS is used, via a third GNBD export.

Our trick here was that each node exported its contributory disk over GNBD by default, so long as at least one other node was active (quorum > 1), knowing that only one master would ever be active. This significantly reduced complexity.
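To make the rule above concrete, here is a minimal sketch of the per-node decision in Python. It only models the logic; it does not call CMAN, GNBD or MD. The names NodePlan and plan_for_node are hypothetical, and the elected master is assumed to be handed in from whatever the cluster negotiation produced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodePlan:
    export_local_disk: bool  # export this node's contributory disk over GNBD
    assemble_raid: bool      # build the mirror from one import plus the local disk
    export_assembled: bool   # re-export the assembled device (NFS or GNBD)


def plan_for_node(node: str, master: Optional[str], active_nodes: int) -> NodePlan:
    """Decide what one node of the pair should do (illustrative only)."""
    # Every node exports its disk by default while at least one other
    # node is active (quorum > 1), whether or not it is the master.
    export_local = active_nodes > 1
    # Only the negotiated master assembles and re-exports the RAID device.
    is_master = master is not None and node == master
    return NodePlan(
        export_local_disk=export_local,
        assemble_raid=is_master,
        export_assembled=is_master,
    )


if __name__ == "__main__":
    for name in ("node-a", "node-b"):
        print(name, plan_for_node(name, master="node-a", active_nodes=2))
```

The point of the default-export rule is that the export decision needs no knowledge of who the master is, which is where the reduction in complexity comes from.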

Problems are:
- GNBD instabilities cause frequent locks and crashes, especially busying DLM (suspected).
- The NFS export scheme also causes locks and hangs for NFS clients on failover *IF* a member of the pair subsequently imports the export as an NFS client, as needed in some of our mega-server ideas.
- NFS export is not very useful when file locking is important, e.g. subversion, procmail etc. (yes, procmail, if your mail server is also your Samba homes server). You have to tell procmail to use alternative mailbox locking, or mailboxes get corrupted.
- GFS on the assembled device with the GNBD export scheme works best, but still causes locks and hangs. Note also that an exporting client must NOT import its own exported GNBD volume, so there is no symmetry between the pair, and it's quite difficult to manage.



Our second approach is something we've just embarked on, and so far is proving more successful, using DRBD. DRBD is used to create a mirrored pair of volumes, a bit like GNBD+MD as above.

The result is a block device accessible from both machines, but the problem is that only one member of the pair is writable (master), and the other is a read-only mount.

If the master server dies, the remaining DRBD becomes the master, and becomes writable. When the dead node recovers, the recovered node becomes a slave, read-only.
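The failover behaviour described above can be modelled as a tiny state machine. This is only a sketch of the role changes as we understand them, not DRBD code; the MirroredPair class and its method names are made up for illustration.

```python
class MirroredPair:
    """Toy model of the master/slave roles of a mirrored pair."""

    def __init__(self, first: str, second: str) -> None:
        # Assumption: the first node starts as master, the second as slave.
        self.roles = {first: "master", second: "slave"}

    def fail(self, node: str) -> None:
        """Mark a node as dead and promote the survivor if needed."""
        self.roles[node] = "down"
        if "master" not in self.roles.values():
            for name, role in self.roles.items():
                if role == "slave":
                    self.roles[name] = "master"
                    break

    def recover(self, node: str) -> None:
        """A recovered node rejoins as a read-only slave if a master exists."""
        if self.roles[node] == "down":
            has_master = "master" in self.roles.values()
            self.roles[node] = "slave" if has_master else "master"

    def writable(self) -> list:
        """Only the master may be mounted read-write."""
        return [name for name, role in self.roles.items() if role == "master"]


if __name__ == "__main__":
    pair = MirroredPair("node-a", "node-b")
    pair.fail("node-a")
    pair.recover("node-a")
    print(pair.roles, pair.writable())
```

Note that the recovered node does not take the master role back, so which machine is writable depends on the failure history.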

The problem is with the read-only aspect, so you still need to have an exporting mechanism for the assembled DRBD volume running on the DRBD master. We plan to do this via GNBD export (GFS FS installed).

That's where the complexity comes in - as the DRBD negotiation appears to be totally independent of the cluster control suite, and so we're having to use customizations to start the exporting daemon on the DRBD master.
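The kind of customization we mean boils down to a small reconcile step that runs whenever the DRBD role may have changed. A minimal sketch, assuming something else supplies the current role and whether the export daemon is running; export_action is a hypothetical helper, and actually starting or stopping the daemon is left out.

```python
def export_action(role: str, exporting: bool) -> str:
    """Return what to do with the export daemon: 'start', 'stop' or 'none'."""
    if role == "master" and not exporting:
        # We hold the writable volume but nothing is exporting it yet.
        return "start"
    if role != "master" and exporting:
        # We lost (or never had) the master role: stop exporting.
        return "stop"
    return "none"


if __name__ == "__main__":
    for role, exporting in (("master", False), ("slave", True), ("master", True)):
        print(role, exporting, "->", export_action(role, exporting))
```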


Conclusions
---

From all we've learned to date, it still seems a dedicated file server or SAN approach is necessary to maintain availability.

Either of the above schemes would work fairly well if we were just building a HA storage component, because most of the complexities we've encountered come about when the shared storage device is used by services on the same cluster nodes.

Most, if not all, of what we've done so far is not suitable for a production environment, as it just increases the coupling between nodes, and therefore increases the chance of a cascade failure of the cluster. In all seriousness I believe a single machine with a RAID-1 pair has a higher MTBF than any of our experiments.

Many parts of the CCS/GFS suite so far released have serious issues when used in non-standard configurations. For example, exception handling we've encountered usually defaults to "while (1) { retry(); sleep(1); }"
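For contrast with the endless retry loop quoted above, here is a sketch of a bounded retry that gives up and reports the failure, so a caller (or the cluster manager) can react instead of hanging. This is not taken from the CCS/GFS code; retry_bounded and GaveUp are illustrative names, and the attempt count and delay are arbitrary.

```python
import time


class GaveUp(Exception):
    """Raised when an operation still fails after the last attempt."""


def retry_bounded(operation, attempts=5, delay=1.0, sleep=time.sleep):
    """Call operation() up to `attempts` times, pausing between failures."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            # Sleep only if another attempt will follow.
            if attempt < attempts - 1:
                sleep(delay)
    # Surface the failure rather than retrying forever.
    raise GaveUp(f"failed after {attempts} attempts") from last_error
```

The sleep parameter is there only so the pause can be replaced in tests.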

I read last year about plans for GFS mirroring from RedHat, and haven't found much else since. If anyone knows more I'd love to hear.

It also appears that the guys behind DRBD want to further develop their mirroring so that both volumes can be writable, in which case you can just stick GFS on the assembled device, and run whichever exporting method you like as a normal cluster service.



James

www.daltonfirth.co.uk











