[Linux-cluster] clvm mirroring target status

Benjamin Marzinski bmarzins at redhat.com
Mon Feb 7 21:50:17 UTC 2005


On Sat, Feb 05, 2005 at 10:55:13AM +0100, Filip Sergeys wrote:
> I'll try to explain it in a more structured way.
> 
> Host A
> --------
> Disk A 
> Disk Bm (mirrored disk of disk B in host B, unmounted)
> 
> Host B
> --------
> Disk B
> Disk Am (mirrored disk of disk A in host A, unmounted)
> 
> 
> Normal working situation:
> ---------------------------------
> Disk A and Disk B are exported with GNBD. If I understood well, I can combine 
> them into one logical disk for the cluster nodes with clvm (striped maybe, 
> don't know, need to read more about it).
> Disk Am and Bm are basically only used as mirrors of A and B. This is done 
> with drbd, so they do not take part in read/write activity in any way.
> 
> Host B goes down:
> ------------------------
> Heartbeat says it is down, so I cut the power.
> This is what I think needs to be done:
> -Heartbeat moves the virtual IP address of host B to Host A. This is the IP 
> address by which disk B was exported
> -Mount disk Bm read/write. 
> -Export Bm with GNBD. The cluster nodes should now be able to continue 
> working, I think transparently (need to test that to know).

O.k. The way you plan to use drbd makes sense.  The only issue is this: GFS
doesn't use Heartbeat; the cluster manager does its own heartbeating.  If you
have two different heartbeating mechanisms controlling failover, things won't
fail over all at once.  Ideally, for all the stuff below the filesystem layer,
including gnbd, you wouldn't use Heartbeat at all, but simply rely on the
cluster manager.  To do this, you would have to make drbd switch over when the
cluster manager detected a node failure.  This could be done by hacking a
fencing agent, as I mentioned in my previous e-mail.  If you must use Heartbeat
for the block device failover, you need to recognize that it could happen
before, during, or after the GFS failover, which may (and probably will) cause
problems occasionally.
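
To give you an idea of what I mean by hacking a fencing agent, something along
these lines might work.  This is untested and off the top of my head: the stock
agent (fence_apc here), the drbd resource name "r0", and the gnbd export name
"diskB" are just examples, and you should check drbdadm(8) and gnbd_export(8)
for the exact syntax your versions want.

  #!/bin/sh
  # Hypothetical wrapper fence agent (untested sketch).  fenced hands the
  # agent its options on stdin, so read them once and pass them through
  # to the real agent unchanged.
  OPTS=$(cat)

  # 1) power-fence the failed gnbd server with the stock agent
  #    (substitute whatever agent you actually use for fence_apc)
  echo "$OPTS" | fence_apc || exit 1

  # 2) promote the local drbd mirror so it becomes writable
  #    ("r0" is an example resource name; see drbdadm(8))
  drbdadm primary r0 || exit 1

  # 3) re-export the now-writable mirror over gnbd under the old name
  #    ("diskB" is an example export name; see gnbd_export(8))
  gnbd_export -d /dev/drbd0 -e diskB

  exit 0

The point is simply that the drbd takeover is driven by the same event (the
fence) that drives GFS recovery, so everything below the filesystem fails over
together instead of on two different heartbeats.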

Unfortunately, I'm not sure that your multipathing setup will work.  I am
assuming that you are using pool for the multipathing.  pool multipathing
has two modes, round-robin and failover.  Obviously round-robin (where pool
uses all the paths) won't work, because you only have one path available at a
time.  However, failover mode probably won't work either in the setup you
described.  You would need to force pool to use disk A from host A and disk
B from host B.  Getting that to work right is probably possible, but not
easy or reliable.  The easiest way to do it is to make host A hold both
disk A and disk B, and make host B hold disk Am and disk Bm.  To do this, GNBD
import the disks from host A, assemble the pool, GNBD import the disks from
host B, and use pool_mp to integrate them into the pool, as sketched below.
This should automatically set you up in failover mode, with disks A and B as
the primary disks and disks Am and Bm as the backups.  I realize this means
that host B is usually sitting idle.

If you name your devices correctly, or import them in a specific order, you
might be able to get pool to use the correct devices in the setup you
described, but I'm not certain.

What your design actually wants is for pool to not do multipathing at all, but
to simply retry on failed IO.  That way, when the virtual IP switches, gnbd
will just automatically pick up the device at its new location. Unfortunately,
pool and gnbd cannot do this.

-Ben
 
> Consequences:
> -------------------
> Bringing host B back into the game needs manual intervention:
> -Basically all services on the cluster nodes need to stop writing.
> -Sync the disk from Bm to B.
> -Give host B back its virtual IP address.
> -Mount B read/write.
> -Unmount Bm on host A.
> -Start all services again on the nodes.
> => I know this is not perfect, but we can live with that. This will need to 
> happen after office hours. The thing is that we don't have the budget for 
> shared storage, and certainly not for a redundant shared storage solution, 
> because most entry-level shared storage systems are SPOFs. 
> 
> I need to find out more about that multipathing. I am not sure how to use it 
> in this configuration. 
> If you have ideas for improvement, they are welcome. 
> 
> Regards,
> 
> Filip
> 
> PS. Thanks for your answer on the clvm mirroring state.
> 
> On Friday 04 February 2005 21:00, Benjamin Marzinski wrote:
> > On Fri, Feb 04, 2005 at 05:52:31PM +0100, Filip Sergeys wrote:
> > > Hi,
> > >
> > > We are going to install a linux cluster with 2 gnbd servers (no SPOF)
> > > and gfs + clvm on the cluster nodes (4 nodes). I have two options, if I
> > > read the docs well, for duplicating data on the gnbd servers:
> > > 1) using clvm target mirroring on the cluster nodes
> > > 2) use drbd underneath to mirror discs. Basically two disks per machine:
> > > 1 live disk which is mirrored with drbd to the second disk in the second
> > > machine and the other way around in the second machine
> > > (so the second disk in the first machine is thus the mirror from the
> > > first (="live") disk in the second machine(sounds complicated, but it is
> > > just hard to write down)).
> > > Both live disks from each machine will be combined as one logical disk
> > > (If I understood well, this is possible).
> > >
> > > Question: what is the status of clvm mirroring? Is it stable?
> > > Suppose it is stable, so I have a choice: which one of the options would
> > > any of you choose? Reason? (Stability, performance, ...)
> >
> > I'm still not sure if cluster mirroring is available for testing (I don't
> > think that it is). It's definitely not considered stable.
> >
> > I'm also sort of unsure about your drbd solution.
> > As far as I know, drbd only allows write access on one node at a time. So,
> > if the first machine uses drbd to write to a local device and one on the
> > second machine, the second machine cannot write to that device. drbd is
> > only useful for active/passive setups.  If you are using pool multipathing
> > to multipath between the two gnbd servers, you could set it to failover
> > mode, and modify the fencing agent that you are using to fence the
> > gnbd_server, to make it tell drbd to fail over when you fence the server.
> >
> > I have never tried this, but it seems reasonable. One issue would be how to
> > bring the failed server back up, since the devices are going to be out of
> > sync.
> >
> > http://www.drbd.org/start.html says that drbd still only allows write
> > access to one node at a time.
> >
> > sorry :(
> >
> > -Ben
> >
> > > I found two hits on Google concerning clvm mirroring, but both say it is
> > > not finished yet. However, the most recent one is from June 2004.
> > > I cannot test either, because we have no spare machine. I'm going to buy
> > > two machines, so I need to know which disk configuration I will be using.
> > >
> > > Thanks in advance,
> > >
> > > Regards,
> > >
> > > Filip Sergeys
> > >
> > >
> > >
> > > http://64.233.183.104/search?q=cache:r1Icx--aI2YJ:www.spinics.net/lists/gfs/msg03439.html+clvm+mirroring+gfs&hl=nl&start=12
> > > https://www.redhat.com/archives/linux-cluster/2004-June/msg00028.html
> > >
> > > --
> > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> > > * System Engineer, Verzekeringen NV *
> > > * www.verzekeringen.be              *
> > > * Oostkaai 23 B-2170 Merksem        *
> > > * 03/6416673 - 0477/340942          *
> > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> > >