
Re: [Linux-cluster] Why did Red Hat replace the quorum partition/lock LUN with new fencing mechanisms?

On 6/15/06, Kevin Anderson <kanderso redhat com> wrote:
On Thu, 2006-06-15 at 02:49 +0800, jOe wrote:
> Hello all,
> Sorry if this is a stupid question.
> I deploy both HP MC/SG Linux edition and RHCS for our customers. I
> just wondered why the latest RHCS replaced the quorum partition/lock
> LUN with the new fencing mechanisms (power switch, iLO/DRAC, SAN
> switch...)?

First off, I don't think it is completely fair to compare quorum
partitions to fencing.  They really serve different purposes.  A quorum
partition gives you the ability to maintain the cluster through flaky
network spikes; it keeps you from prematurely removing nodes from
the cluster.  Fencing is really there to protect the integrity of your
shared storage devices.  You want to make sure that a node is truly
gone before recovering its data.  Just because a node isn't updating
the quorum partition doesn't mean it isn't still scrogging your file
systems.  However, a combination of the two provides a pretty solid
cluster in small configurations.  And a quorum disk has another nice
feature that is useful.

That said, a little history before I get to the punch line.  Two
clustering technologies were merged together for the RHCS 4.x releases,
and the resulting software used the core cluster infrastructure that was
part of the GFS product for both RHCS and RHGFS.  GFS didn't offer a
quorum partition as an option, primarily for scalability reasons.  The
quorum disk works fine for a limited number of nodes, but the core
cluster infrastructure needed to scale to large node counts, and the
fencing mechanisms provide the ability to ensure data integrity in that
type of configuration.  So the quorum disk wasn't carried into the new
cluster infrastructure at that time.

The good news is that we realized the deficiency and have added quorum
disk support; it will be part of the RHCS 4.4 update release, which
should be hitting the RHN beta sites within a few days.  This doesn't
replace the need to have a solid fencing infrastructure in place: when a
node fails, you still need to ensure that it is gone and won't corrupt
the filesystem.  The quorum disk will still have scalability issues and
is really targeted at small clusters, i.e., fewer than 16 nodes,
primarily because it has multiple machines pounding on the same storage
device.  It also provides an additional feature: the ability to
represent a configurable number of votes.  If you set the quorum device
to carry the same number of votes as there are nodes in the cluster,
you can maintain cluster sanity down to a single active compute node.
We can get rid of our funky special two-node configuration option, and
you will then be able to grow a two-node cluster without having to
reset it.
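As a rough sketch of how that vote arithmetic might be expressed in /etc/cluster/cluster.conf once qdisk support ships (the cluster name, label, device heuristic, and all addresses below are illustrative assumptions, not values from this thread):

```xml
<!-- Illustrative fragment only. Four nodes at 1 vote each plus a
     quorum disk carrying 4 votes gives 8 expected votes; quorum is
     5, so one surviving node plus the qdisk (1 + 4 = 5) stays
     quorate. Label, heuristic, and addresses are placeholders. -->
<cluster name="example" config_version="1">
  <cman expected_votes="8"/>
  <quorumd interval="1" tko="10" votes="4" label="example-qdisk">
    <!-- A heuristic the qdisk daemon can use to decide which
         partition of a split cluster should survive. -->
    <heuristic program="ping -c 1 -w 1 192.168.0.1"
               score="1" interval="2"/>
  </quorumd>
</cluster>
```

With votes set up this way, a lone surviving node plus the quorum disk still reaches the quorum threshold, which is what makes the special two-node mode unnecessary.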

Sorry, I rambled a bit.


Linux-cluster mailing list
Linux-cluster redhat com

Thank you very much, Kevin. Your information is very useful to us, and I've shared it with our engineering team.
Two questions still remain:
Q1: In a two-node cluster configuration, how does RHCS (v4) handle a heartbeat failure? (Suppose the bonded heartbeat path still fails under some bad circumstances.)
When using a quorum disk/lock LUN, the quorum device acts as a tie-breaker and resolves the split-brain when the heartbeat fails. Does GFS currently do this, or does some other part of RHCS?

Q2: As you mentioned, quorum disk support is being added in the RHCS v4.4 update release. So in a two-node cluster configuration, is "quorum disk + bonded heartbeat + fencing (power switch or iLO/DRAC), no GFS" the recommended setup from Red Hat? Almost 80% of the cluster requests from our customers are for two-node clusters (10% are RAC and the rest are HPC clusters). We really want to provide our customers a simple and solid cluster configuration for their production environments. Most customers configure their HA clusters as active/passive, so GFS is not necessary for them, and they don't even want GFS present in their two-node cluster systems.
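For the fencing half of such a two-node setup, a cluster.conf fragment along these lines is one plausible shape. Every node name, iLO hostname, and credential here is a made-up placeholder, and the exact fence_ilo parameters should be verified against the agent shipped with your release:

```xml
<!-- Illustrative fragment only: one node and its iLO fence device.
     All names, hostnames, and credentials are placeholders. -->
<clusternodes>
  <clusternode name="node1" votes="1">
    <fence>
      <method name="1">
        <!-- Refers to the fencedevice defined below. -->
        <device name="node1-ilo"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice name="node1-ilo" agent="fence_ilo"
               hostname="node1-ilo.example.com"
               login="admin" passwd="secret"/>
</fencedevices>
```

The second node would get a matching clusternode and fencedevice entry; the point is that each node can be powered off or rebooted through its management processor before the cluster recovers its resources.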

I do think more and more customers will choose RHCS as their cluster solution, and we'll push it once we completely understand RHCS's technical benefits and advanced mechanisms.

Thanks a lot,

