
Re: [Linux-cluster] Clustering Tutorial

Thanks.  I should have mentioned that we're doing high-performance clustering, not HA.  We have a Beowulf cluster (old and decrepit) and an OSCAR cluster.  None of our current clusters run RH, but that will probably change once we get our next 4-Opteron CPU/box cluster...


And a Big Thanks to everyone who responded.  I now have some good resources.  A lot of reading... yaaaawn !  heh-heh.


On 10/20/05, Tim Spaulding <tspauld98 yahoo com> wrote:
Just a note of caution: there's a big difference between High Availability Clustering and High
Performance Clustering.  AFAIK, Beowulf is an HPC technology.  RHCS (Red Hat Cluster Suite) and
GFS (Global File System) are HAC technologies.  Some of the underlying building blocks are used by
both communities, but they are used for fundamentally different purposes.

http://www.linux-ha.org is the home of another Linux-based HAC technology.  They have more
documentation on clustering and its concepts.  Red Hat does a good job on the HOW-TOs of getting a
cluster working but a terrible job of telling folks the WHY-TOs of clustering.

I'm currently working on a comparison of linux-ha and RHCS, so if you have questions regarding HAC
on Linux then fire away.  If you have a Beowulf cluster, je ne comprends pas, sorry.


--- Michael Will <mwill penguincomputing com> wrote:

> http://www.phy.duke.edu/resources/computing/brahma/Resources/beowulf_book.php
> is a good start,
> http://www.beowulf.org is another good place; it is also the home of the
> original Beowulf mailing list.
> Generally I would recommend digging through recent mailing list postings
> because there are often very informed answers to questions.
> Lon just answered a fencing question a few days ago:
> "STONITH, STOMITH, etc. are indeed implementations of I/O fencing.
> Fencing is the act of forcefully preventing a node from being able to
> access resources after that node has been evicted from the cluster in an
> attempt to avoid corruption.
> The canonical example of when it is needed is the live-hang scenario, as
> you described:
> 1. node A hangs with I/Os pending to a shared file system
> 2. node B and node C decide that node A is dead and recover resources
> allocated on node A (including the shared file system)
> 3. node A resumes normal operation
> 4. node A completes I/Os to shared file system
> At this point, the shared file system is probably corrupt.  If you're
> lucky, fsck will fix it -- if you're not, you'll need to restore from
> backup.  I/O fencing (STONITH, or whatever we want to call it) prevents
> the last step (step 4) from happening.
> How fencing is done (power cycling via external switch, SCSI
> reservations, FC zoning, integrated methods like IPMI, iLO, manual
> intervention, etc.) is unimportant - so long as whatever method is used
> can guarantee that step 4 can not complete."
> "GFS can use fabric-level fencing - that is, you can tell the iSCSI
> server to cut a node off, or ask the Fibre Channel switch to disable a
> port.  This is in addition to "power-cycle" fencing."
> Michael
> --
> Linux-cluster mailing list
> Linux-cluster redhat com
> https://www.redhat.com/mailman/listinfo/linux-cluster
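
For anyone who finds the four-step live-hang scenario quoted above easier to follow as code, here is a minimal toy sketch in Python.  It is not how RHCS, GFS, or any real fencing agent works; SharedStorage, live_hang, and the node names are all hypothetical, made up purely for illustration.  The only point it tries to show is that once node A is fenced, its pending write (step 4) is refused by the storage.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Toy stand-in for a shared file system (hypothetical, illustration only)."""
    fenced: set = field(default_factory=set)  # nodes cut off from the storage
    log: list = field(default_factory=list)   # writes that actually reached disk

    def write(self, node: str, data: str) -> bool:
        # A fenced node's I/O is refused, so step 4 cannot complete.
        if node in self.fenced:
            return False
        self.log.append((node, data))
        return True


def live_hang(use_fencing: bool) -> list:
    """Replay the four steps and return the writes that reached the storage."""
    storage = SharedStorage()
    pending = ("A", "stale write")           # 1. node A hangs with I/O pending
    if use_fencing:
        storage.fenced.add("A")              # evicted node is fenced before recovery
    storage.write("B", "recovered state")    # 2. B and C recover A's resources
    node, data = pending                     # 3. node A resumes normal operation
    storage.write(node, data)                # 4. node A tries to complete its I/O
    return storage.log


if __name__ == "__main__":
    print("without fencing:", live_hang(use_fencing=False))
    print("with fencing:   ", live_hang(use_fencing=True))
```

Without fencing, A's stale write lands on top of the recovered state, which is the corruption case described in the quote; with fencing, only B's write reaches the storage.  In a real cluster the "fenced" set would be enforced by one of the methods listed above (power switch, SCSI reservations, FC zoning, and so on), not by the storage object itself.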
