[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Linux-cluster] Clustering Tutorial



http://www.phy.duke.edu/resources/computing/brahma/Resources/beowulf_book.php is a good start, http://www.beowulf.org is another good place, it is also the home of the original beowulf mailinglist.

Generally I would recommend digging through recent mailinglist postings because
there are often very informed answers to questions.

Lon just answered a fencing question a few days ago:

"STONITH, STOMITH, etc. are indeed implementations of I/O fencing.

Fencing is the act of forcefully preventing a node from being able to
access resources after that node has been evicted from the cluster in an
attempt to avoid corruption.

The canonical example of when it is needed is the live-hang scenario, as
you described:

1. node A hangs with I/Os pending to a shared file system
2. node B and node C decide that node A is dead and recover resources
allocated on node A (including the shared file system)
3. node A resumes normal operation
4. node A completes I/Os to shared file system

At this point, the shared file system is probably corrupt.  If you're
lucky, fsck will fix it -- if you're not, you'll need to restore from
backup.  I/O fencing (STONITH, or whatever we want to call it) prevents
the last step (step 4) from happening.

How fencing is done (power cycling via external switch, SCSI
reservations, FC zoning, integrated methods like IPMI, iLO, manual
intervention, etc.) is unimportant - so long as whatever method is used
can guarantee that step 4 can not complete."

"GFS can use fabric-level fencing - that is, you can tell the iSCSI
server to cut a node off, or ask the fiber-channel switch to disable a
port.  This is in addition to "power-cycle" fencing."


Michael


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]