[Linux-cluster] Writing new fencing agent

Jonathan Buzzard j.buzzard at dundee.ac.uk
Wed Apr 9 09:06:12 UTC 2008


I am re-purposing an old cluster that used to run RHEL4 and IBM's GPFS.
The nodes are all HP NetServer LP1000r with 2GB RAM, dual 1.4GHz
PIIIs, an additional 1Gbps Intel NIC, and a local 73GB 10k RPM SCSI
disk. I have 48 of these nodes (and a couple spare).

As the GPFS and RedHat licenses have been transferred to new machines,
it is my intention to rebuild the nodes using CentOS 5 and use GFS. I
have a couple TB of iSCSI storage to go with it.

This is a low budget project and I need a fencing device. The nodes all
support something called "Alert on Lan v2", which seems to have been a
forerunner of IPMI. I have a separate "management" network, and have
turned AOL on in the BIOS on each node.

Googling turned up no documentation on how Alert on Lan works, so some
time later, with Wireshark and the Windows client, I have some C code
that sends magic packets of death to either power off, reset, or power
cycle (off, wait 15 seconds, then on) the nodes.
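
The sender has roughly the following shape. This is only a minimal
sketch, not the actual program: the use of UDP, the function name
send_packet, and whatever port and payload are passed in are all
placeholder assumptions, and the real AOL2 packet bytes are
deliberately not reproduced here.

```c
/* Sketch only: send one datagram to a node's management address.
 * UDP as the transport is an assumption; the caller supplies the port
 * and payload, which stand in for the real AOL2 values. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the whole payload was handed to the kernel, -1 otherwise. */
int send_packet(const char *ip, unsigned short port,
                const unsigned char *payload, size_t len)
{
    struct sockaddr_in dst;

    /* Build the destination address; reject anything that is not
     * a dotted-quad IPv4 address. */
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    ssize_t n = sendto(fd, payload, len, 0,
                       (struct sockaddr *)&dst, sizeof dst);
    close(fd);

    return (n == (ssize_t)len) ? 0 : -1;
}
```

A return of 0 here only means the datagram left the sending host. It
says nothing about whether the node actually powered off.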

Testing shows that it is robust, in that it works on a node that has
kernel panicked and is otherwise totally hung. It is also fast: once
the magic packet of death is received, the node is off instantly. All
that seems to be required on the client side is for the management NIC
to be up and configured with an IP address. This is contrary to the
rather sketchy HP documentation, which suggests that client software is
needed.

All good so far. However, I am not sure what the requirements of a
fencing agent are. Can I rename my program fence_aol2, fiddle with
cluster.conf, and expect it to work? Does the fencing agent have to
return specific exit codes? Should the fencing agent do something to
test that the magic packet of death worked, or is simply sending it
enough? Does the fencing agent need to be able to turn nodes on (I could
use Wake On Lan for this) as well as off?

Finally, once I have a working and debugged AOL2 fencing agent, how does
one go about submitting it for inclusion in the cluster suite?
Alternatively, if this is not wanted (Alert on Lan is a historical
protocol, superseded by IPMI), what is the best way of pointing other
users to its existence?


JAB.

-- 
Jonathan A. Buzzard                      Tel: +441382-386998
Storage Administrator, College of Life Sciences
University of Dundee, DD1 5EH
