[Cluster-devel] SCSI persistent reservations

Ryan O'Hara rohara at redhat.com
Fri Jun 16 20:47:45 UTC 2006


I have begun working on the use of SCSI persistent reservations as a
fencing method in Cluster Suite. Included here are some comments,
requirements, and ideas for how this fencing method will/may work. Note
that this is entirely subject to change. Questions and comments are welcome.

I have attached an init script which I have written and used on RHEL4.
The script has a few requirements:

1. You will need to install sg3_utils and libsgutils. I have been using
version 1.20, which is available from http://sg.torque.net/. Note that
the standard sg3_utils RPM that is shipped with RHEL4 seems to be
missing sg_persist, which is the tool for SCSI persistent reservations.

2. The other requirement is the syslinux package, which includes the
'gethostip' tool. The attached script uses this tool to generate
unique keys based on a node's IP address.

3. This works only for LVM2 volumes, and...

4. SCSI persistent reservation works on an entire SCSI device (e.g.
/dev/sdc). This means that if a LUN is partitioned (/dev/sdc1,
/dev/sdc2, ...) and multiple volumes exist on the LUN, SCSI persistent
reservation could (and will) fence all volumes that use partitions on
the LUN. In other words, fencing is done on a per-node basis: the fenced
node has its reservation revoked on every LUN where it is registered.

The init script is currently quite simple. Its only purpose is to
identify SCSI devices which are members of a clustered volume and
register/unregister those devices. Registration is done in the "start"
section of the script, where lvs is invoked such that devices and
vg_attrs are returned. The vg_attr flags tell us if the volume is
clustered ('c' in the 6th position). The devices backing clustered
volumes are then registered using the node's unique key. Each device is
also reserved with reservation type '5' (see man page for sg_persist).
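
As a rough illustration of the device selection described above, here is
a minimal sketch (not the attached script). It assumes lvs is run as
"lvs --noheadings -o vg_attr,devices" and that each devices entry looks
like "/dev/sdc1(0)". The function name clustered_devices is made up for
this example, and mapping a partition back to its whole device is not
handled.

```shell
#!/bin/sh
# Hypothetical helper: read "vg_attr devices" lines on stdin and print
# each device that backs a clustered volume group, once.
clustered_devices() {
    awk '
        # vg_attr is the first field; "c" in the 6th position means clustered.
        substr($1, 6, 1) == "c" {
            # A striped LV lists several devices separated by commas.
            n = split($2, devs, ",")
            for (i = 1; i <= n; i++) {
                dev = devs[i]
                sub(/\([0-9]+\)$/, "", dev)   # drop the "(extent)" suffix
                if (!(dev in seen)) { seen[dev] = 1; print dev }
            }
        }
    '
}
```

It would be used as
"lvs --noheadings -o vg_attr,devices | clustered_devices".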

The 'stop' command does the opposite. For each device that is registered
with our key, the script releases the reservation and unregisters the key.
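
To make the start/stop sequence concrete, here is a sketch of the
sg_persist calls involved. It is an approximation under assumptions, not
the attached script: the function names (run, pr_start, pr_stop) and the
DRY_RUN variable are invented for this example, and the long options are
the ones I know from the sg_persist man page, so check them against your
sg3_utils version.

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pr_start KEY DEVICE: register our key, then reserve with type 5.
# On nodes after the first, the reserve step may report a conflict
# because another registrant already holds the reservation.
pr_start() {
    run sg_persist --out --register --param-sark="$1" --device="$2" &&
    run sg_persist --out --reserve --param-rk="$1" --prout-type=5 --device="$2"
}

# pr_stop KEY DEVICE: release the reservation, then unregister by
# registering a new key of 0.
pr_stop() {
    run sg_persist --out --release --param-rk="$1" --prout-type=5 --device="$2" &&
    run sg_persist --out --register --param-rk="$1" --param-sark=0 --device="$2"
}
```

For example, "DRY_RUN=1; pr_start 0xc0a80001 /dev/sdc" prints the two
commands without touching the device.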

The 'status' command will list the devices (if any) that our node is
registered with. This is extremely basic right now and it would probably
be more useful to list volumes as well (i.e. volumes that are made up of
devices that are using SCSI persistent reservation). More to come with
'status'.
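
For reference, a status check along these lines could be sketched as
follows. This is only an illustration: it assumes that
"sg_persist --in --read-keys" prints each registered key on its own line
as a 0x-prefixed hex number, and the helper names (norm_key,
key_registered) are made up here. Keys are normalized before comparing
because case and leading zeros may differ between tools.

```shell
#!/bin/sh
# Normalize a hex key: lowercase, no "0x" prefix, no leading zeros.
norm_key() {
    printf '%s\n' "$1" | tr 'A-FX' 'a-fx' | sed -e 's/^0x//' -e 's/^0*//'
}

# key_registered KEY: read read-keys output on stdin and succeed if
# KEY appears among the listed keys.
key_registered() {
    want=$(norm_key "$1")
    while read -r word rest; do
        case $word in
            0x*|0X*)
                if [ "$(norm_key "$word")" = "$want" ]; then
                    return 0
                fi
                ;;
        esac
    done
    return 1
}
```

A device would then be checked with something like
"sg_persist --in --read-keys --device=/dev/sdc | key_registered $key".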

Comments:

LVM provides a way to filter the information returned from lvs/vgs/pvs
based on the volume name. Because the init script uses lvs to retrieve
devices and vg_attrs, the output is dependent on the filtering rules.

The way that we generate and store our keys will change in the near
future. The reason is that a node may have multiple NICs and the current
script uses 'uname -n' in conjunction with 'gethostip -x' to generate
the key. This may not give correct information since we can't be sure
which name/NIC is being used for the cluster. So if the output from
'uname -n' does not match the cluster node name there would be problems.
This brings us to the real purpose of SCSI persistent reservation in
Cluster Suite: fencing.
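
To show what an IP-based key looks like, here is a small sketch of the
conversion from a dotted-quad address to a hex key. It is an assumption
on my part that this mirrors what 'gethostip -x' yields for the resolved
address; the function name ip_to_key is hypothetical, and the sketch
takes the IP address directly rather than resolving 'uname -n'.

```shell
#!/bin/sh
# ip_to_key A.B.C.D: print the address as 8 hex digits.
ip_to_key() {
    # Reject anything that is not digits and dots.
    case $1 in
        ''|*[!0-9.]*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    printf '%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
}
```

Passing the address of the interface actually used for cluster traffic
would sidestep the 'uname -n' ambiguity described above.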

Fencing:

The script to do the fencing is in development. The idea is that a new
fencedevice will be defined for the cluster.conf file, which will also
store the key needed to revoke a node's reservation. The key will need
to be known by all other nodes in the cluster, so it makes sense to keep
this in the cluster.conf file. When a node needs to be fenced, any other
node in the cluster can fence it by knowing its key and simply revoking
its reservation.
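
Since the fencing script is still in development, the following is only
a guess at what the revocation step might look like, using the
sg_persist preempt service action. The function names (run, fence_node)
and the DRY_RUN variable are invented for this sketch, and the actual
script may well do this differently.

```shell
#!/bin/sh
# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fence_node OUR_KEY VICTIM_KEY DEVICE
# Preempt the victim's key using our own registered key. This assumes
# the fencing node is itself registered on DEVICE.
fence_node() {
    run sg_persist --out --preempt --param-rk="$1" --param-sark="$2" \
        --prout-type=5 --device="$3"
}
```

This would have to be repeated for every device on which the victim's
key is registered.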

Questions or comments?

Ryan



-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: scsi_pr
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20060616/d42ad7b8/attachment.ksh>

