[Cluster-devel] RFC: generic improvement to fence agents api

Sat Mar 19 06:34:55 UTC 2011

Hi all,

while discussing support for the Tripp Lite switched PDU on linux-cluster,
it occurred to me that we could cut the time it takes to power fence
certain devices almost in half, for example when more than one PSU needs
to be powered off to complete the action.

Say node X has 2 PSUs.

With the current code, the config would look like:

<clusternode .....>
 <fence>
  <method...>
   <device name="..." port="1"/>
   <device name="..." port="2"/>
.....

This effectively means spawning what is most likely the same agent twice,
which increases the time it takes to fence and may increase the chance
that fencing fails if the second connection fails.

My suggestion would be to allow a list of ports to be specified instead:

<clusternode .....>
 <fence>
  <method...>
   <device name="..." ports="1 2"/>
....

This could be done either with a new keyword "ports" or by re-using "port"
itself. If we re-use "port", current configurations will continue to work
as-is and the change would not introduce any backward compatibility issues.
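
To illustrate the backward compatibility point, here is a minimal sketch of
how an agent might turn the attribute value into a list. The helper name
parse_ports is hypothetical (it is not part of the existing fencing lib),
and the sketch assumes port names never contain whitespace:

```python
def parse_ports(value):
    """Split a device "port" attribute value into a list of port names."""
    # str.split() with no arguments splits on any run of whitespace,
    # so a single port simply becomes a one-element list.
    ports = value.split()
    if not ports:
        raise ValueError("no port given")
    return ports


print(parse_ports("1"))    # existing single-port config
print(parse_ports("1 2"))  # proposed list form
```

An existing port="1" entry goes through the same code path as a list, so
the agent would not need to special-case old configurations.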

This way the agent can:

1) connect once (reducing in most cases the ssh/telnet/whatever time)
2) issue the OFF command as fast as possible (almost in parallel)
3) then wait for the results.
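
The three steps above could be sketched roughly as follows. FakePDU is a
made-up in-memory stand-in for a device session, and fence_off is a
hypothetical name; neither is an existing agent or library API. It is only
meant to show the ordering of the operations:

```python
class FakePDU:
    """Hypothetical stand-in for a switched PDU session (not a real API)."""

    def __init__(self):
        self.connections = 0
        self.outlets = {"1": "on", "2": "on"}

    def connect(self):
        # A real agent would open its ssh/telnet/whatever session here.
        self.connections += 1

    def power_off(self, port):
        # Send the OFF command and return without waiting for the result.
        self.outlets[port] = "off"

    def status(self, port):
        return self.outlets[port]


def fence_off(pdu, ports):
    pdu.connect()                # 1) connect once
    for port in ports:           # 2) issue OFF for every port back to back
        pdu.power_off(port)
    # 3) then check the results: success only if every port is off
    return all(pdu.status(port) == "off" for port in ports)


pdu = FakePDU()
ok = fence_off(pdu, ["1", "2"])
print(ok, pdu.connections)  # prints: True 1
```

A real agent would presumably have to poll the status with a timeout in
step 3 instead of reading it once, since outlets may take a moment to
switch.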

By adopting a list, the configuration would look cleaner too IMHO.

At a quick glance, the change should not affect fenced (David, can you
confirm please?), and most agents could handle it via the fencing python
lib (Marek?).

Does it sound reasonable?

Cheers
Fabio