
Re: [Linux-cluster] iLO device as fencing device

Eric Kerin wrote:

Coman Iliut wrote:

We are using it too, with good results. We had to write our own fencing method because the one supplied is too slow. iLO allows you to send a RESET command that is faster.

Also, we wanted a more efficient use of the secure socket (the default fence_ilo lets the socket time out, then reconnects, etc.). We also wanted to detect the case when node 1 cannot access the iLO of node 2, either because node 2 is not there anymore (powered off, for example) or because node 1 lost network access.

How are you differentiating between node 2 being powered off vs the network cord used for iLO on node 2 being unplugged?

The fence_ilo agent has an optional param that can be included in the cluster.conf file: 'force="1"' will have the agent immediately power off the node to be fenced and then check its status. The default action for the agent is to first check status and then begin the fence action. Using this parameter reduces fence time with ilo to around 7 seconds.

Why is this param action not the default action for the agent? All of the agents employ a similar approach to fencing: first, status is checked to make certain that the node to be fenced is even up; then the fence action is made; then the kill is confirmed; then, if the action is a reboot, the system is brought up and its status is confirmed one final time. Any problems along the way are logged. On the ilo, whose ssl connection cannot be kept open between actions, building up and tearing down the connection 5 times (along with the necessary actions) takes about 40 seconds. Yuk. Hence the force option, which just shoots first and asks questions later. Most other agents can run through our paranoid multiple-status-check methodology in just a couple of seconds, as they use telnet and allow you to keep the connection open between actions.
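To make the ordering concrete, here is a rough sketch in Python of the step sequence described above. It is only an illustrative model, not the fence_ilo source: the function names and step labels are made up for this example, and it assumes that force mode simply drops the initial status check and that each step on ilo costs one ssl connection.

```python
# Illustrative model only. The names below are hypothetical and are not
# taken from the fence_ilo agent itself.

def fence_steps(action="reboot", force=False):
    """Return the ordered steps a fence agent would run for one fence request."""
    steps = []
    if not force:
        # Default behaviour: make certain the node is even up before acting.
        steps.append("check status")
    steps.append("power off")      # the fence action itself
    steps.append("confirm off")    # verify that the kill worked
    if action == "reboot":
        steps.append("power on")   # bring the system back up
        steps.append("confirm on") # one final status check
    return steps


def connection_count(steps):
    """Assumption: on ilo, every step needs its own ssl connection."""
    return len(steps)


if __name__ == "__main__":
    for force in (False, True):
        steps = fence_steps("reboot", force)
        print(f"force={force}: {connection_count(steps)} connections: "
              + " -> ".join(steps))
```

With the default settings a reboot comes out to five connections, which matches the "5 times" mentioned above. What force mode really skips beyond the first status check is not spelled out here, so treat that branch as a guess.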

I wish we could use another connection method for ilo that was faster - say, snmp - but snmp support in ilo is read-only, you cannot power a system down with a mib command. At least, that was the way it was before ilo2. Maybe things have changed for the better with ilo2.

For a deeper description of the 'force' param for ilo, please see:


