Re: [Cluster-devel] unfencing



On Mon, 2009-02-23 at 12:15 -0600, David Teigland wrote:
> On Mon, Feb 23, 2009 at 07:27:20AM +0100, Fabio M. Di Nitto wrote:
> > > libfence:fence_node_undo(node_name) logic:
> > > 	for each device_name under given node_name,
> > > 	if an unfencedevice exists with name=device_name, then
> > > 	run the unfencedevice agent with first arg of "undo"
> > > 	and other args the normal combination of node and device args
> > > 	(any agent used with unfencing must recognize/support "undo")
> > 
> > All our agents already support on/off enable/disable operations. It's
> > probably best to align them to have the same config options rather than
> > adding a new one across the board.
> 
> Yes, I have those options in mind, and would prefer to use them as well.
> We'll have to wait and see during the implementation phase; for the time being
> they complicate things, so I'm using "undo" to avoid those details.
> 

I know Marek is about to start a "matrix" to map fence agent features
and options. It might be a good thing to talk to him soonish. We were
discussing it only a few hours ago.

> The meanings of those fencing structures have never changed since being
> introduced many years ago, and both of those fundamentally change it.  It
> would be very unfortunate to redefine them.

I agree. It's a good point.

> 
> A good alternative to <unfencedevices> would be an <unfence> section within
> the node sections (it would not require a method level).  Now that I've
> thought more about it, it seems a better choice than "unfencedevices".  It
> defines explicitly what should be done, rather than depending on the implicit
> effects of matching names between fencedevice/unfencedevice.

Agreed.

> 
> <clusternode name="foo" nodeid="3">
> 	<fence>
> 	<method name="1">
> 		<device name="san" node="foo"/>
> 	</method>
> 	</fence>
> 
> 	<unfence>
> 		<device name="san" node="foo"/>
> 	</unfence>
> </clusternode>
> 
> <fencedevices>
> 	<fencedevice name="san" agent="fence_scsi"/>
> </fencedevices>
> 
> and
> 
> <clusternode name="bar" nodeid="4">
> 	<fence>
> 	<method name="1">
> 		<device name="switch1" port="4"/>
> 		<device name="switch2" port="6"/>
> 	</method>
> 	<method name="2">
> 		<device name="apc" port="4"/>
> 	</method>
> 	</fence>
> 
> 	<unfence>
> 		<device name="switch1" port="4"/>
> 		<device name="switch2" port="6"/>
> 	</unfence>
> </clusternode>
> 
> <fencedevices>
>         <fencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
>         <fencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
>         <fencedevice name="apc" agent="fence_apc" ipaddr="3.3.3.3"/>
> </fencedevices>
> 
> The key thing I've realized since the previous attempt in 2004, is that we
> need to explicitly configure what unfencing should happen, rather than just
> trying to apply the normal fencing config in reverse.

I think I was trying to apply this same logic and stalled at some point
in the apc+brocade example. With more than one fence agent, the number of
combinations needed to fence and then safely unfence a node grows
exponentially.

Given this last example, a reasonable unfence operation would be to try
to power on via apc too.

There is no guarantee that only method="1" fenced the node; the node
could also have been powered off.

If we succeed in enabling the switch port, we still can't guarantee that
the node will come back, because it may lack power.
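
To make the power-on suggestion concrete, here is a rough Python sketch
(not libfence code) of how an explicit <unfence> section that also lists
the apc device could be resolved into agent calls. The function name
unfence_actions, the <cluster>/<clusternodes> wrapper, the use of switch2
for port 6, and passing "undo" as the first argument are all assumptions
for illustration; the methods are written as <method name="..."> so that
the fragment parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: the "bar" example with apc added under <unfence>.
CONF = """
<cluster>
  <clusternodes>
    <clusternode name="bar" nodeid="4">
      <fence>
        <method name="1">
          <device name="switch1" port="4"/>
          <device name="switch2" port="6"/>
        </method>
        <method name="2">
          <device name="apc" port="4"/>
        </method>
      </fence>
      <unfence>
        <device name="switch1" port="4"/>
        <device name="switch2" port="6"/>
        <device name="apc" port="4"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
    <fencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
    <fencedevice name="apc" agent="fence_apc" ipaddr="3.3.3.3"/>
  </fencedevices>
</cluster>
"""


def unfence_actions(conf_xml, node_name):
    """Return (agent, "undo", args) for each <unfence> device of node_name."""
    root = ET.fromstring(conf_xml)
    # Index the shared <fencedevice> definitions by name.
    shared = {fd.get("name"): dict(fd.attrib) for fd in root.iter("fencedevice")}

    actions = []
    for node in root.iter("clusternode"):
        if node.get("name") != node_name:
            continue
        for dev in node.findall("./unfence/device"):
            name = dev.get("name")
            if name not in shared:
                raise KeyError("no fencedevice named %r" % name)
            # Combine device-level and node-level args, as for normal fencing.
            args = dict(shared[name])
            args.update(dev.attrib)
            agent = args.pop("agent")
            args.pop("name")
            actions.append((agent, "undo", args))
    return actions


if __name__ == "__main__":
    for agent, op, args in unfence_actions(CONF, "bar"):
        print(agent, op, args)
```

Nothing here checks that the power-on actually brings the node back; it
only shows that the explicit list says what to run, in which order.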

How do we protect a node that failed to be fenced from being unfenced?

Example 2:
Both method="1" and method="2" fail to fence node X.
At this point, any unfence operation is extremely dangerous.
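
One possible guard, sketched in Python under stated assumptions: we keep
a record of the fence methods tried against the node and refuse to
unfence when fencing was attempted and every method failed. The names
FenceState, fence_state and may_unfence are hypothetical, and allowing
unfence when no fencing was ever attempted is a policy choice made only
for this sketch, not something settled in this thread.

```python
from enum import Enum


class FenceState(Enum):
    NOT_ATTEMPTED = "not_attempted"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def fence_state(method_results):
    """Summarize a list of (method_name, succeeded) pairs, in the order tried."""
    if not method_results:
        return FenceState.NOT_ATTEMPTED
    if any(ok for _name, ok in method_results):
        return FenceState.SUCCEEDED
    return FenceState.FAILED


def may_unfence(method_results):
    """Refuse to unfence a node whose fencing was attempted but never succeeded."""
    return fence_state(method_results) is not FenceState.FAILED


if __name__ == "__main__":
    # Example 2: both methods failed, so unfencing is refused.
    print(may_unfence([("1", False), ("2", False)]))
    # Method 1 failed but method 2 succeeded, so unfencing is allowed.
    print(may_unfence([("1", False), ("2", True)]))
```

This only answers "may we unfence at all"; it does not say which devices
to undo when only one of the methods succeeded.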

Fabio

