[Linux-cluster] GFS over AOE without fencing?

Jayson Vantuyl jvantuyl at engineyard.com
Fri Apr 20 08:28:35 UTC 2007


In principle this is true.

However, cec is not a very reliable connection.  It is NOT TCP.  I
have little information about how resilient the protocol is, but on
a unit we have with a bad disk, I've had the cec connection
spontaneously drop mid-command.  I'm sure they're working to fix
this, but it doesn't bode well for something as critical as fencing.
I'm also unclear on whether a dropped connection generates a non-zero
exit code (i.e. whether it is even detectable).
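
To illustrate the detectability concern, here is a minimal sketch of a wrapper that reports success only on positive evidence: the command must finish in time, exit with zero, and print an expected confirmation string.  The function name run_fence_step, the console_input text, and the expect marker are all hypothetical; nothing here assumes particular cec options or Coraid console output, so argv is whatever command line you actually use.

```python
import subprocess


def run_fence_step(argv, console_input, expect, timeout=10.0):
    """Run a console-driving command; return True only on positive evidence.

    argv          -- command line to run (hypothetical; supply your own)
    console_input -- text written to the command's stdin
    expect        -- string that must appear in stdout to count as success
    timeout       -- seconds to wait before treating the step as failed
    """
    try:
        proc = subprocess.run(
            argv,
            input=console_input,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung console or a missing binary is a failed fence, not a success.
        return False
    if proc.returncode != 0:
        return False
    # A zero exit code alone is not trusted: a dropped connection might
    # still exit cleanly, so require the confirmation text as well.
    return expect in proc.stdout
```

The point of the design is that a silent drop mid-command falls through to False, so the caller can retry or escalate instead of assuming the node is fenced.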

Also, on APCs, the fence_apc script has the benefit that the APC  
switches do not allow more than one concurrent telnet connection,  
which effectively serializes fence requests.  With cec, nothing comparable serializes them.

Also, this fences the entire Coraid device in a way that must be  
manually cleared if it gets left masked.  This is a real possibility  
where multiple nodes are racing to fence each other--especially on  
multiple Coraid shelves (as it must be done per shelf).

Since we use our Coraids for non-GFS boot volumes as well, this is  
also problematic for us, since a stale mask entry keeps us from booting.

It's really not so simple.  I'd almost recommend shutting off the  
ports at the switch rather than the Coraid, assuming you have good  
enough switches to do this reliably.
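
As a rough sketch of the switch-port approach, the helper below builds (but does not run) a Net-SNMP snmpset command line that sets ifAdminStatus from the standard IF-MIB to down(2) or up(1) for one interface.  It assumes the switch accepts SNMP v2c writes and that snmpset is installed on the fencing node; the host name, community string, and interface index in the usage note are placeholders, and the function name snmpset_port_argv is hypothetical.

```python
# IF-MIB::ifAdminStatus; the interface index is appended as the last arc.
IF_ADMIN_STATUS_OID = "1.3.6.1.2.1.2.2.1.7"
ADMIN_UP = 1
ADMIN_DOWN = 2


def snmpset_port_argv(host, community, if_index, enable):
    """Return the snmpset argv that enables or disables one switch port.

    host      -- switch address (placeholder; supply your own)
    community -- SNMP v2c write community (placeholder)
    if_index  -- ifIndex of the port facing the node to be fenced
    enable    -- True to bring the port up, False to shut it down
    """
    if if_index < 1:
        raise ValueError("if_index must be a positive integer")
    status = ADMIN_UP if enable else ADMIN_DOWN
    return [
        "snmpset", "-v2c", "-c", community, host,
        f"{IF_ADMIN_STATUS_OID}.{if_index}", "i", str(status),
    ]
```

For example, snmpset_port_argv("switch.example", "private", 12, False) yields the command to shut interface 12; reading ifAdminStatus and ifOperStatus back afterwards would be the natural way to confirm the fence took effect.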

On Apr 19, 2007, at 5:47 AM, Bryn M. Reeves wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Kadlecsik Jozsi wrote:
>> On Thu, 19 Apr 2007, Jayson Vantuyl wrote:
>>
>>> It is supposedly possible to script using the mask command to  
>>> block servers on
>>> individual MAC address to the AoE storage.  While this is often  
>>> offered by
>>> Coraid support as an option, I've not seen anyone implement it.   
>>> To be sure, I
>>> have my doubts about it anyways, given that the utility that you  
>>> automate (cec)
>>> does not exactly provide an API (you actually are writing  
>>> directly into the
>>> unit's console!).
>>
>> I have already written the script which does exactly that: i.e. it  
>> calls
>> 'cec' and issues the proper mask command to disable/enable access  
>> to the
>> logical blades. Now I only have to test it in our first testbed  
>> GFS setup
>> ;-).
>>
>
> This doesn't seem so unreasonable - the fence_apc script does a  
> similar
> thing, connecting to the console over telnet and squirting commands  
> into
> the device's menu system.
>
> Kind regards,
>
> Bryn.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.7 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org
>
> iD8DBQFGJ0jI6YSQoMYUY94RAkswAJ9k4Yym/PHs/Bwj9AXz0dXTgPCoJACgnoLQ
> MEJ63uWPdBTdGMo+GYuJtyo=
> =HL8a
> -----END PGP SIGNATURE-----
>



-- 
Jayson Vantuyl
Systems Architect
Engine Yard
jvantuyl at engineyard.com

