
Re: [Linux-cluster] GFS + CORAID Performance Problem

Hi Jayson,

I just plugged the hosts directly into the Coraid for troubleshooting purposes.  This is how Coraid set up the machines in their published benchmarks, so I figured it would be safe.  I actually have two Asante IC36524 switches dedicated to Coraid storage with the intention of having redundant paths.  My dual-port PCIe Ethernet cards didn't arrive until yesterday, so I only had a single port on each host to connect to each switch.  This isolated one of the hosts to the bad port on the SR1520.  I should have found this sooner.

On a separate note, I have modified the fence_vixel script to perform fencing on the Asante switches by shutting down the appropriate switch ports.  These Asante switches use what appears to be a cloned Cisco IOS interface, so this script should work for any Ethernet switch that has an IOS-style telnet interface, or will at least get you close.  It works on the command line, but I haven't actually tested it in the cluster through a real fence operation.  I'd be happy to share it if it would be helpful.
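
For illustration only, the port-shutdown idea can be sketched roughly as below.  This is not the modified fence_vixel script (which is Perl and is not shown in this message); it is a minimal Python sketch that assumes the switch accepts the standard Cisco IOS commands "configure terminal", "interface", "shutdown" and "no shutdown", and that the agent is handed name=value options on stdin.  The function names, the option names (ipaddr, port) and the example port name are hypothetical, and the telnet login/password exchange is left out.

```python
def parse_agent_options(text):
    """Parse name=value lines (one per line) into a dict.

    Assumption: the fence agent receives its options in this form on stdin.
    Blank lines, comment lines and lines without '=' are ignored.
    """
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def ios_port_commands(port, action):
    """Return the IOS-style CLI lines that fence ("off") or unfence ("on") a port.

    Assumption: the switch understands Cisco IOS syntax.  Logging in and
    answering any "enable" password prompt is not handled here.
    """
    if action not in ("off", "on"):
        raise ValueError("action must be 'off' or 'on'")
    toggle = "shutdown" if action == "off" else "no shutdown"
    return [
        "enable",
        "configure terminal",
        "interface " + port,
        toggle,
        "end",
    ]


if __name__ == "__main__":
    # Hypothetical options, as they might arrive on stdin.
    opts = parse_agent_options("ipaddr=192.0.2.10\nport=GigabitEthernet0/3\n")
    for cmd in ios_port_commands(opts["port"], "off"):
        print(cmd)
```

The commands would still need to be sent over a telnet session to the switch, and the result verified (for example by checking the port state afterwards) before the agent reports success to the cluster.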

How are you fencing your cluster nodes?  I specify the Vixel fence in the configuration GUI since I can't find a way to easily add a custom fence agent.

What is the benefit of using qdisk?


On 12/12/06, Jayson Vantuyl <jvantuyl engineyard com> wrote:
My thanks to Jayson and especially Wendy for providing so much help with this issue.  With a little help from Coraid, I've troubleshot the performance issues down to one of the two ports on the Coraid device.  In the end, I was able to move the performance problem from one of my two hosts to the other just by swapping ports.  I'll follow up with Coraid to see if I have a hardware problem.
I don't know what Coraid recommends, but I usually recommend not plugging devices directly into the ports.  I vastly prefer having a good gigabit switch there instead.  Here at Engine Yard, we actually have two switches that provide redundancy across either port.  The current AoE driver is good enough to use both networks to spread the load if you have two independent network paths, so you also get better performance.  We actually have separate cards in each of our servers to prevent failure of an individual network card from being an issue (and AoE should handle this well as long as the driver doesn't crash in this state).

In terms of clustering, having redundant networks is very handy--especially if you use a qdisk.

It's really nice to have such a great level of community support.  Wendy, I'd be happy to share the particulars on my deployment once I get things stabilized.
I'd be interested too.

Jayson Vantuyl
Systems Architect
Engine Yard

Linux-cluster mailing list
Linux-cluster redhat com
