My thanks to Jayson and especially Wendy for providing so much help with this issue. With a little help from Coraid, I've traced the performance issues to one of the two ports on the Coraid device. In the end, I was able to move the performance problem from one of my two hosts to the other just by swapping ports. I'll follow up with Coraid to see if I have a hardware problem.

I don't know what Coraid recommends, but I usually recommend not plugging devices directly into the ports. I much prefer having a good gigabit switch there instead. Here at Engine Yard, we actually have two switches that provide redundancy across either port. If you have two independent network paths, the current AoE driver is good enough to spread the load across both networks, so you also get better performance. We also put separate network cards in each of our servers so that the failure of a single card isn't an issue (and AoE should handle this well, as long as the driver doesn't crash in that state).
In terms of clustering, having redundant networks is very handy, especially if you use a qdisk.
It's really nice to have such a great level of community support. Wendy, I'd be happy to share the particulars of my deployment once I get things stabilized.

I'd be interested too.