I currently administer a system running a similar but larger setup, so I may be able to help you.
First, make sure you contact Coraid. They are really good about helping with this stuff.
Second, have you looked at /dev/etherd/err? The AoE driver usually reports a lot of useful debugging output there.
Third, have you upgraded the firmware in the Coraid and built the newest AoE driver? Both are absolutely critical for getting the best performance and reliability, and the driver in the plain kernel has generally fallen behind. Coraid assures me they're working on this, and I can vouch for the fact that their driver is essentially the one in the kernel plus the development needed to make it work, not some sort of vendor-supplied out-of-tree driver.
Finally, make sure you have good switches. I have had a number of switches that drop a packet here and there. These are death to AoE performance. Gigabit is generally a must as well.
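One quick sanity check on each node is to look at the error and drop counters the kernel keeps per interface in /proc/net/dev. This is only a rough sketch: the function names are my own, and host-side counters cannot see packets lost inside the switch itself, so a clean result here does not clear the switch. It can at least rule the node's own NIC in or out.

```python
def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into per-interface counters.

    Returns {iface: {"rx_errs", "rx_drop", "tx_errs", "tx_drop"}}.
    """
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not the 8 receive + 8 transmit columns we expect
        counters[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return counters


def suspect_interfaces(counters):
    """Return the sorted names of interfaces with any nonzero counter."""
    return sorted(name for name, c in counters.items() if any(c.values()))


def read_counters(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())
```

Running suspect_interfaces(read_counters()) before and after a heavy I/O run, and comparing the numbers for the interface that carries the AoE traffic, shows whether the counters are climbing under load.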
On Dec 10, 2006, at 2:03 AM, bigendian+gfs gmail com wrote:
I've just set up a new two-node GFS cluster on a CORAID sr1520 ATA-over-Ethernet. My nodes are each quad dual-core Opteron CPU systems with 32GB RAM each. The CORAID unit exports a 1.6TB block device that I have a GFS file system on.