We've had issues with heartbeats being lost, but it was always because one node was too heavily loaded internally, so the process sending heartbeats was not getting enough CPU time to run and send them.

You need to work out how much network traffic the applications you're running generate and make sure it stays under the hardware limit. As long as it does, sending a heartbeat to the other node should not be a problem.
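
As a rough way to check this on Linux, here is a minimal sketch (Python 3) that samples the byte counters in /proc/net/dev twice and reports how much of the link was in use over the interval. The interface name bond0 and the 100 Mbit/s link speed are assumptions based on the 10/100 bonded pair described in the question; adjust both for your setup.

```python
import time

def read_counters(text):
    """Parse /proc/net/dev content into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:      # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def utilisation(before, after, seconds, link_mbit):
    """Return (rx_percent, tx_percent) of link capacity used between two samples."""
    rx_mbit = (after[0] - before[0]) * 8 / seconds / 1e6
    tx_mbit = (after[1] - before[1]) * 8 / seconds / 1e6
    return rx_mbit / link_mbit * 100, tx_mbit / link_mbit * 100

def measure(iface="bond0", link_mbit=100, seconds=10):
    """Sample the live counters. iface and link_mbit are assumptions."""
    with open("/proc/net/dev") as f:
        before = read_counters(f.read())[iface]
    time.sleep(seconds)
    with open("/proc/net/dev") as f:
        after = read_counters(f.read())[iface]
    return utilisation(before, after, seconds, link_mbit)
```

Calling measure() on each node under normal application load gives the receive and transmit percentages for that interval. Values well below 100 suggest the link itself is not the bottleneck. The counters may wrap on 32-bit kernels, so keep the interval short.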


We're running a 2-node rhas4 cluster with GFS and fibre-attached storage.

At the moment we have a pair of bonded 10/100 NICs for heartbeat and GFS
locking communication (bonded in active/standby mode).
There is an additional pair of gigabit NICs bonded for general network
traffic.

We've been having some performance issues with GFS which we have been
investigating, mainly to do with slow file stat operations like find
and ls.
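
One way to see whether the time goes into reading the directory itself or into the per-file stat calls is to time the two steps separately. The following is only a sketch (Python 3, so it would not run on a stock rhas4 Python); the function name time_listing is made up for illustration.

```python
import os
import time

def time_listing(path):
    """Time the directory read and the per-entry lstat calls separately."""
    start = time.perf_counter()
    names = os.listdir(path)                 # directory read only, no stat
    listed = time.perf_counter()
    for name in names:
        os.lstat(os.path.join(path, name))   # one stat per entry
    done = time.perf_counter()
    return {
        "entries": len(names),
        "readdir_s": listed - start,
        "stat_s": done - listed,
    }
```

Running time_listing() against the same directory on each filesystem shows which of the two phases accounts for the difference.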

A comparison of a data set (700,000 files in a single directory) on GFS
and then on ext3 is below:

# GFS
[root mrapp1 ~]# time ls /free0/partnerimport/data/soap-2/ > /dev/null

real    17m10.035s
user    0m8.220s
sys     0m52.310s

# EXT3
[root mrapp1 ~]# time ls /mr/sig/partnerimport/data/soap-2 > /dev/null

real    0m59.854s
user    0m5.296s
sys     0m0.662s
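
To put the two runs above on a common scale, the sketch below (Python 3) converts the `time` output to seconds and derives the slowdown factor, the cost per file, and how much of the wall-clock time was spent waiting rather than on CPU. The timing values and the file count are copied from this post; the helper names are made up for illustration.

```python
import re

FILES = 700_000  # directory size quoted in the post

def parse_time(value):
    """Convert a bash `time` value such as '17m10.035s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError("unrecognised time value: %r" % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

timings = {
    "gfs":  {"real": "17m10.035s", "user": "0m8.220s", "sys": "0m52.310s"},
    "ext3": {"real": "0m59.854s",  "user": "0m5.296s", "sys": "0m0.662s"},
}

def summarise(t):
    real = parse_time(t["real"])
    cpu = parse_time(t["user"]) + parse_time(t["sys"])
    return {
        "real_s": real,
        "per_file_ms": real / FILES * 1000,
        "waiting_s": real - cpu,   # wall-clock time not spent on CPU
    }

summary = {name: summarise(t) for name, t in timings.items()}
slowdown = summary["gfs"]["real_s"] / summary["ext3"]["real_s"]

for name, s in summary.items():
    print("%-5s %9.3f s  %.4f ms/file  %9.3f s waiting"
          % (name, s["real_s"], s["per_file_ms"], s["waiting_s"]))
print("GFS is about %.1fx slower than ext3" % slowdown)
```

The GFS run comes out roughly 17 times slower, and most of its elapsed time is spent waiting rather than in user or system CPU.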

Can anyone confirm whether using a 10/100 NIC for heartbeat would be
having an impact on performance, and whether it would be advisable to
ensure these are gigabit?

