[Linux-cluster] Continuing gfs2 problems: Am I doing something wrong????

Tue Aug 3 19:55:27 UTC 2010

----- "Scooter Morris" <scooter at cgl.ucsf.edu> wrote:
| Hi all,
| We continue to have gfs2 crashes and hangs on our production cluster,
| so I'm beginning to think that we've done something really wrong. Here
| is our set-up:
| 
|     • 4-node cluster; only 3 nodes participate in gfs2 filesystems
|     • Running several services on multiple nodes using gfs2:
|         • IMAP (dovecot)
|         • Web (apache with lots of python)
|         • Samba (using ctdb)
|     • GFS2 partitions are multipathed on an HP EVA-based SAN (no LVM)
| -- here is fstab from one node (the three nodes are all the same):

From the call traces it looks like GFS2 is waiting for internode locking.
This might be a gfs2 hang or just due to lock latency if your app is using
the file system poorly.  The first step is to figure out what it's
waiting for, and for that I recommend a little tool I wrote:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/gfs2_hangalyzer.c

It works much better if you set up RSA keys for ssh from the machine running
the tool to all the nodes in the cluster, at least temporarily.  That way you
can ssh into all the nodes without typing in the root password.  Otherwise
the tool will ask for your password many, many times.
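
Before running the tool, it may help to confirm that key-based login really
works on every node.  The following is only a rough sketch and is not part of
gfs2_hangalyzer: the node names are hypothetical placeholders, and it assumes
Python 3.7 or later and an OpenSSH client.  It relies on ssh's BatchMode
option, which makes ssh fail instead of prompting for a password.

```python
import subprocess

# Hypothetical node names (an assumption); substitute your cluster's hostnames.
NODES = ["node1", "node2", "node3"]


def ssh_check_command(node, user="root"):
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never fall back to an interactive prompt
        "-o", "ConnectTimeout=5",   # give up quickly on an unreachable node
        f"{user}@{node}",
        "true",                     # harmless remote command
    ]


def check_nodes(nodes, run=subprocess.run):
    """Return {node: bool}; True means non-interactive login succeeded."""
    results = {}
    for node in nodes:
        try:
            proc = run(ssh_check_command(node), capture_output=True)
            results[node] = proc.returncode == 0
        except OSError:
            # The ssh client itself could not be started.
            results[node] = False
    return results
```

Calling check_nodes(NODES) returns a dictionary keyed by node name.  A False
entry means that node would either prompt for a password or could not be
reached, so it is worth fixing before the tool starts making its ssh calls.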

Instructions on how to use the tool are in the comments.

Regards,

Bob Peterson
Red Hat File Systems