
Re: [Linux-cluster] Continuing gfs2 problems: Am I doing something wrong????



Hi Bob,
Yes, we used gfs2_hangalyzer during our last crash (thanks for writing it!). In that case, it showed everyone waiting on /usr/local/bin/python2.6. This time, things came to a complete standstill before we could run hangalyzer. The call traces suggest that IMAP is the problem, though, and IMAP doesn't use Python. Can you say a little more about what "using the file system poorly" means? Is there anything specific we should watch out for?

-- scooter

On 08/03/2010 12:55 PM, Bob Peterson wrote:
----- "Scooter Morris"<scooter cgl ucsf edu>  wrote:
| Hi all,
| We continue to have gfs2 crashes and hangs on our production cluster,
| so I'm beginning to think that we've done something really wrong. Here
| is our set-up:
|
|
|     • 4 node cluster, only 3 participate in gfs2 filesystems
|     • Running several services on multiple nodes using gfs2:
|
|
|         • IMAP (dovecot)
|         • Web (apache with lots of python)
|         • Samba (using ctdb)
|     • GFS2 partitions are multipathed on an HP EVA-based SAN (no LVM)
| -- here is the fstab from one node (all three nodes are identical):

From the call traces it looks like GFS2 is waiting for internode locking.
This might be a GFS2 hang, or it might just be lock latency if your
application is using the file system poorly. The first step is to figure
out what it's waiting for, and for that I recommend a little tool I wrote:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/gfs2_hangalyzer.c

It works much better if you set up RSA keys from the machine running the
tool to all the nodes in the cluster, at least temporarily. That way you
can ssh into all the nodes without typing the root password. Otherwise
the tool will ask for your password many, many times.

Instructions on how to use the tool are in the comments.

Regards,

Bob Peterson
Red Hat File Systems

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster


