
[Linux-cluster] writes pausing - which tools to debug?

We've been having problems with writes to our GFS file system: a write will pause for long periods (typically 5 to 10 seconds, sometimes 30 seconds, and occasionally 5 minutes). After the pause, it's as if nothing happened; whatever process was writing just keeps going, happy as can be. Except for these pauses, our GFS is quite zippy for both reads and writes. But the pauses are holding us back from going into full production. I need to know what tools I should use to figure out what is causing them.

Here is the setup.
All machines: RHEL 4 update 1 (ok, actually S.L. 4.1), kernel 2.6.9-11.ELsmp, GFS 6.1.0, ccs 1.0.0, gulm 1.0.0, rgmanager 1.9.34

I have no ability to do fencing yet, so I chose to use the gulm locking mechanism. I have it set up with 3 lock servers, for failover. I have tested the failover, and it works quite well.

I have 5 machines in the cluster. 1 isn't connected to the SAN or using GFS. It is just a failover gulm lock server in case the other two lock servers go down.

So I have 4 machines connected to our SAN and using GFS. 3 are read-only, 1 is read-write. If it is important, the 3 read-only are x86_64, the 1 read-write and the 1 not connected are i386.

The read/write machine is our master lock server. Then one of the read-only is a fallback lock server, as is the machine not using GFS.

Anyway, we're getting these pauses when writing, and I'm having a hard time tracking down where the problem is. I *think* that we can still read from the other machines during a pause, but since the problem comes and goes, I haven't been able to verify that.
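
Since the pauses come and go, one way to pin down when they happen is a small probe that times a write-plus-fsync in a loop and logs any slow ones. The following is only a minimal sketch, assuming a Python 3 interpreter is available on the read-write node; the function names, the mount path in the comment, and the 2-second threshold are all hypothetical and not part of our setup.

```python
import os
import time
from datetime import datetime


def timed_write(path, payload=b"x" * 4096):
    """Append payload to path, force it to disk, and return elapsed seconds."""
    start = time.monotonic()
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the data through to the file system
    return time.monotonic() - start


def probe(path, count, interval=1.0, threshold=2.0, writer=timed_write):
    """Run `count` timed writes; return (timestamp, seconds) for each slow one."""
    stalls = []
    for _ in range(count):
        stamp = datetime.now().isoformat(timespec="seconds")
        elapsed = writer(path)
        if elapsed >= threshold:
            stalls.append((stamp, elapsed))
            print(f"{stamp} write stalled for {elapsed:.1f}s")
        time.sleep(interval)
    return stalls


# Example call (hypothetical path; substitute a file on the real GFS mount):
# probe("/mnt/gfs/write_probe.dat", count=3600)
```

The logged timestamps could then be lined up against the lock server logs, and a similar loop that times reads on the read-only machines would show whether reads really do keep working during a write pause.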

Anyway, which tools do you think would be best in diagnosing this?

Many Thanks
Troy Dawson
Troy Dawson  dawson fnal gov  (630)840-6468
Fermilab  ComputingDivision/CSS  CSI Group
