Re: [Linux-cluster] Problem in clvmd/dlm_recoverd

On 19/11/2008, at 2:06 AM, David Teigland wrote:

> On Tue, Nov 18, 2008 at 05:14:38PM +1030, Tom Lanyon wrote:
>> We seem to be having the same problem on a 5 node virtual cluster
>> where 3 of the nodes share a GFS mount.
>>
>> A backup script runs on one node which does some heavy reads + writes
>> to this mount at which point all three nodes jump to 100% cpu (90%
>> iowait on the machine that is doing the backup, 100% system on the
>> other two) and all LVM VGs, LVs and GFS mounts lock up.
>
> Which process was using 100% cpu? If it was groupd, fenced, dlm_controld
> or gfs_controld, then yes it may be the same problem.
>
>> Is there anything that could be tuned here to avoid this issue until a
>> bug fix is released?
>
> I don't think there's any way to avoid the bug in the bz I referenced.


We haven't been able to catch it quickly enough to determine which process is using all of the CPU.
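
Something like the following could be left running on each node so that the busiest processes are already on record by the time things lock up. This is only a rough sketch, assuming a Linux /proc filesystem; the function names, the 5-second interval and the log file name are made up for illustration, and the log should live on local disk rather than on the GFS mount. It may also fail to get scheduled at all once a node is fully wedged, so the last few entries before the hang are the interesting ones.

```python
#!/usr/bin/env python3
"""Log the processes that used the most CPU over each sampling interval."""
import os
import time


def parse_stat(text):
    """Return (pid, comm, utime + stime in clock ticks) from /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    fields = right.split()
    # fields[0] is the state (field 3 of the stat line), so utime (field 14)
    # and stime (field 15) sit at indexes 11 and 12.
    return int(pid), comm, int(fields[11]) + int(fields[12])


def snapshot(proc="/proc"):
    """Map pid -> (comm, ticks) for every process that can be read right now."""
    procs = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, ticks = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # e.g. the process exited between listdir() and open()
        procs[pid] = (comm, ticks)
    return procs


def top_consumers(before, after, count=5):
    """Return [(ticks_used, pid, comm)] for the busiest processes, highest first."""
    used = []
    for pid, (comm, ticks) in after.items():
        # A pid missing from the earlier snapshot is treated as new, so all
        # of its accumulated ticks are charged to this interval.
        delta = ticks - before.get(pid, (comm, 0))[1]
        if delta > 0:
            used.append((delta, pid, comm))
    return sorted(used, reverse=True)[:count]


def main(interval=5, logfile="cpu-hogs.log"):
    """Append the top consumers to logfile every `interval` seconds, forever."""
    before = snapshot()
    while True:
        time.sleep(interval)
        after = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(logfile, "a") as log:
            for delta, pid, comm in top_consumers(before, after):
                log.write("%s pid=%d comm=%s ticks=%d\n" % (stamp, pid, comm, delta))
        before = after

# main() is deliberately not called here; invoke it to start logging.
```

Kernel threads show up in /proc as well, so if dlm_recoverd or a gfs daemon is the one spinning it should appear in the log by name.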

The other possibility is that we're just seeing a huge number of glocks being created on the node running backups, and all the others (webservers) are just hanging whilst trying to access files. I've just done some fairly aggressive tuning of the GFS mounts on all nodes; hopefully this fixes it!


