RE: [Linux-cluster] Question :)

Jerry, is this problem with the "current" supported version of GFS? If so, what
version are you running? I am having a similar problem with a 5 node cluster
with 3 nodes serving as lock managers. If I rsync large amounts of data (0.5TB)
to a node that serves as a lock manager and mounts the FS, things croak pretty quickly.
If I rsync to a node that is NOT a lock manager, it takes longer but eventually locks
up there as well, although at times it does come back.
When we do our rsync, gfs_scand and lock_gulmd go crazy. In the instances where
the fs comes back, they continue to show high CPU utilization.
By the way, I don't think this is "a fact of life" that anyone needs to live with; there has
to be a reason for this. I can't believe for a minute that you and I are the only ones
experiencing this.

From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Gerald G. Gilyeat
Sent: Tuesday, May 31, 2005 2:06 PM
To: linux-cluster redhat com
Subject: [Linux-cluster] Question :)

First - thanks for the help the last time I poked my pointy little head in here.
Things have been -much- more stable since we bumped the lock limit to 2097152 ;)

However, we're still running into the occasional "glitch" where it seems like a single process is locking up -all- disk access on us, until it completes its operation.
Specifically, we see this when folks are doing rsyncs of large amounts of data (one of my faculty has been trying to copy over a couple thousand 16MB files). Even piping tar through ssh (from target machine, ssh user host "cd /data/dir/path; tar -cpsf -" | tar -xpsf -) results in similar behaviour.
Is this tunable, or a fact of life that we're simply going to have to live with? It only occurs with big, or long, writes. Reads aren't a problem (it just takes 14 hours to dump 1.5TB to tape...)


Jerry Gilyeat, RHCE
Systems Administrator
Molecular Microbiology and Immunology
Johns Hopkins Bloomberg School of Public Health

