[Linux-cluster] Question :)

Tue May 31 18:50:56 UTC 2005

We're using GFS-6.0.0-7.1.
It's not the latest patch, I realize; if upgrading will fix things, that'd be ideal.
We're also on a 5-node setup, with three servers, and our behaviour appears to be almost identical to yours.
It -does- eventually come back for us after I kill the rsync process, so it appears to be flushing a buffer of some sort.
Regardless, it's not really acceptable behaviour when you've got a 32-node compute cluster behind one of the GFS nodes, and researchers who need to move hundreds of gigs of data into the file system -can't- because of this.

--
Jerry Gilyeat, RHCE
Systems Administrator
Molecular Microbiology and Immunology
Johns Hopkins Bloomberg School of Public Health

-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Kovacs, Corey J.
Sent: Tue 5/31/2005 2:38 PM
To: linux clustering
Subject: RE: [Linux-cluster] Question :)

Jerry, is this problem with the "current" supported version of GFS? If so,
what version are you running? I am having a similar problem with a 5-node
cluster with 3 nodes serving as lock managers. If I rsync large amounts of
data (0.5TB) to a node that is serving as a lock manager and mounting the FS,
things croak pretty quickly. If I rsync to a node that is NOT a lock manager,
it takes longer, but eventually it locks up there as well, although at times
it will come back.

When we do our rsync, gfs_scand and lock_gulmd go crazy. In the instances
where the fs comes back, they continue to have high CPU utilization.
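
To put numbers on "go crazy", something like the following could be left
running on a node during the rsync. This is only a rough sketch, not anything
shipped with GFS: it assumes a Linux /proc filesystem, takes the two daemon
names from the paragraph above, and the helper names (parse_stat, sample,
cpu_percent, watch) and the 5-second interval are made up for illustration.

```python
import os
import time

# Daemon names as they appear in the message above (assumed to match /proc comm).
WATCHED = ("gfs_scand", "lock_gulmd")


def parse_stat(line):
    """Return (comm, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split on the LAST ')' instead of on whitespace.
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # tail begins at field 3 (state); utime and stime are fields 14 and 15.
    return comm, int(fields[11]) + int(fields[12])


def sample(names=WATCHED, proc="/proc"):
    """Map pid -> (comm, cpu_ticks) for each running process named in names."""
    found = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                comm, ticks = parse_stat(fh.read())
        except (OSError, IndexError, ValueError):
            continue  # process went away, or the line was not parseable
        if comm in names:
            found[int(entry)] = (comm, ticks)
    return found


def cpu_percent(before, after, interval, hz):
    """Percent of one CPU used by each pid that appears in both samples."""
    return {
        pid: (after[pid][0],
              100.0 * (after[pid][1] - before[pid][1]) / (hz * interval))
        for pid in after
        if pid in before
    }


def watch(rounds=12, interval=5.0):
    """Print one line per watched process every `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    prev = sample()
    for _ in range(rounds):
        time.sleep(interval)
        cur = sample()
        stamp = time.strftime("%H:%M:%S")
        for pid, (comm, pct) in sorted(cpu_percent(prev, cur, interval, hz).items()):
            print(f"{stamp} {comm}[{pid}] {pct:.1f}% CPU")
        prev = cur
```

Calling watch() before starting the rsync and again after the fs comes back
would give timestamps to compare between the two cases.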

By the way, I don't think this is "a fact of life" that anyone needs to live
with. There has to be a reason for this. I can't believe for a minute that
you and I are the only ones experiencing this.

Corey

________________________________

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Gerald G. Gilyeat
Sent: Tuesday, May 31, 2005 2:06 PM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] Question :)

First - thanks for the help the last time I poked my pointy little head in
here. Things have been -much- more stable since we bumped the lock limit to
2097152 ;)

However, we're still running into the occasional "glitch" where it seems like
a single process is locking up -all- disk access on us, until it completes
its operation.
Specifically, we see this when folks are doing rsyncs of large amounts of
data (one of my faculty has been trying to copy over a couple thousand 16MB
files). Even piping tar through ssh, run from the target machine, results in
similar behaviour:

    ssh user@host "cd /data/dir/path; tar -cpsf -" | tar -xpsf -

Is this tunable, or is it a fact of life that we're simply going to have to
live with? It only occurs with big, or long, writes. Reads aren't a problem
(it just takes 14 hours to dump 1.5TB to tape...)

Thanks!

--
Jerry Gilyeat, RHCE
Systems Administrator
Molecular Microbiology and Immunology
Johns Hopkins Bloomberg School of Public Health
