[Linux-cluster] GFS hangs, nodes die

Fri Aug 24 11:35:09 UTC 2007

Hi Sebastian,
just to double-check: fencing and everything else works as expected, right?

Second, the latest RHEL4 kernel is 2.6.9-55.0.2 (is that also available for 
CentOS?). If so, you might think about updating. I'm not sure whether anything 
was changed in dlm/gfs, but my tests were done with 2.6.9-55.0.2 and I 
didn't encounter those problems, whereas before I had huge numbers of locks 
(~2 times the number of files on the fs).
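
To put a number on that lock-to-file ratio, here is a rough sketch. It assumes 
the output of "gfs_tool counters" was saved to a file and contains a line whose 
first field is exactly "locks"; the function name locks_per_file is made up 
for illustration, and the file count has to be supplied by you.

```shell
#!/bin/sh
# locks_per_file COUNTERS_FILE NFILES
# Sketch only: reads saved `gfs_tool counters` output and prints locks / files.
# Assumption: the total lock count is on a two-field line "locks <number>".
locks_per_file() {
    awk -v files="$2" '
        # Match only the plain "locks" line, not e.g. "locks held".
        $1 == "locks" && NF == 2 && files > 0 {
            printf "%.2f\n", $2 / files
        }' "$1"
}
```

For example, save the counters with "gfs_tool counters /mnt/gfs > /tmp/counters" 
(the mount point is a placeholder) and call "locks_per_file /tmp/counters 
1000000". Take the file count from a known figure rather than walking the fs, 
since stat-ing every file takes a lock per inode.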

On Friday 24 August 2007 12:37:15 Sebastian Walter wrote:
> Hi list,
>
> just an update. In my scripts, there is nothing about searching the
> whole file system, but I see several "df" processes blocking the system
> with 100 % CPU. I will update firmwares now and check for better QLogic
> drivers. Thanks!
I fear that a firmware update will not change anything, but it's always a good 
option ;-) . I also doubt that the QLogic drivers are the cause, because the 
ones in 2.6.9-55 are quite OK (did you configure multipathing properly?).
Are the df processes and everything else running concurrently on different nodes?

Last but not least, are the "unable to obtain locks" messages the only messages 
that you see when the problems occur?
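
For the counters and process list I asked for further down, something along 
these lines could be run periodically on each node. It is only a sketch: the 
function name gfs_snapshot and the default mount point /mnt/gfs are 
placeholders, and it assumes "gfs_tool counters <mountpoint>" is available on 
your GFS version.

```shell
#!/bin/sh
# gfs_snapshot [MOUNTPOINT]
# Sketch only: prints a timestamped header, the GFS counters for one mount
# point, and the last 10 lines of the process list sorted on the TIME column.
# /mnt/gfs is a hypothetical default; pass the real mount point as $1.
gfs_snapshot() {
    mnt=${1:-/mnt/gfs}
    echo "=== $(date -u '+%Y-%m-%d %H:%M:%S') UTC $(uname -n) $mnt ==="
    gfs_tool counters "$mnt"
    # Same pipeline as quoted below: field 4 of `ps ax` output is TIME.
    ps axwwww | sort -k4 -n | tail -10
}
```

Calling "gfs_snapshot /mnt/gfs >> /var/tmp/gfs-counters.log" from cron every 10 
minutes (the log path is again just an example) keeps a history, so you can see 
whether the locks grow before the hang and how much CPU time gfs_scand has used.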

Regards Marc.
>
> Regards,
> Sebastian
>
> Marc Grimme wrote:
> > On Tuesday 21 August 2007 09:52:32 Sebastian Walter wrote:
> >> Hi,
> >>
> >> Marc Grimme wrote:
> >>> Do you also see any messages on the console of the nodes? The
> >>> gfs_tool counters from before the problem occurs would also help.
> >>> So run it a few times beforehand to see if the locks increase.
> >>> What kind of stress tests are you doing? I bet searching the whole
> >>> filesystem. What makes me wonder is that the gfs_tool glock_purge does
> >>> not work whereas it worked for me with exactly the same problems. Did
> >>> you set it _AFTER_ the fs was mounted?
> >
> > Sorry, I mean setting it after mounting is right, and before is not ;-( .
> > And are you using the latest version of CS/GFS?
> > Do you have a lot of memory in your machines 16G or more?
> >
> >> That makes me optimistic. I set it after the volume was mounted, so I
> >> will give it another try setting it before mounting it. Then I will also
> >> mail myself the output of the counters every 10 minutes. Let's see...
> >
> > I would be interested in the counters.
> > Also add the process list in order to see how much CPU time gfs_scand
> > consumes.
> > e.g.
> > ps axwwww | sort -k4 -n | tail -10
> >
> > Have fun Marc.
> >
> >> ...with best thanks
> >> Sebastian

-- 
Gruss / Regards,

Marc Grimme
Phone: +49-89 452 3538-14
http://www.atix.de/               http://www.open-sharedroot.org/
