[Linux-cluster] Kernel panic

Tue Mar 11 18:37:41 UTC 2008

Hi James,

On Tue, 2008-03-11 at 12:45 -0400, James Chamberlain wrote:
> Is there an easy way to determine which filesystem(s) it is?  I have 13.

I don't know of a good way.  You could use the process of elimination
to narrow down your choices, I suppose.  You said that one of the nodes
is serving only 5 of the 13 file systems, so if that node is the one
that crashed, that narrows it down to those 5.

> All nodes in this cluster are 64-bit.  Are there any guidelines on how  
> much memory I should have in each node?  Right now, they each have 2 GB.

I don't have specific guidelines, but gfs_fsck uses a lot of memory
because it keeps all the bitmaps in memory at once in order to
determine what it needs to do with each block.  It's safe to try
running it anyway, because if you don't have enough memory, gfs_fsck
will tell you so with this message:

"This system doesn't have enough memory + swap space to fsck this file
system."
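As a rough, hedged illustration of why memory scales with file system size, the sketch below computes only the space needed to hold one block bitmap in RAM.  It assumes 2 bits per block (the GFS on-disk resource group bitmap encoding) and a 4 KiB block size; both are parameters you should adjust.  gfs_fsck keeps additional per-block and per-inode state beyond this, so treat the result as a lower bound, not a prediction of gfs_fsck's actual usage.

```python
# Hypothetical back-of-the-envelope helper, not part of gfs-utils.
# Lower bound only: gfs_fsck tracks more state than a single bitmap.

def bitmap_bytes(fs_bytes: int, block_size: int = 4096, bits_per_block: int = 2) -> int:
    """Bytes needed to hold one bitmap covering every block of the file system."""
    blocks = fs_bytes // block_size
    # Round up to whole bytes.
    return (blocks * bits_per_block + 7) // 8


if __name__ == "__main__":
    one_tib = 1 << 40
    mib = bitmap_bytes(one_tib) / (1 << 20)
    print(f"1 TiB at 4 KiB blocks: about {mib:.0f} MiB for the bitmap alone")
```

Summing this over the file systems you plan to check (one at a time) gives a floor to compare against each node's 2 GB of RAM plus swap.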

> gfs-utils-0.1.12-1.el5

That version of gfs_fsck has all of the recent code changes in it, so
it should hopefully be able to repair any resource groups (RGs) that
might be damaged.

> Is there a way I can find out for sure whether it's resource group  
> corruption before I run gfs_fsck?

I can't think of a good way.  I suppose you could test the hardware,
as mentioned in the FAQ:

http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_corruption

> I have only had this cluster set up since December, and I started  
> having problems with it not long after that.  At first, I was seeing a  
> crash a day, and then I was having maybe one crash a week; however, I  
> had a total of 47 reboots within the cluster yesterday.  I have also  
> been somewhat concerned about the high load average on each node where  
> a service is running.  For example, one node is serving 5 of those 13  
> filesystems.  Its load average is currently and commonly hovering  
> between 35 and 55.  On nodes that aren't running any services, the  
> load average is 0.

If it has been doing this since the cluster was set up, I'd be
inclined to suspect the hardware, but who knows...

Regards,

Bob Peterson
Red Hat GFS