[Linux-cluster] gfs problem?

Sergey serge at triumvirat.ru
Wed Mar 30 10:46:15 UTC 2005


Hello All.


Three months ago I installed an SLM cluster, and it had been working perfectly.

Server configuration:
ProLiant DL380G4 Packaged Cluster with MSA500G2
2.4.21-20.ELsmp #1 SMP Wed Aug 18 20:46:40 EDT 2004 i686 i686 i386 GNU/Linux
with
GFS-devel-6.0.0-15
GFS-debuginfo-6.0.0-15
GFS-modules-6.0.0-15
GFS-6.0.0-15
GFS-modules-smp-6.0.0-15

A few days ago the server became unstable for an unknown reason.

The "top" command hung with this on the screen:

 13:33:49  up  2:41,  4 users,  load average: 0.60, 0.42, 0.29
226 processes: 223 sleeping, 3 running, 0 zombie, 0 stopped
Broadcast message from root (pts/4) (Wed Mar 30 13:52:35 2005):
The system is going down for reboot NOW!
[header overwritten]                                           t    idle
           total   18.0%    0.0%  170.0%   0.0%     0.8%    0.0%  210.4%
           cpu00   [overwritten]           0.0%     0.1%    0.0%   44.3%
           cpu01   10.5%    0.0%   41.5%   0.0%     0.5%    0.1%   47.1%
           cpu02    0.7%    0.0%   46.4%   0.0%     0.1%    0.0%   52.5%
           cpu03    5.9%    0.0%   27.4%   0.0%     0.0%    0.0%   66.5%
Mem:  1025412k av,  986552k used,   38860k free,       0k shrd,   38844k buff
                    613244k actv,  122472k in_d,   12764k in_c
Swap: 2097112k av,  184592k used, 1912520k free                  583912k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 2860 root      18   0     0    0     0 RW   82.1  0.0   0:05   3 gulm_Cb_Handler
 2861 root      18   0     0    0     0 RW   82.1  0.0   0:05   2 gulm_Cb_Handler
 3094 mysql     23   0 27664  14M  1988 S    15.5  1.4   5:37   3 mysqld-max
 3465 root      15   0  1136  488   460 S     5.1  0.0   7:38   0 httpd
 .......
 .......
 .......
  
 The GFS filesystem is not accessible, and many commands such as top and ps
 cannot run.
 uptime shows a load average of 150 and higher.

 In the messages log:
 Mar 30 13:43:40 n1 kernel: GFS: fsid=cluster:gfs1.0: stuck in gfs_releasepage()...
 Mar 30 13:43:40 n1 kernel: GFS: fsid=cluster:gfs1.0: blkno = 36343686, bh->b_count = 2
 Mar 30 13:43:40 n1 kernel: GFS: fsid=cluster:gfs1.0: bh->b_journal_head = NULL
 Mar 30 13:43:40 n1 kernel: GFS: fsid=cluster:gfs1.0: stuck in gfs_releasepage()...
 Mar 30 13:43:40 n1 kernel: GFS: fsid=cluster:gfs1.0: blkno = 36343560, bh->b_count = 2
 Mar 30 13:43:40 n1 kernel: GFS: fsid=cluster:gfs1.0: bh->b_journal_head = NULL
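
 To see which blocks are reported as stuck and how often, the log lines above could be summarized with a small script. This is only a sketch: the pattern is assumed from the excerpt shown here, and the function name stuck_blocks is made up for illustration.

```python
import re
from collections import Counter

# Pattern assumed from the log excerpt above; other GFS messages may look different.
BLKNO_RE = re.compile(
    r"GFS: fsid=(?P<fsid>\S+): blkno = (?P<blkno>\d+), bh->b_count = (?P<count>\d+)"
)


def stuck_blocks(lines):
    """Count how many times each (fsid, block number) pair appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = BLKNO_RE.search(line)
        if match:
            counts[(match.group("fsid"), int(match.group("blkno")))] += 1
    return counts
```

 It could be fed the lines of the messages log, for example stuck_blocks(open(path)) with path pointing at the log file, to check whether the same few blocks repeat or whether many different blocks are affected.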


 Unmounting the GFS filesystems failed with the error: device is busy


 Wed Mar 30 13:55 reboot


 What is the cause of this problem, and how can it be solved?

---
Sergey



