[Linux-cluster] processes stalled reading gfs filesystem

Frank frank at si.ct.upc.edu
Fri Mar 20 11:20:47 UTC 2009


Hi,
we have a couple of Dell servers running Red Hat 5.2 and OpenVZ, sharing a
GFS filesystem.

We have noticed that there is a directory where processes stall when they
try to access it.
For instance, look at these processes:

[root@parmenides ~]# ps -fel | grep save
4 D root      8997     1  1  78   0 -  1780 339955 09:40 ?        00:02:31 /usr/sbin/save -s espai.upc.es -g Virtuals -LL -f - -m parmenides -t 1236294005 -l 4 -q -W 78 -N /mnt/gfs /mnt/gfs
0 S root     16736 21208  0  78   0 -   980 pipe_w 12:07 pts/1    00:00:00 grep save
4 D root     18796     1  1  78   0 -  1777 339955 08:46 ?        00:02:16 /usr/sbin/save -s espai.upc.es -g Virtuals -LL -f - -m parmenides -t 1236294005 -l 4 -q -W 78 -N /mnt/gfs /mnt/gfs

Both processes are stalled in the same directory (see the cwd entries):

# lsof -p 8997 | grep gfs
save    8997 root  cwd    DIR   253,7     2048   7022183 /mnt/gfs/vz/private/109/usr/lib/openoffice/program
save    8997 root    3r   DIR   253,7     3864        26 /mnt/gfs
save    8997 root    6r   DIR   253,7     3864       232 /mnt/gfs/vz
save    8997 root    7r   DIR   253,7     3864       233 /mnt/gfs/vz/private
save    8997 root    8r   DIR   253,7     3864 230761349 /mnt/gfs/vz/private/109
save    8997 root    9r   DIR   253,7     3864 230773154 /mnt/gfs/vz/private/109/usr
save    8997 root   12r   DIR   253,7     2048   7003944 /mnt/gfs/vz/private/109/usr/lib
save    8997 root   14r   DIR   253,7     3864   7022175 /mnt/gfs/vz/private/109/usr/lib/openoffice

# lsof -p 18796 | grep gfs
save    18796 root  cwd    DIR   253,7     2048   7022183 /mnt/gfs/vz/private/109/usr/lib/openoffice/program
save    18796 root    3r   DIR   253,7     3864        26 /mnt/gfs
save    18796 root    6r   DIR   253,7     3864       232 /mnt/gfs/vz
save    18796 root    7r   DIR   253,7     3864       233 /mnt/gfs/vz/private
save    18796 root    8r   DIR   253,7     3864 230761349 /mnt/gfs/vz/private/109
save    18796 root    9r   DIR   253,7     3864 230773154 /mnt/gfs/vz/private/109/usr
save    18796 root   12r   DIR   253,7     2048   7003944 /mnt/gfs/vz/private/109/usr/lib
save    18796 root   14r   DIR   253,7     3864   7022175 /mnt/gfs/vz/private/109/usr/lib/openoffice
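
As a quicker cross-check than reading two lsof listings side by side, the working directories of two PIDs could be compared through /proc. This is only a sketch: the helper names cwd_of and same_cwd are made up for illustration and are not from the original report, and it assumes a Linux /proc filesystem.

```shell
# Print the current working directory of a PID via /proc (Linux only).
# cwd_of is a hypothetical helper name, not an existing tool.
cwd_of() {
    readlink "/proc/$1/cwd"
}

# Succeed when two PIDs report the same working directory.
same_cwd() {
    [ "$(cwd_of "$1")" = "$(cwd_of "$2")" ]
}

# Demonstration on the current shell, which trivially matches itself.
if same_cwd "$$" "$$"; then
    echo "same cwd: $(cwd_of "$$")"
fi
```

On the node in question this would presumably be called as same_cwd 8997 18796, using the two PIDs of the stalled save processes.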

There is also a process in state D, with glock_ in the WCHAN column, accessing the same directory:

0 D root      8425  6783  0  78   0 -   669 glock_ 08:24 ?        00:00:00 /usr/lib/openoffice/program/pagein -L/usr/lib/openoffice/program @pagein-common
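
The D-state processes shown in this message can be pulled out of ps -fel output with a small filter. The following is a minimal sketch, assuming the usual ps -fel column order (S in column 2, PID in column 4, WCHAN in column 11, command in column 15); the function name d_state_summary is hypothetical, and the sample lines are copied from the output above with whitespace collapsed and the save arguments shortened.

```shell
# Print PID, WCHAN and command name for every process in state D
# (uninterruptible sleep), reading `ps -fel`-style lines on stdin.
# d_state_summary is a hypothetical helper name.
d_state_summary() {
    awk '$2 == "D" { print $4, $11, $15 }'
}

# Sample lines taken from the ps output in this message.
sample='4 D root 8997 1 1 78 0 - 1780 339955 09:40 ? 00:02:31 /usr/sbin/save -s espai.upc.es
0 S root 16736 21208 0 78 0 - 980 pipe_w 12:07 pts/1 00:00:00 grep save
0 D root 8425 6783 0 78 0 - 669 glock_ 08:24 ? 00:00:00 /usr/lib/openoffice/program/pagein'

printf '%s\n' "$sample" | d_state_summary
```

On a live node the same filter could be fed with ps -fel | d_state_summary. Since ps -fel truncates the wait channel (hence "glock_"), a procps invocation such as ps -eo pid,stat,wchan:32,args should show it at a wider width; that command is a suggestion and was not part of the original report.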

What could the problem be? Corruption in the filesystem?
Would running "gfs_fsck" fix the problem?
Regards.

Frank




