[Linux-cluster] GFS/SCSI Lost

isplist at logicore.net isplist at logicore.net
Mon Nov 6 14:39:41 UTC 2006


I posted about this last night but have since found some additional info. My 
first post was not very useful, since it only showed a paste of SCSI errors 
from after the storage was disconnected.

It looks like something times out and the storage is lost. I can log in to 
the node, unmount the lost mount, remount it, and it's back. I also notice 
that the mount seems to be set as non-permanent?

Do I need a keep-alive script, or is there a configuration somewhere that 
I've missed? Here is a snippet from where the SCSI errors started overnight.

Nov  5 21:16:02 qm250 kernel: SCSI error : <0 0 2 1> return code = 0x10000
Nov  5 21:16:02 qm250 kernel: end_request: I/O error, dev sdf, sector 655
Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: fatal: I/O error
Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0:   block = 26
Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0:   function = gfs_dreread
Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0:   file = /home/xos/gen/updates-2006-08/xlrpm21122/rpm/BUILD/gfs-kernel-2.6.9-58/up/src/gfs/dio.c, line = 576
Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0:   time = 1162782962
Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: about to withdraw from the cluster
Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: waiting for outstanding I/O
Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: telling LM to withdraw
Nov  5 21:16:05 qm250 kernel: lock_dlm: withdraw abandoned memory
Nov  5 21:16:05 qm250 kernel: GFS: fsid=vgcomp:qm.0: withdrawn
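
For anyone wanting to spot this sequence from a script, here is a minimal 
sketch that scans syslog lines for a GFS withdrawal and reports the last 
device that logged an I/O error before it. The patterns are modeled only on 
the lines quoted above, so other kernel or GFS versions may word them 
differently. The function name find_withdrawals and the SAMPLE list are 
hypothetical and just for illustration. It only detects the event; it does 
not fix the underlying storage timeout.

```python
import re

# Patterns modeled on the log lines quoted in this post (assumption:
# other kernel/GFS builds may format these messages differently).
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
WITHDRAWN = re.compile(r"GFS: fsid=(\S+): withdrawn\s*$")


def find_withdrawals(lines):
    """Return a list of (fsid, last_failed_device) per GFS withdrawal.

    last_failed_device is the most recent device named in an
    end_request I/O error before the withdrawal, or None if none was seen.
    """
    events = []
    last_dev = None
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            last_dev = m.group(1)
            continue
        m = WITHDRAWN.search(line)
        if m:
            events.append((m.group(1), last_dev))
    return events


# A few of the lines from the snippet above, used as sample input.
SAMPLE = [
    "Nov  5 21:16:02 qm250 kernel: end_request: I/O error, dev sdf, sector 655",
    "Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: fatal: I/O error",
    "Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: telling LM to withdraw",
    "Nov  5 21:16:05 qm250 kernel: GFS: fsid=vgcomp:qm.0: withdrawn",
]

if __name__ == "__main__":
    for fsid, dev in find_withdrawals(SAMPLE):
        print(f"{fsid} withdrew after an I/O error on {dev}")
```

In practice the same function could be fed the lines of the node's syslog 
file instead of SAMPLE; where that file lives depends on the syslog setup.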





