[Linux-cluster] GFS/SCSI Lost
isplist at logicore.net
Mon Nov 6 14:39:41 UTC 2006
I posted about this last night but have since found some additional info. My
first post was not very useful, since it only showed a paste of the SCSI errors
logged after the storage was disconnected. What I see is that something times
out and the storage is lost. I can get it back by logging in to the node,
unmounting the lost mount, and remounting it. I also notice that the mount is
set as non-permanent?
Do I need a keep-alive script, or is there a configuration setting somewhere
that I've missed? Here is a snippet from where the SCSI errors started
overnight.
Nov 5 21:16:02 qm250 kernel: SCSI error : <0 0 2 1> return code = 0x10000
Nov 5 21:16:02 qm250 kernel: end_request: I/O error, dev sdf, sector 655
Nov 5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: fatal: I/O error
Nov 5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: block = 26
Nov 5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: function = gfs_dreread
Nov 5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: file = /home/xos/gen/updates-2006-08/xlrpm21122/rpm/BUILD/gfs-kernel-2.6.9-58/up/src/gfs/dio.c, line = 576
Nov 5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: time = 1162782962
Nov 5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: about to withdraw from the cluster
Nov 5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: waiting for outstanding I/O
Nov 5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: telling LM to withdraw
Nov 5 21:16:05 qm250 kernel: lock_dlm: withdraw abandoned memory
Nov 5 21:16:05 qm250 kernel: GFS: fsid=vgcomp:qm.0: withdrawn
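
In case it helps anyone reproduce this, here is a rough sketch of how the log
above could be scanned for the SCSI error and the GFS withdraw that follows it.
The function names and the regular expressions are my own assumptions, based
only on the lines pasted here; they are not part of any GFS or cluster tooling,
and other kernel versions may word these messages differently.

```python
import re

# Assumed patterns, derived only from the syslog lines in this post.
SCSI_ERROR = re.compile(
    r"kernel: SCSI error : <\d+ \d+ \d+ \d+> return code = (0x[0-9a-fA-F]+)"
)
GFS_WITHDRAWN = re.compile(r"kernel: GFS: fsid=(\S+): withdrawn$")


def find_scsi_return_codes(lines):
    """Return the return codes of all SCSI error lines, in log order."""
    codes = []
    for line in lines:
        match = SCSI_ERROR.search(line)
        if match:
            codes.append(match.group(1))
    return codes


def find_withdrawn_fsids(lines):
    """Return the fsid of every GFS filesystem logged as withdrawn."""
    fsids = []
    for line in lines:
        match = GFS_WITHDRAWN.search(line.strip())
        if match:
            fsids.append(match.group(1))
    return fsids


if __name__ == "__main__":
    # /var/log/messages is an assumed location; adjust for your syslog setup.
    with open("/var/log/messages") as log:
        log_lines = log.readlines()
    print("SCSI return codes:", find_scsi_return_codes(log_lines))
    print("Withdrawn filesystems:", find_withdrawn_fsids(log_lines))
```

This only reports that a withdraw happened; it does not unmount or remount
anything, so it is no substitute for finding out why the device times out.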