[Linux-cluster] I/O Error management in GFS

Mon May 21 15:12:01 UTC 2007

On Fri, May 18, 2007 at 03:49:14PM +0200, Mathieu Avila wrote:
> Sorry for my late reply,
> 
> I've performed the following tests with cluster-1.03:
> - mount GFS on more than one node, using Gulm as the lock manager,
> - cp something big (a kernel tree) into it on each node,
> - while the copy runs, make the device return I/O errors.
> The result is not what you described: sometimes my "cp" finishes with
> I/O errors (that's good), but most of the time it is blocked in the
> kernel. I cannot perform any action, including umount. Commands like
> "df" block, too.
> 
> I've done the same test with DLM and got the same results.

Is there anything about "withdraw" in dmesg or /var/log/messages after you
cause the I/O errors?  If not, then the I/O errors are not being reported
back to GFS for some reason.  Perhaps some block/SCSI drivers don't
properly return I/O errors to the fs?  Once GFS sees the I/O errors and
does the withdraw, it should usually work, although it does occasionally
have problems.
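
A quick way to do that check is sketched below.  This is only a rough
helper, not part of GFS or the cluster tools: the function names are made
up for illustration, and it assumes that a withdraw shows up as a log line
containing both "gfs" and "withdraw" (case-insensitive).  The exact message
text may differ between versions.

```python
import subprocess


def find_withdraw_lines(lines):
    """Return the lines that mention both GFS and a withdraw.

    Assumption: a withdraw is logged on a line containing "gfs" and
    "withdraw" (any case). Adjust the match if your version logs it
    differently.
    """
    hits = []
    for line in lines:
        lowered = line.lower()
        if "gfs" in lowered and "withdraw" in lowered:
            hits.append(line.rstrip("\n"))
    return hits


def scan_log_file(path="/var/log/messages"):
    """Scan a syslog file; reading it usually needs root."""
    with open(path, errors="replace") as fh:
        return find_withdraw_lines(fh)


def scan_dmesg():
    """Scan the kernel ring buffer as printed by the dmesg command."""
    out = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    return find_withdraw_lines(out.splitlines())


if __name__ == "__main__":
    found = scan_dmesg() + scan_log_file()
    if found:
        print("\n".join(found))
    else:
        print("no withdraw messages found; the I/O errors may not be reaching GFS")
```

Run it on each node right after triggering the I/O errors.  An empty
result points at the errors never reaching GFS rather than at the withdraw
itself.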

Dave