[Linux-cluster] multipath/gfs lockout under heavy write
Marcelo Matus
mmatus at dinha.acms.arizona.edu
Tue Jan 25 08:41:54 UTC 2005
David Teigland wrote:
>On Mon, Jan 24, 2005 at 08:57:28PM +0100, Lazar Obradovic wrote:
>
>>Since both LVs are a part of same VG (and, thus, are using the same
>>physical device seen over multipath), I'd guess the problem is somewhere
>>inside GFS, but the things that keep confusing me are:
>>
>>- those SCSI errors that look like multipath errors
>
>The SCSI errors appear to be the root problem, not GFS. I don't know what
>multipath might have to do with it.
>
>>- name 'diapered_dm-2' which I never saw before
>
>In the past, GFS would immediately panic the machine when it saw i/o
>errors. Now it tries to shut down the bad fs instead. After this happens
>you should be able to unmount the offending fs, leave the cluster and
>reboot the machine cleanly.
>
I have a question about your last comment. We did the following
experiment with GFS 6.0.2:
1.- Set up a cluster using a single GFS server and gnbd device (lock_gulm
master and gnbd_export on the same node).
2.- Fence out a node manually using fence_gnbd.
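For concreteness, that setup might be sketched as below. The device path /dev/sda1, export name gfs0, server host server1, and mount point /mnt/gfs are all hypothetical, and exact option spellings may vary between GFS 6.0.x releases, so treat this as an illustration rather than a recipe:

```shell
# Sketch of the single-server gnbd/GFS setup (hypothetical names throughout).
if command -v gnbd_export >/dev/null 2>&1; then
    # On the server node (which is also the lock_gulm master):
    gnbd_export -d /dev/sda1 -e gfs0
    # On a client node: import the device, then (for case 2 below) mount it:
    gnbd_import -i server1
    mount -t gfs /dev/gnbd/gfs0 /mnt/gfs
    status="gnbd commands issued"
else
    status="gnbd tools not installed; commands shown for illustration only"
    echo "$status"
fi
```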
We then observed two cases:
1.- If the fenced machine is not mounting the GFS/gnbd fs, but only
importing it, then we can properly either reboot or restart the GFS
services with no problem.
2.- If the fenced machine is mounting the GFS/gnbd fs, even with no
process using it, almost everything produces a kernel panic, even just
unmounting the unused fs. In fact, the only thing that works, besides
pushing the reset button, is 'reboot -f', which is almost the same.
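The recovery attempt in case 2 can be sketched as follows, assuming a hypothetical mount point of /mnt/gfs; on the actually fenced node, the umount itself is what panicked for us:

```shell
# On the fenced node: see whether the GFS fs is still listed before trying
# to unmount it. /mnt/gfs is a hypothetical mount point.
MNT=/mnt/gfs
if grep -q " $MNT " /proc/mounts; then
    umount "$MNT"   # in case 2 this, like almost anything else, panics
else
    echo "no GFS mount at $MNT"
fi
```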
So, when you say "in the past", are you referring to GFS 6.0.2?
>
>>- fenced not fencing obviously faulty node
>
>In your situation, the node is running fine with respect to the cluster,
>so there's no need to fence it. GFS is just shutting down a faulty fs
>(doing this is not always very "clean" and can produce a lot of
>errors/warnings on the console).
>
>Perhaps we could reinstate an option to have gfs panic immediately when it
>sees i/o errors instead of trying to shut down the problem fs. In this
>case, the panicked node would be "dead" and it would be fenced.