
Re: [Linux-cluster] multipath/gfs lockout under heavy write

David Teigland wrote:

On Mon, Jan 24, 2005 at 08:57:28PM +0100, Lazar Obradovic wrote:

Since both LVs are a part of same VG (and, thus, are using the same
physical device seen over multipath), I'd guess the problem is somewhere
inside GFS, but the things that keep confusing me are:

- those SCSI errors that look like multipath errors

The SCSI errors appear to be the root problem, not GFS. I don't know what multipath might have to do with it.

- name 'diapered_dm-2' which I never saw before

In the past, GFS would immediately panic the machine when it saw i/o
errors. Now it tries to shut down the bad fs instead. After this happens
you should be able to unmount the offending fs, leave the cluster and
reboot the machine cleanly.

I have a question about your last comment. We ran the following experiment with GFS 6.0.2:

1. Set up a cluster with a single GFS server and gnbd device (lock_gulm master
and gnbd_export on the same node).

2. Fence out a node manually using fence_gnbd.

We then observed two cases:

1. If the fenced machine has only imported the GFS/gnbd fs, without mounting it, then we
can either reboot or restart the GFS services with no problem.

2. If the fenced machine has the GFS/gnbd fs mounted, even with no process using it,
almost everything produces a kernel panic, even just unmounting the unused fs.
In fact, the only thing that works, besides pushing the reset button, is 'reboot -f',
which is almost the same.

So, when you say "In the past", do you mean GFS 6.0.2?

- fenced not fencing obviously faulty node

In your situation, the node is running fine wrt the cluster, so there's no need to fence it. GFS is just shutting down a faulty fs. (Doing this is not always very "clean" and can produce a lot of errors/warnings on the console.)

Perhaps we could reinstate an option to have gfs panic immediately when it
sees i/o errors instead of trying to shut down the problem fs.  In this
case, the panicked node would be "dead" and it would be fenced.
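
To make the difference between the two behaviours concrete, here is a small toy model in Python. It is not GFS code and does not touch any cluster; the names IoErrorPolicy, Outcome and on_io_error are hypothetical and exist only for illustration. It just encodes what is described above: with the current behaviour the node stays in the cluster and is not fenced, while with an immediate panic the node is "dead" and gets fenced.

```python
from dataclasses import dataclass
from enum import Enum


class IoErrorPolicy(Enum):
    """Hypothetical names for the two behaviours discussed in this thread."""
    SHUT_DOWN_FS = "shut_down_fs"  # current behaviour: shut down only the bad fs
    PANIC = "panic"                # possible option: panic the whole node


@dataclass(frozen=True)
class Outcome:
    node_alive: bool    # is the node still running fine wrt the cluster?
    fenced: bool        # will the cluster fence this node?
    admin_steps: tuple  # manual steps left for the administrator


def on_io_error(policy):
    """Return what happens to the node after GFS sees an i/o error."""
    if policy is IoErrorPolicy.PANIC:
        # A panicked node is "dead", so the cluster fences it.
        return Outcome(node_alive=False, fenced=True, admin_steps=())
    # Only the fs is shut down; the node is still a healthy cluster member,
    # so there is nothing to fence and cleanup is manual.
    return Outcome(
        node_alive=True,
        fenced=False,
        admin_steps=("unmount the offending fs", "leave the cluster", "reboot"),
    )


if __name__ == "__main__":
    for policy in IoErrorPolicy:
        print(policy.value, on_io_error(policy))
```

The model only restates the trade-off: shutting down the fs keeps the node alive but leaves cleanup to the admin, while a panic hands recovery over to fencing.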
