
Re: [Linux-cluster] scsi reservation issue



Christopher Barry wrote:
On Wed, 2007-10-31 at 10:44 -0500, Ryan O'Hara wrote:
Christopher Barry wrote:
Greetings all,

I have 2 VMware ESX servers, each hitting a NetApp over FC, and each
with 3 RHCS cluster nodes trying to mount a GFS volume.

All of the nodes (1, 2, and 3) on esx-01 can mount the volume fine, but none
of the nodes on the second ESX box can mount the GFS volume at all, and
I get the following errors in dmesg:
Are you intentionally trying to use scsi reservations as a fence method?

No. In fact, I thought the scsi_reserve service might be *causing* the
issue, and disabled the service from starting on all nodes. Does this
have to be on?

No. You only need to run this service if you plan on using scsi reservations as a fence method. A scsi reservation restricts access to a device such that only registered nodes can access it. If a reservation exists and an unregistered node tries to access the device, you'll see exactly the errors you are seeing.
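For reference, here is a minimal way to inspect the registrations and the current reservation holder on the shared device (assuming the GFS device is /dev/sdc, as in the log below):

    # List the keys currently registered with the device
    sg_persist -i -k /dev/sdc

    # Show the current reservation holder and reservation type, if any
    sg_persist -i -r /dev/sdc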

It may be that some reservations were created and never got cleaned up, which might cause the problem to continue even after the scsi_reserve script was disabled. You can manually run '/etc/init.d/scsi_reserve stop' to attempt to clean up any reservations. Note that I am assuming that any reservations that might still exist on a device were created by the scsi_reserve script. If that is the case, you can see which devices a node is registered with by doing a '/etc/init.d/scsi_reserve status'. Also note that the scsi_reserve script does *not* have to be started or enabled to do these things (i.e. you can safely run 'status' or 'stop' without first running 'start').
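In other words, on each node of the second ESX box you could try something like:

    # Both are safe even if the service was never started on this node
    /etc/init.d/scsi_reserve status   # which devices is this node registered with?
    /etc/init.d/scsi_reserve stop     # attempt to remove this node's registrations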

One caveat... 'scsi_reserve stop' will not unregister a node if it is the reservation holder and other nodes are still registered with the device. You can also use the sg_persist command directly to clear all registrations and reservations; use the -C (clear) option. See the sg_persist man page for a fuller description.
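A sketch of that last-resort cleanup, assuming the device is /dev/sdc; the key 0x1 below is only a placeholder for the issuing node's own registered key (look it up with 'sg_persist -i -k /dev/sdc'):

    # CAUTION: this removes ALL registrations and any reservation on the device.
    # It must be run from a node that is itself registered, and --param-rk
    # must be that node's own key (0x1 is a placeholder).
    sg_persist --out --clear --param-rk=0x1 /dev/sdc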

It sounds like the nodes on esx-01 are creating reservations, but the nodes on the second ESX box are not registering with the device and are therefore unable to mount the filesystem. Creation of reservations and registrations is handled by the scsi_reserve init script, which should be run at startup on all nodes in the cluster. Before you mount the filesystem, you can check which devices a node is registered with by doing '/etc/init.d/scsi_reserve status'. If your nodes are not registered with the device and a reservation exists, then you won't be able to mount.
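If you do decide to keep using scsi reservations for fencing, the usual way to get the script run at startup on a RHEL-style system is something like the following (a sketch, assuming the standard chkconfig/service tools):

    chkconfig scsi_reserve on     # run the init script at boot
    service scsi_reserve start    # register this node with the device now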

Lock_Harness 2.6.9-72.2 (built Apr 24 2007 12:45:38) installed
GFS 2.6.9-72.2 (built Apr 24 2007 12:45:54) installed
GFS: Trying to join cluster "lock_dlm", "kop-sds:gfs_home"
Lock_DLM (built Apr 24 2007 12:45:40) installed
GFS: fsid=kop-sds:gfs_home.2: Joined cluster. Now mounting FS...
GFS: fsid=kop-sds:gfs_home.2: jid=2: Trying to acquire journal lock...
GFS: fsid=kop-sds:gfs_home.2: jid=2: Looking at journal...
GFS: fsid=kop-sds:gfs_home.2: jid=2: Done
scsi2 (0,0,0) : reservation conflict
SCSI error : <2 0 0 0> return code = 0x18
end_request: I/O error, dev sdc, sector 523720263
scsi2 (0,0,0) : reservation conflict
SCSI error : <2 0 0 0> return code = 0x18
end_request: I/O error, dev sdc, sector 523720271
scsi2 (0,0,0) : reservation conflict
SCSI error : <2 0 0 0> return code = 0x18
end_request: I/O error, dev sdc, sector 523720279
GFS: fsid=kop-sds:gfs_home.2: fatal: I/O error
GFS: fsid=kop-sds:gfs_home.2:   block = 65464979
GFS: fsid=kop-sds:gfs_home.2:   function = gfs_logbh_wait
GFS: fsid=kop-sds:gfs_home.2:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-72/smp/src/gfs/dio.c, line = 923
GFS: fsid=kop-sds:gfs_home.2:   time = 1193838678
GFS: fsid=kop-sds:gfs_home.2: about to withdraw from the cluster
GFS: fsid=kop-sds:gfs_home.2: waiting for outstanding I/O
GFS: fsid=kop-sds:gfs_home.2: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=kop-sds:gfs_home.2: withdrawn
GFS: fsid=kop-sds:gfs_home.2: can't get resource index inode: -5


Does anyone have a clue as to where I should start looking?


Thanks,
-C

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster

