[Linux-cluster] Re: GFS Errors - cant mount gfs shares

Fri Nov 13 16:26:57 UTC 2009

On Fri, Nov 13, 2009 at 09:13:17AM -0600, Alan A wrote:
> On Thu, Nov 12, 2009 at 5:49 PM, David Teigland <teigland at redhat.com> wrote:
> 
> > On Thu, Nov 12, 2009 at 04:22:17PM -0600, Alan A wrote:
> > > Here are the packages that caused the lockup:
> > >
> > > [root at fenmrdev02 ~]# rpm -qa | grep sg3
> > > sg3_utils-libs-1.25-4.el5
> > > sg3_utils-1.25-4.el5
> > > sg3_utils-devel-1.25-4.el5
> >
> > These packages are unrelated to the gfs_controld errors.
> >
> > > > Nov 12 15:28:20 fenmrdev04 ntpd[3340]: kernel time sync enabled 0001
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 a11
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 surv34
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 account61
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 acct63
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 gfs_web
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 cati_gfs
> > > > Nov 12 15:28:27 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 gfs_cmdr
> >
> > These may or may not create problems.  To figure out why they happened
> > we'd need to see "group_tool dump gfs" from each of the nodes.
> >
> > Dave
> >
> >
> Here is what I started with and where I am today.
> 
> I had only one node out of three able to mount GFS (the cluster has nodes
> 2, 3, and 4). The other two (nodes 2 and 4) would tell me that
> /dev/mapper/gfsshare was not a block device. I looked at what had changed
> and found that the November 5th update installed sg3_utils on the two nodes
> that had problems mounting GFS. I also found (I am not sure how this
> happened) that node 4 had the scsi_reserve service running. As soon as I
> removed it, a simple reboot allowed me to mount GFS on node 4, but node 2
> still had the same problem and the same errors. I tried checking whether a
> SCSI key reservation was active on any of the volumes, but no luck: no key
> was returned for any of the GFS volumes.
> 
> Today, something different.....
> I am not sure what is going on but I can't mount GFS on all three nodes. I
> was able to mount it on node2, but then I restarted node3 and everything
> went to hell again.
> 
> Here is the output from gfs_tool dump at the time when GFS was mounted:

The retrieve_plocks errors are a harmless side effect of the failing mount
syscalls, which are returning ENODEV.

Are you using fence_scsi?  I'm guessing not since you didn't have
sg3_utils until now.  As bizarre as it may sound, it seems that
init.d/scsi_reserve may be applying scsi reservations on your devices,
which you don't want of course, and which would explain the mount errors.
I don't know how or why scsi_reserve is running, but you need to disable
it (again, assuming you're not using fence_scsi for your cluster).
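
For reference, a rough sketch of how that check and cleanup might look on
each node. It assumes a RHEL 5-style init system (chkconfig and service) and
the sg_persist tool from sg3_utils. The function names are made up for
illustration, and /dev/mapper/gfsshare is just the device mentioned earlier
in this thread; you may need to point sg_persist at the underlying SCSI
device instead.

```shell
#!/bin/sh
# Sketch only: assumes chkconfig/service (RHEL 5 style) and sg_persist
# from sg3_utils. Function names here are hypothetical helpers.

# Device to inspect; placeholder taken from the thread, override as needed.
DEVICE="${DEVICE:-/dev/mapper/gfsshare}"

# Report whether the scsi_reserve init script is enabled in any runlevel.
check_scsi_reserve() {
    if command -v chkconfig >/dev/null 2>&1; then
        chkconfig --list scsi_reserve 2>/dev/null \
            || echo "scsi_reserve: not registered with chkconfig"
    else
        echo "chkconfig not found; skipping"
    fi
}

# Stop the service now and keep it from starting at the next boot.
disable_scsi_reserve() {
    service scsi_reserve stop
    chkconfig scsi_reserve off
}

# List registered keys and any active persistent reservation on a device.
show_reservations() {
    sg_persist --in --read-keys --device="$1"
    sg_persist --in --read-reservation --device="$1"
}
```

After sourcing the above, running check_scsi_reserve, then
disable_scsi_reserve, then show_reservations "$DEVICE" on each node would be
one way to go about it. Whether stopping the service also clears
registrations already on the device is something to verify rather than
assume.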

Dave