
Re: [Linux-cluster] no version for "gfs2_unmount_lockproto"



On Wed, 2008-02-13 at 09:23 +0100, Ferenc Wagner wrote:
> Thanks!  This patch indeed fixed the hang.  But of course not the
> mount:
> 
> Trying to join cluster "lock_dlm", "pilot:test"
> Joined cluster. Now mounting FS...
> GFS: fsid=pilot:test.4294967295: can't mount journal #4294967295
> GFS: fsid=pilot:test.4294967295: there are only 6 journals (0 - 5)

Hi Ferenc,

The "4294967295" is really -1 printed as an unsigned 32-bit number,
and -1 is a bad return code from the mount.  So finding out what
went wrong should be a process of elimination.  Several
possibilities come to mind:

1. Is it possible that your file system has a different cluster
   name ("pilot") from the cluster name in your cluster.conf file?
2. Perhaps there is another gfs file system with the same name "test"
   already mounted?
3. Perhaps it can't find the locking protocol, lock_dlm (I hope)?
   Make sure lock_dlm shows up in lsmod.
4. Perhaps gfs can't find the rest of the cluster infrastructure?
   Check to make sure you did "service cman start" and have
   aisexec running on the system having the problem.  Also, check
   /var/log/messages for messages pertaining to cluster problems.
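
A rough sketch of the first few checks, in Python, in case it helps
to script them.  It only shows the arithmetic behind "4294967295"
and the string comparisons behind items 1 and 3.  The helper names,
the sample cluster.conf snippet and the sample /proc/modules text
are made up for illustration; they are not part of gfs-utils, and on
a real node you would feed in the contents of your own cluster.conf
and /proc/modules (which is what lsmod reads).

```python
import ctypes
import re


def as_signed32(value):
    """Reinterpret an unsigned 32-bit number as a signed 32-bit one."""
    return ctypes.c_int32(value).value


def split_lock_table(lock_table):
    """Split a "cluster:fsname" lock table name into its two parts."""
    cluster, _, fsname = lock_table.partition(":")
    return cluster, fsname


def cluster_name_from_conf(conf_text):
    """Return the name attribute of the <cluster> element, or None."""
    match = re.search(r'<cluster\b[^>]*\bname="([^"]+)"', conf_text)
    return match.group(1) if match else None


def module_loaded(proc_modules_text, module):
    """True if `module` is the first field of any /proc/modules line."""
    return any(line.split()[:1] == [module]
               for line in proc_modules_text.splitlines())


# Hypothetical sample inputs, only for the demonstration below.
SAMPLE_CONF = '<?xml version="1.0"?>\n<cluster name="pilot" config_version="3">\n</cluster>\n'
SAMPLE_MODULES = "lock_dlm 24000 1 - Live 0x0\ngfs 250000 1 - Live 0x0\n"

if __name__ == "__main__":
    # The journal number from the mount error, read as a signed value.
    print("journal id:", as_signed32(4294967295))

    # Item 1: does the lock table's cluster name match cluster.conf?
    cluster, fsname = split_lock_table("pilot:test")
    print("cluster name matches:", cluster == cluster_name_from_conf(SAMPLE_CONF))

    # Item 3: is the locking module present?
    print("lock_dlm loaded:", module_loaded(SAMPLE_MODULES, "lock_dlm"))
```

Items 2 and 4 need a look at the live system (the mounted file
systems, cman, aisexec and /var/log/messages), so they are left out
of the sketch.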

It sounds to me like we should have a better error message for
whatever went wrong.  Let's figure that out first and then we can
go about improving the error messages with a bugzilla if needed.
We have already improved the error messages considerably.
I don't know what version of the gfs2-utils you have, but that
will contain the common mount helper (/sbin/mount.gfs2 is a hard
link to /sbin/mount.gfs) that does some of this error processing
when mounts fail.  So a newer version of the mount helper may be
better at pointing out what it doesn't like about your file system.

> # gfs_tool jindex /dev/mapper/gfs-test 
> gfs_tool: /dev/mapper/gfs-test is not a GFS file/filesystem
> 
> Scary.  What may be the problem?  The other node is using this
> volume...  Can even unmount/remount it.  Though in dmesg it says:

I wouldn't call it scary at all.  It sounds like gfs_tool may be
somewhat confused about the mount point.  Try giving it the mount
point that was used on the mount command, not the /dev/mapper
device, and see if that helps.  I've actually been
working on making a better version of that code too--both kernel
and userland--that improves how gfs_tool finds mount points.
For RHEL5, they're bugzillas 431951 (gfs_tool) and 431952 (kernel)
respectively.  Those changes have not been shipped yet, due to
code freeze, but patches are in the bugzilla records.

As for all the kernel dmesgs you noted, that's perfectly normal.
When you mount a gfs file system, it runs through all the journals
regardless, checking if they are clean or need to be replayed,
so that's all those kernel messages mean.  They're not locked
(well, they are, but only for a couple seconds).

Regards,

Bob Peterson
Red Hat GFS


