[Linux-cluster] Re: gfs2 mount issue

Nick Couchman Nick.Couchman at seakr.com
Wed Apr 18 14:33:40 UTC 2007


Well, first of all, I'm using this with an iSCSI device, so perhaps the relatively high latency of iSCSI (compared to direct-attached devices) is a factor?

Also, I've turned on a bunch of debugging options (mainly having to do with spin locks and whatever else I could find about locking) and the bug messages at mount time seem to have gone away.  Instead I get the following warning when I connect to an iSCSI device, followed by a very long string of kernel debug output that talks about where the hard-safe switched to hard-unsafe, etc.: 
[ INFO: hard-safe -> hard-unsafe lock order detected ] 

I'm not going to post the output, since this isn't the open-iscsi forum.

I still have trouble with locks on the GFS2 volume, though. If a process has a file locked (for example, editing a file with vi, or starting Samba with its configuration and shares on the GFS2 volume), I can't perform any other operations on the volume (like an ls or removing a directory). The processes block waiting on I/O and never come back, and I can't shut down the original process that took out the first lock. I'm not really sure how to debug this one, since I don't get any other error messages and the only other symptom is that my CPU wait time goes through the roof. I haven't tried this with anything other than iSCSI devices, so it's quite possible that some bug in the iSCSI code is causing locking problems in the GFS2 code, but I don't really know. If you need any more info from me I'm happy to provide whatever you like.
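
One way to narrow down a hang like this is to list the tasks stuck in uninterruptible sleep (state D) and, where the kernel exposes it, the symbol each one is sleeping in. What follows is a minimal sketch and not something taken from this thread: it assumes a Linux /proc filesystem, the helper names parse_stat_line and blocked_tasks are hypothetical, and /proc/<pid>/wchan may be missing or uninformative depending on how the kernel was built.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' rather than on spaces.
    """
    left, _, right = line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_text), comm, state


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every task in uninterruptible sleep."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return  # no /proc here (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited while we were reading it
        if state != "D":
            continue
        # wchan names the kernel function the task sleeps in, if exposed.
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid}\t{comm}\t{wchan}")
```

Run while the hang is in progress, this would show whether the stuck ls or vi processes are in state D and, if wchan is available, whether they are waiting inside gfs2 or somewhere in the iSCSI/SCSI layers.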

Thanks for the feedback!


Nick Couchman
Systems Integrator
SEAKR Engineering, Inc.
6221 South Racine Circle
Centennial, CO 80111
Main: (303) 790-8499
Fax: (303) 790-8720
Web: http://www.seakr.com



>>> On Wed, Apr 18, 2007 at  1:46 AM, Steven Whitehouse <swhiteho at redhat.com> wrote:

Hi,

I suspect it got missed since there is no bugzilla entry. It looks like
a glock has been used after it's been freed. I suspect that the second
oops is just a consequence of the first.

So the question here is really, how did the glock get freed and yet
still apparently remain in the reclaim list (and, it looks like, also in
the hash table)? I've not seen this bug locally, so I guess that it might be
related to the relative speeds of various operations on different
hardware.

Quotas on gfs2 will need some work and that's a known bug, but what you
have run into seems to be unconnected with that.

Steve.

On Tue, 2007-04-17 at 11:23 -0600, Nick Couchman wrote:
> By the way - I found this thread on the linux-kernel mailing list that references the same sort of bug:
> http://lkml.org/lkml/2007/1/25/8
>
> There was a suggestion made that this has to do with kernel preemption - I have preemption completely disabled and still get the same bug.  From my very limited kernel knowledge (that is, reading the output of the bug message) it seems to have to do with spinlocks in the kernel.  I've enabled spinlock debugging and I'll see if I can get any more information, but I'm just not a kernel developer.  There don't seem to be any patches out in the 2.6.21-rc or the -mm branches of the kernel to fix this issue.
>
> I know this has been mentioned a few times on the list, but I haven't seen anything too recent on this issue.  I'm attempting to use GFS2 and am getting some kernel bug messages when I mount the filesystems.  This seems to happen with kernels 2.6.19 through 2.6.21-rc6-mm1 (the one I'm currently using).  The first message is this:
> ------------[ cut here ]------------
> kernel BUG at fs/gfs2/glock.c:656!
> invalid opcode: 0000 [#1]
> last sysfs file: fs/gfs2/fstest:testfs/lock_module/block
> Modules linked in: lock_nolock lock_dlm gfs2 dlm configfs crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi af_packet button battery ac loop pcnet32 mii ext3 jbd dm_snapshot edd dm_mod fan thermal processor ide_generic sg BusLogic piix sd_mod scsi_mod ide_disk ide_core
> CPU:    0
> EIP:    0060:[<d0a30e09>]    Not tainted VLI
> EFLAGS: 00010296   (2.6.21-rc6-mm1-default #1)
> EIP is at gfs2_glmutex_unlock+0x1b/0x1f [gfs2]
> eax: c223bec8   ebx: c34cc000   ecx: 00000000   edx: c23833c0
> esi: c223be84   edi: c14dbf8c   ebp: 00000000   esp: c14dbf58
> ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
> Process gfs2_glockd (pid: 3804, ti=c14da000 task=c22e8a90 task.ti=c14da000)
> Stack: c0369794 c23833c0 c34cc000 d0a30e7f c34cc000 c34cc000 c26f3c94 d0a29477
>        00000000 00000000 00000000 00000000 00000000 c26f3c98 00000001 00000282
>        23ed8d84 00001337 c14dbfc0 00000000 c22e8a90 c0125c44 c14dbfb0 c14dbfb0
> Call Trace:
>  [<d0a30e7f>] gfs2_reclaim_glock+0x72/0x80 [gfs2]
>  [<d0a29477>] gfs2_glockd+0x13/0xc0 [gfs2]
>  [<c0125c44>] autoremove_wake_function+0x0/0x35
>  [<d0a29464>] gfs2_glockd+0x0/0xc0 [gfs2]
>  [<c0125ae3>] kthread+0xa3/0xcc
>  [<c0125a40>] kthread+0x0/0xcc
>  [<c0104cd7>] kernel_thread_helper+0x7/0x10
>  =======================
> Code: 5e 5f 5d e9 0a ef ff ff 83 c4 0c 5b 5e 5f 5d c3 83 ec 0c 0f ba 70 08 01 c7 40 2c 00 00 00 00 c7 40 30 00 00 00 00 e8 50 f7 ff ff <0f> 0b eb fe 56 53 89 c3 83 ec 04 8d 80 44 03 00 00 39 83 44 03
> EIP: [<d0a30e09>] gfs2_glmutex_unlock+0x1b/0x1f [gfs2] SS:ESP 0068:c14dbf58
>
>
> followed shortly by this:
> ------------[ cut here ]------------
> kernel BUG at fs/gfs2/glock.c:656!
> invalid opcode: 0000 [#2]
> last sysfs file: fs/gfs2/fstest:testfs/lock_module/block
> Modules linked in: lock_nolock lock_dlm gfs2 dlm configfs crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi af_packet button battery ac loop pcnet32 mii ext3 jbd dm_snapshot edd dm_mod fan thermal processor ide_generic sg BusLogic piix sd_mod scsi_mod ide_disk ide_core
> CPU:    0
> EIP:    0060:[<d0a30e09>]    Not tainted VLI
> EFLAGS: 00010292   (2.6.21-rc6-mm1-default #1)
> EIP is at gfs2_glmutex_unlock+0x1b/0x1f [gfs2]
> eax: c223bf64   ebx: c223bf20   ecx: 00000001   edx: c223bc14
> esi: 00000001   edi: c34cc000   ebp: d0a3125c   esp: c14d9f78
> ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
> Process gfs2_scand (pid: 3803, ti=c14d8000 task=c22ea030 task.ti=c14d8000)
> Stack: c26f3c20 c223bc14 c223bf20 d0a30068 00000003 c26f3c98 00000001 00001078
>        c34cc000 d0a29524 00000000 d0a3018d c038ad60 c34cc000 c26f3c94 d0a29533
>        c26f3c94 d0a29524 c34cc000 c0125ae3 00000000 00000000 ffffffff ffffffff
> Call Trace:
>  [<d0a30068>] examine_bucket+0x38/0x59 [gfs2]
>  [<d0a29524>] gfs2_scand+0x0/0x2d [gfs2]
>  [<d0a3018d>] gfs2_scand_internal+0x18/0x24 [gfs2]
>  [<d0a29533>] gfs2_scand+0xf/0x2d [gfs2]
>  [<d0a29524>] gfs2_scand+0x0/0x2d [gfs2]
>  [<c0125ae3>] kthread+0xa3/0xcc
>  [<c0125a40>] kthread+0x0/0xcc
>  [<c0104cd7>] kernel_thread_helper+0x7/0x10
>  =======================
> Code: 5e 5f 5d e9 0a ef ff ff 83 c4 0c 5b 5e 5f 5d c3 83 ec 0c 0f ba 70 08 01 c7 40 2c 00 00 00 00 c7 40 30 00 00 00 00 e8 50 f7 ff ff <0f> 0b eb fe 56 53 89 c3 83 ec 04 8d 80 44 03 00 00 39 83 44 03
> EIP: [<d0a30e09>] gfs2_glmutex_unlock+0x1b/0x1f [gfs2] SS:ESP 0068:c14d9f78
>
>
> After I get those messages, I can list files, create files, and delete files.  I run into problems if I try to use quotas or ACLs on the filesystem, and I can't unmount the filesystem - I have to hard reset the machine.  Also, it doesn't seem to matter whether I use the lock_dlm or lock_nolock protocols - both seem to generate these messages.
>
> Nick Couchman
> Systems Integrator
> SEAKR Engineering, Inc.
> 6221 South Racine Circle
> Centennial, CO 80111
> Main: (303) 790-8499
> Fax: (303) 790-8720
> Web: http://www.seakr.com
>