[Cluster-devel] Problems mounting GFS2 devices

Fabio M. Di Nitto fabbione at ubuntu.com
Thu Jul 20 06:34:21 UTC 2006


Hi guys,

This is using the latest GFS2 code from git and the latest CVS HEAD userland.

# gfs2_mkfs -t edgy:mygfs2 -p lock_dlm -j 4 /dev/mapper/mofo 
This will destroy any data on /dev/mapper/mofo.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/mapper/mofo
Blocksize:                 4096
Device Size                237.36 GB (62223680 blocks)
Filesystem Size:           237.36 GB (62223679 blocks)
Journals:                  4
Resource Groups:           950
Locking Protocol:          "lock_dlm"
Lock Table:                "edgy:mygfs2"

/dev/mapper/mofo is a SAN-exported device as seen by multipath,
but accessing the device directly makes no difference.

# mount /dev/mapper/mofo /mnt
Segmentation fault

# dmesg
[42950437.160000] GFS2: fsid=: Trying to join cluster "lock_dlm", "edgy:mygfs2"
[42950437.170000] dlm: mygfs2: recover 1
[42950437.170000] dlm: mygfs2: add member 1
[42950437.170000] dlm: mygfs2: total members 1
[42950437.170000] dlm: mygfs2: dlm_recover_directory
[42950437.170000] dlm: mygfs2: dlm_recover_directory 0 entries
[42950437.170000] dlm: mygfs2: recover 1 done: 0 ms
[42950437.170000] GFS2: fsid=edgy:mygfs2.4294967295: Joined cluster. Now mounting FS...
[42950437.180000] GFS2: fsid=edgy:mygfs2.4294967295: can't mount journal #4294967295
[42950437.180000] GFS2: fsid=edgy:mygfs2.4294967295: there are only 4 journals (0 - 3)
[42950437.180000] GFS2: fsid=edgy:mygfs2.4294967295: fatal assertion failed

^^^ Note: I get the same kind of error no matter how many journals I create.

[42950437.180000] ------------[ cut here ]------------
[42950437.180000] kernel BUG at fs/gfs2/ops_super.c:290!
[42950437.180000] invalid opcode: 0000 [#1]
[42950437.180000] SMP 
[42950437.180000] Modules linked in: video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery ac i2c_acpi_ec i2c_core sctp lock_dlm gfs2 dlm configfs ipv6 af_packet md_mod lp sg snd_intel8x0 snd_ac97_codec snd_ac97_bus hw_random snd_pcm_oss snd_mixer_oss tsdev shpchp snd_pcm snd_timer evdev intel_agp agpgart snd soundcore snd_page_alloc pci_hotplug e100 mii parport_pc psmouse pcspkr floppy serio_raw parport dm_round_robin dm_multipath dm_mod ext3 jbd sd_mod uhci_hcd usbcore lpfc scsi_transport_fc scsi_mod ide_generic ide_cd cdrom ide_disk piix generic thermal processor fan vesafb capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
[42950437.180000] CPU:    0
[42950437.180000] EIP:    0060:[<e0c20493>]    Not tainted VLI
[42950437.180000] EFLAGS: 00010296   (2.6.17-5-server #2) 
[42950437.180000] EIP is at gfs2_clear_inode+0x73/0x90 [gfs2]
[42950437.180000] eax: 0000004f   ebx: d0118048   ecx: 00000000   edx: 00000292
[42950437.180000] esi: 00000000   edi: e0bd3000   ebp: e0beb4ac   esp: d3c4fcb8
[42950437.180000] ds: 007b   es: 007b   ss: 0068
[42950437.180000] Process mount (pid: 4736, threadinfo=d3c4e000 task=dafb2580)
[42950437.180000] Stack: d0118048 c01850bd df20c400 d0118048 df20c400 c01852ce d0118048 e0beb788 
[42950437.180000]        c0184bac ffffffea e0c1cd2b e0c2cecc e0beb788 00000004 00000003 d3c4fcf4 
[42950437.180000]        d3c4fcf4 00000000 dafb2580 00000003 00000020 00000000 000000c2 00000000 
[42950437.180000] Call Trace:
[42950437.180000]  <c01850bd> clear_inode+0x9d/0x120  <c01852ce> generic_drop_inode+0x6e/0x150
[42950437.180000]  <c0184bac> iput+0x5c/0x70  <e0c1cd2b> init_journal+0x8b/0x4a0 [gfs2]
[42950437.180000]  <e0c1d17f> init_inodes+0x3f/0x200 [gfs2]  <e0c1dd8f> fill_super+0x58f/0x6e0 [gfs2]
[42950437.180000]  <e0c107e8> gfs2_glock_nq_num+0x48/0x80 [gfs2]  <c017278c> get_sb_bdev+0xec/0x130
[42950437.180000]  <c0187598> alloc_vfsmnt+0xa8/0xe0  <e0c1c859> gfs2_get_sb+0x19/0x20 [gfs2]
[42950437.180000]  <e0c1d800> fill_super+0x0/0x6e0 [gfs2]  <c017210c> do_kern_mount+0xcc/0x170
[42950437.180000]  <c01889a5> do_mount+0x435/0x730  <c014e339> filemap_nopage+0x2e9/0x390
[42950437.180000]  <c0158b88> __handle_mm_fault+0x368/0xc10  <c01190a6> do_page_fault+0x3b6/0x744
[42950437.180000]  <c0103be7> error_code+0x4f/0x54  <c0150c32> __alloc_pages+0x52/0x310
[42950437.180000]  <c0187873> copy_mount_options+0x43/0x150  <c0188d17> sys_mount+0x77/0xc0
[42950437.180000]  <c0103007> sysenter_past_esp+0x54/0x75 
[42950437.180000] Code: 60 02 00 00 85 c0 74 10 8d 83 64 02 00 00 5b e9 a4 f4 fe ff 8d 74 26 00 5b c3 8b 83 9c 00 00 00 8b 80 60 01 00 00 e8 9d 98 00 00 <0f> 0b 22 01 dc ba c2 e0 8b 83 60 02 00 00 eb 9e 8d b6 00 00 00 
[42950437.180000] EIP: [<e0c20493>] gfs2_clear_inode+0x73/0x90 [gfs2] SS:ESP 0068:d3c4fcb8
[42950437.180000]  <1>BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
[42950437.520000]  printing eip:
[42950437.530000] e0c1005e
[42950437.530000] *pde = 0170d001
[42950437.540000] Oops: 0002 [#2]
[42950437.540000] SMP 
[42950437.540000] Modules linked in: video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery ac i2c_acpi_ec i2c_core sctp lock_dlm gfs2 dlm configfs ipv6 af_packet md_mod lp sg snd_intel8x0 snd_ac97_codec snd_ac97_bus hw_random snd_pcm_oss snd_mixer_oss tsdev shpchp snd_pcm snd_timer evdev intel_agp agpgart snd soundcore snd_page_alloc pci_hotplug e100 mii parport_pc psmouse pcspkr floppy serio_raw parport dm_round_robin dm_multipath dm_mod ext3 jbd sd_mod uhci_hcd usbcore lpfc scsi_transport_fc scsi_mod ide_generic ide_cd cdrom ide_disk piix generic thermal processor fan vesafb capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
[42950437.540000] CPU:    0
[42950437.540000] EIP:    0060:[<e0c1005e>]    Not tainted VLI
[42950437.540000] EFLAGS: 00010246   (2.6.17-5-server #2) 
[42950437.540000] EIP is at drop_bh+0x8e/0x1b0 [gfs2]
[42950437.540000] eax: 00000004   ebx: d484f43c   ecx: 00000000   edx: d0118048
[42950437.540000] esi: d3c4fc74   edi: d484f458   ebp: 00000000   esp: c8505f2c
[42950437.540000] ds: 007b   es: 007b   ss: 0068
[42950437.540000] Process lock_dlm2 (pid: 4739, threadinfo=c8504000 task=dfc81a90)
[42950437.540000] Stack: e0be4358 c8505fac e0bd3000 e0c3e220 e0bd3000 c8505fac d484f43c df20ce00 
[42950437.540000]        e0c0f746 00000292 c0135c7a df20ce00 df348f40 fffefffe e0b32b9d 00000000 
[42950437.540000]        00000009 dfc81b98 dfc81a90 dffa7a90 c1404d20 c8505fac df20cf74 00010000 
[42950437.540000] Call Trace:
[42950437.540000]  <e0c0f746> gfs2_glock_cb+0x96/0x170 [gfs2]  <c0135c7a> remove_wait_queue+0x1a/0x50
[42950437.540000]  <e0b32b9d> gdlm_thread+0x4fd/0x740 [lock_dlm]  <c011b9f0> default_wake_function+0x0/0x10
[42950437.540000]  <e0b326a0> gdlm_thread+0x0/0x740 [lock_dlm]  <c013586c> kthread+0xac/0xe0
[42950437.540000]  <c01357c0> kthread+0x0/0xe0  <c0101005> kernel_thread_helper+0x5/0x10
[42950437.540000] Code: 89 d8 e8 d6 f0 ff ff 8b 44 24 0c 8b 48 14 85 c9 74 09 ba 60 00 00 00 89 d8 ff d1 85 f6 74 22 89 f8 e8 e7 6a 6c df 8b 06 8b 56 04 <89> 50 04 89 02 b0 01 89 36 89 76 04 c7 46 18 00 00 00 00 86 43 
[42950437.540000] EIP: [<e0c1005e>] drop_bh+0x8e/0x1b0 [gfs2] SS:ESP 0068:c8505f2c
[42950437.540000]  <3>BUG: soft lockup detected on CPU#0!

The system remains usable for a few seconds; then another oops appears on the terminal and
the machine dies hard.

(hand copied)

[42950461.990000] <c014899x> softlockup_tick+0x9c/0xf0		<c012b9c1> update_process_times+0x21/0x80
[42950461.990000] <c0113cb1> smp_apic_timer_interrupt+0x51/0x60 <c0103b40> apic_timer_interrupt+0x1c/0x24
[42950461.990000] <c02d6b45> _spin_lock+0x5/0x10		<e0c0e85b> gfs2_glmutex_trylock+0xb/0x40 [gfs2]
[42950461.990000] <e0c10f88> scan_glock+0x8/0x70 [gfs2]		<e0c0e9fb> examine_bucket+0x8b/0xd0 [gfs2]
[42950461.990000] <e0c10f80> scan_glock+0x0/0x70 [gfs2]		<e0c07790> gfs2_scand+0x0/0x50 [gfs2]
[42950461.990000] <e0c0ebaf> gfs2_scand_internal+0x1f/0x40 [gfs2] <e0c0779c> gfs2_scand+0xc/0x50 [gfs2]
[42950461.990000] <c013586c> kthread+0xac/0xe0			<c01357c0> kthread+0x0/0xe0
[42950461.990000] <c0101005> kernel_thread_helper+0x5/0x10

Here is a test with lock_nolock:

# gfs2_mkfs -t edgy:mygfs2 -p lock_nolock -j 4 /dev/mapper/mofo 
This will destroy any data on /dev/mapper/mofo.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/mapper/mofo
Blocksize:                 4096
Device Size                237.36 GB (62223680 blocks)
Filesystem Size:           237.36 GB (62223679 blocks)
Journals:                  4
Resource Groups:           950
Locking Protocol:          "lock_nolock"
Lock Table:                "edgy:mygfs2"

[42949467.940000] Lock_Nolock (built Jul 18 2006 14:27:44) installed
[42949521.080000] GFS2: fsid=: Trying to join cluster "lock_nolock", "edgy:mygfs2"
[42949521.080000] GFS2: fsid=edgy:mygfs2.0: Joined cluster. Now mounting FS...
[42949521.220000] GFS2: fsid=edgy:mygfs2.0: jid=0, already locked for use
[42949521.220000] GFS2: fsid=edgy:mygfs2.0: jid=0: Looking at journal...
[42949521.330000] GFS2: fsid=edgy:mygfs2.0: jid=0: Done
[42949521.330000] GFS2: fsid=edgy:mygfs2.0: jid=1: Trying to acquire journal lock...
[42949521.330000] GFS2: fsid=edgy:mygfs2.0: jid=1: Looking at journal...
[42949521.470000] GFS2: fsid=edgy:mygfs2.0: jid=1: Done
[42949521.470000] GFS2: fsid=edgy:mygfs2.0: jid=2: Trying to acquire journal lock...
[42949521.470000] GFS2: fsid=edgy:mygfs2.0: jid=2: Looking at journal...
[42949521.620000] GFS2: fsid=edgy:mygfs2.0: jid=2: Done
[42949521.620000] GFS2: fsid=edgy:mygfs2.0: jid=3: Trying to acquire journal lock...
[42949521.620000] GFS2: fsid=edgy:mygfs2.0: jid=3: Looking at journal...
[42949521.770000] GFS2: fsid=edgy:mygfs2.0: jid=3: Done

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1              19G  789M   17G   5% /
varrun                252M   80K  252M   1% /var/run
varlock               252M  4,0K  252M   1% /var/lock
udev                   10M  112K  9,9M   2% /dev
devshm                252M     0  252M   0% /dev/shm
Segmentation fault

# dmesg
[42949571.960000] BUG: unable to handle kernel paging request at virtual address 0000109c
[42949571.960000]  printing eip:
[42949571.960000] e0c374c8
[42949571.960000] *pde = 1bbf2001
[42949571.960000] Oops: 0000 [#1]
[42949571.960000] SMP 
[42949571.960000] Modules linked in: lock_nolock video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery ac i2c_acpi_ec i2c_core sctp lock_dlm gfs2 dlm configfs ipv6 af_packet md_mod lp hw_random snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss sg snd_pcm snd_timer snd soundcore e100 tsdev evdev mii shpchp intel_agp agpgart pci_hotplug snd_page_alloc parport_pc psmouse serio_raw pcspkr parport floppy dm_round_robin dm_multipath dm_mod ext3 jbd sd_mod lpfc scsi_transport_fc uhci_hcd usbcore scsi_mod ide_generic ide_cd cdrom ide_disk piix generic thermal processor fan vesafb capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
[42949571.960000] CPU:    0
[42949571.960000] EIP:    0060:[<e0c374c8>]    Not tainted VLI
[42949571.960000] EFLAGS: 00010286   (2.6.17-5-server #2) 
[42949571.960000] EIP is at gfs2_statfs+0x18/0xd0 [gfs2]
[42949571.960000] eax: 00001000   ebx: def2d800   ecx: e0c556c0   edx: cc1abeb0
[42949571.960000] esi: cc1abeb0   edi: cc1abf04   ebp: cc1abeb0   esp: cc1abe74
[42949571.960000] ds: 007b   es: 007b   ss: 0068
[42949571.960000] Process df (pid: 4689, threadinfo=cc1aa000 task=dfc7da90)
[42949571.960000] Stack: dffc5ea0 dfbfe5f8 c017b8c1 dc7ff000 dfbfe5f8 dffc5ea0 def2d800 cc1abeb0 
[42949571.960000]        cc1abf04 cc1aa000 c0168fe5 00000000 cc1abeb0 cc1abf14 c0169116 00000000 
[42949571.960000]        00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
[42949571.960000] Call Trace:
[42949571.960000]  <c017b8c1> link_path_walk+0x71/0xf0  <c0168fe5> vfs_statfs+0x65/0x80
[42949571.960000]  <c0169116> vfs_statfs64+0x16/0x30  <c016a5c3> sys_statfs64+0x83/0xc0
[42949571.960000]  <c0226220> tty_write+0x0/0x1f0  <c016be11> sys_write+0x41/0x70
[42949571.960000]  <c0103007> sysenter_past_esp+0x54/0x75 
[42949571.960000] Code: 60 02 00 00 eb 9e 8d b6 00 00 00 00 8d bc 27 00 00 00 00 83 ec 28 89 74 24 1c 89 7c 24 20 89 6c 24 24 89 d5 89 5c 24 18 8b 40 0c <8b> 80 9c 00 00 00 8b 98 60 01 00 00 8d 83 e4 02 00 00 e8 61 f6 
[42949571.960000] EIP: [<e0c374c8>] gfs2_statfs+0x18/0xd0 [gfs2] SS:ESP 0068:cc1abe74
[42949571.960000]  


Thanks for your time
Fabio

PS: Of course I am ready to test possible patches or provide any extra info
required. The SAN is not in production, so we can play with it as much as we want.



