[Cluster-devel] Problems mounting GFS2 devices
Fabio M. Di Nitto
fabbione at ubuntu.com
Thu Jul 20 06:34:21 UTC 2006
Hi guys,
I am using the latest gfs2 code from git and the latest CVS HEAD userland.
# gfs2_mkfs -t edgy:mygfs2 -p lock_dlm -j 4 /dev/mapper/mofo
This will destroy any data on /dev/mapper/mofo.
Are you sure you want to proceed? [y/n] y
Device: /dev/mapper/mofo
Blocksize: 4096
Device Size 237.36 GB (62223680 blocks)
Filesystem Size: 237.36 GB (62223679 blocks)
Journals: 4
Resource Groups: 950
Locking Protocol: "lock_dlm"
Lock Table: "edgy:mygfs2"
/dev/mapper/mofo is a SAN-exported device as seen by multipath,
but accessing the underlying device directly makes no difference.
# mount /dev/mapper/mofo /mnt
Segmentation fault
# dmesg
[42950437.160000] GFS2: fsid=: Trying to join cluster "lock_dlm", "edgy:mygfs2"
[42950437.170000] dlm: mygfs2: recover 1
[42950437.170000] dlm: mygfs2: add member 1
[42950437.170000] dlm: mygfs2: total members 1
[42950437.170000] dlm: mygfs2: dlm_recover_directory
[42950437.170000] dlm: mygfs2: dlm_recover_directory 0 entries
[42950437.170000] dlm: mygfs2: recover 1 done: 0 ms
[42950437.170000] GFS2: fsid=edgy:mygfs2.4294967295: Joined cluster. Now mounting FS...
[42950437.180000] GFS2: fsid=edgy:mygfs2.4294967295: can't mount journal #4294967295
[42950437.180000] GFS2: fsid=edgy:mygfs2.4294967295: there are only 4 journals (0 - 3)
[42950437.180000] GFS2: fsid=edgy:mygfs2.4294967295: fatal assertion failed
^^^ Note: I get the same kind of error no matter how many journals I create.
[42950437.180000] ------------[ cut here ]------------
[42950437.180000] kernel BUG at fs/gfs2/ops_super.c:290!
[42950437.180000] invalid opcode: 0000 [#1]
[42950437.180000] SMP
[42950437.180000] Modules linked in: video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery ac i2c_acpi_ec i2c_core sctp lock_dlm gfs2 dlm configfs ipv6 af_packet md_mod lp sg snd_intel8x0 snd_ac97_codec snd_ac97_bus hw_random snd_pcm_oss snd_mixer_oss tsdev shpchp snd_pcm snd_timer evdev intel_agp agpgart snd soundcore snd_page_alloc pci_hotplug e100 mii parport_pc psmouse pcspkr floppy serio_raw parport dm_round_robin dm_multipath dm_mod ext3 jbd sd_mod uhci_hcd usbcore lpfc scsi_transport_fc scsi_mod ide_generic ide_cd cdrom ide_disk piix generic thermal processor fan vesafb capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
[42950437.180000] CPU: 0
[42950437.180000] EIP: 0060:[<e0c20493>] Not tainted VLI
[42950437.180000] EFLAGS: 00010296 (2.6.17-5-server #2)
[42950437.180000] EIP is at gfs2_clear_inode+0x73/0x90 [gfs2]
[42950437.180000] eax: 0000004f ebx: d0118048 ecx: 00000000 edx: 00000292
[42950437.180000] esi: 00000000 edi: e0bd3000 ebp: e0beb4ac esp: d3c4fcb8
[42950437.180000] ds: 007b es: 007b ss: 0068
[42950437.180000] Process mount (pid: 4736, threadinfo=d3c4e000 task=dafb2580)
[42950437.180000] Stack: d0118048 c01850bd df20c400 d0118048 df20c400 c01852ce d0118048 e0beb788
[42950437.180000] c0184bac ffffffea e0c1cd2b e0c2cecc e0beb788 00000004 00000003 d3c4fcf4
[42950437.180000] d3c4fcf4 00000000 dafb2580 00000003 00000020 00000000 000000c2 00000000
[42950437.180000] Call Trace:
[42950437.180000] <c01850bd> clear_inode+0x9d/0x120 <c01852ce> generic_drop_inode+0x6e/0x150
[42950437.180000] <c0184bac> iput+0x5c/0x70 <e0c1cd2b> init_journal+0x8b/0x4a0 [gfs2]
[42950437.180000] <e0c1d17f> init_inodes+0x3f/0x200 [gfs2] <e0c1dd8f> fill_super+0x58f/0x6e0 [gfs2]
[42950437.180000] <e0c107e8> gfs2_glock_nq_num+0x48/0x80 [gfs2] <c017278c> get_sb_bdev+0xec/0x130
[42950437.180000] <c0187598> alloc_vfsmnt+0xa8/0xe0 <e0c1c859> gfs2_get_sb+0x19/0x20 [gfs2]
[42950437.180000] <e0c1d800> fill_super+0x0/0x6e0 [gfs2] <c017210c> do_kern_mount+0xcc/0x170
[42950437.180000] <c01889a5> do_mount+0x435/0x730 <c014e339> filemap_nopage+0x2e9/0x390
[42950437.180000] <c0158b88> __handle_mm_fault+0x368/0xc10 <c01190a6> do_page_fault+0x3b6/0x744
[42950437.180000] <c0103be7> error_code+0x4f/0x54 <c0150c32> __alloc_pages+0x52/0x310
[42950437.180000] <c0187873> copy_mount_options+0x43/0x150 <c0188d17> sys_mount+0x77/0xc0
[42950437.180000] <c0103007> sysenter_past_esp+0x54/0x75
[42950437.180000] Code: 60 02 00 00 85 c0 74 10 8d 83 64 02 00 00 5b e9 a4 f4 fe ff 8d 74 26 00 5b c3 8b 83 9c 00 00 00 8b 80 60 01 00 00 e8 9d 98 00 00 <0f> 0b 22 01 dc ba c2 e0 8b 83 60 02 00 00 eb 9e 8d b6 00 00 00
[42950437.180000] EIP: [<e0c20493>] gfs2_clear_inode+0x73/0x90 [gfs2] SS:ESP 0068:d3c4fcb8
[42950437.180000] <1>BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
[42950437.520000] printing eip:
[42950437.530000] e0c1005e
[42950437.530000] *pde = 0170d001
[42950437.540000] Oops: 0002 [#2]
[42950437.540000] SMP
[42950437.540000] Modules linked in: video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery ac i2c_acpi_ec i2c_core sctp lock_dlm gfs2 dlm configfs ipv6 af_packet md_mod lp sg snd_intel8x0 snd_ac97_codec snd_ac97_bus hw_random snd_pcm_oss snd_mixer_oss tsdev shpchp snd_pcm snd_timer evdev intel_agp agpgart snd soundcore snd_page_alloc pci_hotplug e100 mii parport_pc psmouse pcspkr floppy serio_raw parport dm_round_robin dm_multipath dm_mod ext3 jbd sd_mod uhci_hcd usbcore lpfc scsi_transport_fc scsi_mod ide_generic ide_cd cdrom ide_disk piix generic thermal processor fan vesafb capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
[42950437.540000] CPU: 0
[42950437.540000] EIP: 0060:[<e0c1005e>] Not tainted VLI
[42950437.540000] EFLAGS: 00010246 (2.6.17-5-server #2)
[42950437.540000] EIP is at drop_bh+0x8e/0x1b0 [gfs2]
[42950437.540000] eax: 00000004 ebx: d484f43c ecx: 00000000 edx: d0118048
[42950437.540000] esi: d3c4fc74 edi: d484f458 ebp: 00000000 esp: c8505f2c
[42950437.540000] ds: 007b es: 007b ss: 0068
[42950437.540000] Process lock_dlm2 (pid: 4739, threadinfo=c8504000 task=dfc81a90)
[42950437.540000] Stack: e0be4358 c8505fac e0bd3000 e0c3e220 e0bd3000 c8505fac d484f43c df20ce00
[42950437.540000] e0c0f746 00000292 c0135c7a df20ce00 df348f40 fffefffe e0b32b9d 00000000
[42950437.540000] 00000009 dfc81b98 dfc81a90 dffa7a90 c1404d20 c8505fac df20cf74 00010000
[42950437.540000] Call Trace:
[42950437.540000] <e0c0f746> gfs2_glock_cb+0x96/0x170 [gfs2] <c0135c7a> remove_wait_queue+0x1a/0x50
[42950437.540000] <e0b32b9d> gdlm_thread+0x4fd/0x740 [lock_dlm] <c011b9f0> default_wake_function+0x0/0x10
[42950437.540000] <e0b326a0> gdlm_thread+0x0/0x740 [lock_dlm] <c013586c> kthread+0xac/0xe0
[42950437.540000] <c01357c0> kthread+0x0/0xe0 <c0101005> kernel_thread_helper+0x5/0x10
[42950437.540000] Code: 89 d8 e8 d6 f0 ff ff 8b 44 24 0c 8b 48 14 85 c9 74 09 ba 60 00 00 00 89 d8 ff d1 85 f6 74 22 89 f8 e8 e7 6a 6c df 8b 06 8b 56 04 <89> 50 04 89 02 b0 01 89 36 89 76 04 c7 46 18 00 00 00 00 86 43
[42950437.540000] EIP: [<e0c1005e>] drop_bh+0x8e/0x1b0 [gfs2] SS:ESP 0068:c8505f2c
[42950437.540000] <3>BUG: soft lockup detected on CPU#0!
The system is still usable for a few seconds; then another oops appears on the terminal and
the machine dies hard.
(hand copied)
[42950461.990000] <c014899x> softlockup_tick+0x9c/0xf0 <c012b9c1> update_process_times+0x21/0x80
[42950461.990000] <c0113cb1> smp_apic_timer_interrupt+0x51/0x60 <c0103b40> apic_timer_interrupt+0x1c/0x24
[42950461.990000] <c02d6b45> _spin_lock+0x5/0x10 <e0c0e85b> gfs2_glmutex_trylock+0xb/0x40 [gfs2]
[42950461.990000] <e0c10f88> scan_glock+0x8/0x70 [gfs2] <e0c0e9fb> examine_bucket+0x8b/0xd0 [gfs2]
[42950461.990000] <e0c10f80> scan_glock+0x0/0x70 [gfs2] <e0c07790> gfs2_scand+0x0/0x50 [gfs2]
[42950461.990000] <e0c0ebaf> gfs2_scand_internal+0x1f/0x40 [gfs2] <e0c0779c> gfs2_scand+0xc/0x50 [gfs2]
[42950461.990000] <c013586c> kthread+0xac/0xe0 <c01357c0> kthread+0x0/0xe0
[42950461.990000] <c0101005> kernel_thread_helper+0x5/0x10
Here is a test with lock_nolock:
# gfs2_mkfs -t edgy:mygfs2 -p lock_nolock -j 4 /dev/mapper/mofo
This will destroy any data on /dev/mapper/mofo.
Are you sure you want to proceed? [y/n] y
Device: /dev/mapper/mofo
Blocksize: 4096
Device Size 237.36 GB (62223680 blocks)
Filesystem Size: 237.36 GB (62223679 blocks)
Journals: 4
Resource Groups: 950
Locking Protocol: "lock_nolock"
Lock Table: "edgy:mygfs2"
[42949467.940000] Lock_Nolock (built Jul 18 2006 14:27:44) installed
[42949521.080000] GFS2: fsid=: Trying to join cluster "lock_nolock", "edgy:mygfs2"
[42949521.080000] GFS2: fsid=edgy:mygfs2.0: Joined cluster. Now mounting FS...
[42949521.220000] GFS2: fsid=edgy:mygfs2.0: jid=0, already locked for use
[42949521.220000] GFS2: fsid=edgy:mygfs2.0: jid=0: Looking at journal...
[42949521.330000] GFS2: fsid=edgy:mygfs2.0: jid=0: Done
[42949521.330000] GFS2: fsid=edgy:mygfs2.0: jid=1: Trying to acquire journal lock...
[42949521.330000] GFS2: fsid=edgy:mygfs2.0: jid=1: Looking at journal...
[42949521.470000] GFS2: fsid=edgy:mygfs2.0: jid=1: Done
[42949521.470000] GFS2: fsid=edgy:mygfs2.0: jid=2: Trying to acquire journal lock...
[42949521.470000] GFS2: fsid=edgy:mygfs2.0: jid=2: Looking at journal...
[42949521.620000] GFS2: fsid=edgy:mygfs2.0: jid=2: Done
[42949521.620000] GFS2: fsid=edgy:mygfs2.0: jid=3: Trying to acquire journal lock...
[42949521.620000] GFS2: fsid=edgy:mygfs2.0: jid=3: Looking at journal...
[42949521.770000] GFS2: fsid=edgy:mygfs2.0: jid=3: Done
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 19G 789M 17G 5% /
varrun 252M 80K 252M 1% /var/run
varlock 252M 4,0K 252M 1% /var/lock
udev 10M 112K 9,9M 2% /dev
devshm 252M 0 252M 0% /dev/shm
Segmentation fault
# dmesg
[42949571.960000] BUG: unable to handle kernel paging request at virtual address 0000109c
[42949571.960000] printing eip:
[42949571.960000] e0c374c8
[42949571.960000] *pde = 1bbf2001
[42949571.960000] Oops: 0000 [#1]
[42949571.960000] SMP
[42949571.960000] Modules linked in: lock_nolock video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery ac i2c_acpi_ec i2c_core sctp lock_dlm gfs2 dlm configfs ipv6 af_packet md_mod lp hw_random snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss sg snd_pcm snd_timer snd soundcore e100 tsdev evdev mii shpchp intel_agp agpgart pci_hotplug snd_page_alloc parport_pc psmouse serio_raw pcspkr parport floppy dm_round_robin dm_multipath dm_mod ext3 jbd sd_mod lpfc scsi_transport_fc uhci_hcd usbcore scsi_mod ide_generic ide_cd cdrom ide_disk piix generic thermal processor fan vesafb capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
[42949571.960000] CPU: 0
[42949571.960000] EIP: 0060:[<e0c374c8>] Not tainted VLI
[42949571.960000] EFLAGS: 00010286 (2.6.17-5-server #2)
[42949571.960000] EIP is at gfs2_statfs+0x18/0xd0 [gfs2]
[42949571.960000] eax: 00001000 ebx: def2d800 ecx: e0c556c0 edx: cc1abeb0
[42949571.960000] esi: cc1abeb0 edi: cc1abf04 ebp: cc1abeb0 esp: cc1abe74
[42949571.960000] ds: 007b es: 007b ss: 0068
[42949571.960000] Process df (pid: 4689, threadinfo=cc1aa000 task=dfc7da90)
[42949571.960000] Stack: dffc5ea0 dfbfe5f8 c017b8c1 dc7ff000 dfbfe5f8 dffc5ea0 def2d800 cc1abeb0
[42949571.960000] cc1abf04 cc1aa000 c0168fe5 00000000 cc1abeb0 cc1abf14 c0169116 00000000
[42949571.960000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[42949571.960000] Call Trace:
[42949571.960000] <c017b8c1> link_path_walk+0x71/0xf0 <c0168fe5> vfs_statfs+0x65/0x80
[42949571.960000] <c0169116> vfs_statfs64+0x16/0x30 <c016a5c3> sys_statfs64+0x83/0xc0
[42949571.960000] <c0226220> tty_write+0x0/0x1f0 <c016be11> sys_write+0x41/0x70
[42949571.960000] <c0103007> sysenter_past_esp+0x54/0x75
[42949571.960000] Code: 60 02 00 00 eb 9e 8d b6 00 00 00 00 8d bc 27 00 00 00 00 83 ec 28 89 74 24 1c 89 7c 24 20 89 6c 24 24 89 d5 89 5c 24 18 8b 40 0c <8b> 80 9c 00 00 00 8b 98 60 01 00 00 8d 83 e4 02 00 00 e8 61 f6
[42949571.960000] EIP: [<e0c374c8>] gfs2_statfs+0x18/0xd0 [gfs2] SS:ESP 0068:cc1abe74
[42949571.960000]
Thanks for your time
Fabio
PS: Of course I am ready to test possible patches or provide any extra info
required. The SAN is not in production, so we can play with it as much as we want.