
Re: [Linux-cluster] gfs2 filesystem crash with no recovery?



On 03/18/2010 10:04 AM, Steven Whitehouse wrote:
Hi,

On Thu, 2010-03-18 at 09:18 -0400, Douglas O'Neal wrote:
On 03/15/2010 09:55 AM, Douglas O'Neal wrote:
I have a problem with a gfs2 filesystem that is (was) being mounted from a single host. The system appeared to have hung over the weekend so I unmounted and remounted the disk. After a couple of minutes I received this in the kernel logs:

Mar 15 08:28:50 localhost kernel: GFS2: fsid=: Trying to join cluster "lock_nolock", "sde1"
Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: Now mounting FS...
Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0, already locked for use
Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0: Looking at journal...
Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0: Done
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: fatal: invalid metadata block
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: bh = 4294972166 (type: exp=3, found=2)
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: function = gfs2_rgrp_bh_get, file = fs/gfs2/rgrp.c, line = 759
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: about to withdraw this file system
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: withdrawn
Mar 15 08:43:37 localhost kernel: Pid: 3687, comm: cp Not tainted 2.6.32-gentoo-r7 #2
Mar 15 08:43:37 localhost kernel: Call Trace:
Mar 15 08:43:37 localhost kernel: [<ffffffffa03b285d>] ? gfs2_lm_withdraw+0x12d/0x160 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffff813bf22b>] ? io_schedule+0x4b/0x70
Mar 15 08:43:37 localhost kernel: [<ffffffff810cc560>] ? sync_buffer+0x0/0x50
Mar 15 08:43:37 localhost kernel: [<ffffffff813bf7a9>] ? out_of_line_wait_on_bit+0x79/0xa0
Mar 15 08:43:37 localhost kernel: [<ffffffff8104e740>] ? wake_bit_function+0x0/0x30
Mar 15 08:43:37 localhost kernel: [<ffffffff810cb162>] ? submit_bh+0x112/0x140
Mar 15 08:43:37 localhost kernel: [<ffffffffa03b2947>] ? gfs2_metatype_check_ii+0x47/0x60 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa03ae40b>] ? gfs2_rgrp_bh_get+0x1db/0x300 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa0397d86>] ? do_promote+0x116/0x200 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa03992a5>] ? finish_xmote+0x1a5/0x3a0 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa0398fcd>] ? do_xmote+0xfd/0x230 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa039986d>] ? gfs2_glock_nq+0x13d/0x320 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa03aea2d>] ? gfs2_inplace_reserve_i+0x1ed/0x7b0 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa0399581>] ? run_queue+0xe1/0x210 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa039986d>] ? gfs2_glock_nq+0x13d/0x320 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa03a1f92>] ? gfs2_write_begin+0x272/0x480 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffff8106df04>] ? generic_file_buffered_write+0x114/0x290
Mar 15 08:43:37 localhost kernel: [<ffffffff8106e4a8>] ? __generic_file_aio_write+0x278/0x450
Mar 15 08:43:37 localhost kernel: [<ffffffff8106e6d5>] ? generic_file_aio_write+0x55/0xb0
Mar 15 08:43:37 localhost kernel: [<ffffffff810a6a1b>] ? do_sync_write+0xdb/0x120
Mar 15 08:43:37 localhost kernel: [<ffffffff8104e710>] ? autoremove_wake_function+0x0/0x30
Mar 15 08:43:37 localhost kernel: [<ffffffff8108511f>] ? handle_mm_fault+0x1bf/0x850
Mar 15 08:43:37 localhost kernel: [<ffffffff8108b5cc>] ? mmap_region+0x23c/0x5d0
Mar 15 08:43:37 localhost kernel: [<ffffffff810a752b>] ? vfs_write+0xcb/0x160
Mar 15 08:43:37 localhost kernel: [<ffffffff810a76c3>] ? sys_write+0x53/0xa0
Mar 15 08:43:37 localhost kernel: [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b
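The key detail in the withdraw message is `bh = 4294972166 (type: exp=3, found=2)`: the metadata header of that block did not carry the type the kernel expected. A minimal sketch for decoding it, assuming the type numbers below match the `GFS2_METATYPE_*` constants in the kernel header `gfs2_ondisk.h` (check your own kernel's copy). Read that way, `gfs2_rgrp_bh_get` expected a resource-group bitmap block (3) and found a resource-group header (2). The helper `decode_withdraw` is hypothetical, not part of gfs2-utils.

```python
import re

# Subset of the GFS2_METATYPE_* values from the kernel header
# include/linux/gfs2_ondisk.h; verify against your own kernel's copy.
METATYPES = {
    1: "SB (superblock)",
    2: "RG (resource group header)",
    3: "RB (resource group bitmap)",
    4: "DI (dinode)",
    5: "IN (indirect block)",
}

# Matches the "bh = N (type: exp=E, found=F)" part of a withdraw message.
BH_RE = re.compile(r"bh = (\d+) \(type: exp=(\d+), found=(\d+)\)")


def decode_withdraw(line: str) -> dict:
    """Return the block number and the expected/found metadata type names."""
    m = BH_RE.search(line)
    if m is None:
        raise ValueError("line has no 'bh = N (type: exp=E, found=F)' field")
    block, expected, found = (int(g) for g in m.groups())
    return {
        "block": block,
        "expected": METATYPES.get(expected, f"unknown type {expected}"),
        "found": METATYPES.get(found, f"unknown type {found}"),
    }


if __name__ == "__main__":
    # Line copied from the kernel log above.
    line = ("Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: "
            "bh = 4294972166 (type: exp=3, found=2)")
    for key, value in decode_withdraw(line).items():
        print(f"{key}: {value}")
```

This only names the mismatch. It says nothing about how the wrong block ended up where a bitmap was expected.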

I unmounted the disk again, but now when I try to fsck the filesystem I get:
urania# fsck.gfs2 -v /dev/sde1
Initializing fsck
Initializing lists...
Either the super block is corrupted, or this is not a GFS2 filesystem
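One way to see what fsck.gfs2 is objecting to is to inspect the superblock header directly. The sketch below assumes the on-disk layout in the kernel header `gfs2_ondisk.h`: the superblock sits at `GFS2_SB_ADDR` (128) 512-byte basic blocks, i.e. 64 KiB, into the filesystem and begins with a big-endian metadata header whose `mh_magic` is `0x01161970` and whose `mh_type` is 1. `looks_like_gfs2_sb` is a hypothetical helper, not a gfs2-utils function.

```python
import struct

# Constants as defined in the kernel header gfs2_ondisk.h (verify locally):
GFS2_MAGIC = 0x01161970      # mh_magic, first field of every metadata header
GFS2_METATYPE_SB = 1         # mh_type of the superblock
GFS2_SB_OFFSET = 128 * 512   # GFS2_SB_ADDR (128) * 512-byte basic blocks = 64 KiB


def looks_like_gfs2_sb(buf: bytes, fs_start: int = 0) -> bool:
    """Check for a GFS2 superblock header 64 KiB past fs_start in buf.

    buf holds raw bytes read from a device or an image; fs_start is the
    byte offset at which the filesystem begins (0 when reading the
    partition itself, e.g. /dev/sde1).
    """
    offset = fs_start + GFS2_SB_OFFSET
    if len(buf) < offset + 8:
        return False  # not enough data to hold the header
    # mh_magic and mh_type are both big-endian 32-bit integers.
    magic, mtype = struct.unpack_from(">II", buf, offset)
    return magic == GFS2_MAGIC and mtype == GFS2_METATYPE_SB
```

For example, pass in the first 65544 bytes of the partition (`open("/dev/sde1", "rb").read(65544)`). A `False` result would be consistent with the fsck message above.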

The server is running kernel 2.6.32, 64-bit. The array is a Jetstore 516iS with a single 28TB iSCSI volume defined. The relevant line from the fstab is:
/dev/sde1        /illumina    gfs2    _netdev,rw,lockproto=lock_nolock

gfs2_tool isn't much help, nor is gfs2_edit:
urania# gfs2_tool sb /dev/sde1 all
/usr/src/cluster-3.0.7/gfs2/tool/../libgfs2/libgfs2.h: there isn't a GFS2 filesystem on /dev/sde1
urania# gfs2_edit -p sb /dev/sde1
bad seek: Invalid argument from gfs2_load_inode:416: block 3747350044811107074 (0x34014302ee029b02)
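gfs2_edit prints the block it tried to read in both decimal and hex. A quick plausibility check shows that the value cannot be a real address on this volume. The check treats 28TB as 28 * 10**12 bytes and assumes a 4096-byte filesystem block size, because the thread does not state the real one. The conclusion holds for any block size, since the number exceeds even the byte size of the volume. `block_in_range` is a hypothetical helper.

```python
def block_in_range(block: int, device_bytes: int, block_size: int = 4096) -> bool:
    """True if block is a valid block index on a device of device_bytes bytes.

    The 4096-byte default block size is an assumption; the thread does not
    say which block size the filesystem was created with.
    """
    return 0 <= block < device_bytes // block_size


if __name__ == "__main__":
    bad = 3747350044811107074                 # block from the gfs2_edit error
    print(hex(bad))                           # 0x34014302ee029b02
    print(28 * 10**12 // 4096)                # 6835937500 blocks on a 28 TB volume
    print(block_in_range(bad, 28 * 10**12))   # False
```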

Is there an alternate superblock that I can use to mount the disk to at least get the last couple of days of data off of it?

Anybody?

What version of the userland tools are you using? There was a recent
update to fsck designed to solve a number of problems. I've never
before seen a filesystem so badly corrupted that the super block is
unrecognisable. The super block is never altered during normal fs
usage.

Are you 100% certain that this volume was not being accessed by another
node on the network?

If you can save the metadata, then we can take a look at it. That
might not be possible with a corrupt superblock, though, so an
alternative is to make the volume available somehow for us to look at.

Steve.


--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster
The userland tools are version 3.0.7. The iSCSI array is on a closed network and is protected by a CHAP login. No other system has been configured to access the array. I have the first 1MB of the disk available at http://urania.dbi.udel.edu/sde.block.bz2 if you want to see the actual data. gfs2_edit will not pull the metadata off:

urania ~ # gfs2_edit savemeta /dev/sde /tmp/metasave
Segmentation fault


Doug
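Since the first 1MB of the raw disk was posted, one way to look for whatever remains of the superblock is to scan that image for GFS2 metadata headers. The sketch assumes the `gfs2_ondisk.h` header layout (big-endian `mh_magic` = `0x01161970`, then `mh_type`) and steps through the image in 512-byte increments. The function names are hypothetical; they are not gfs2-utils functions.

```python
import struct
from collections import Counter

GFS2_MAGIC = 0x01161970          # mh_magic (gfs2_ondisk.h; verify locally)
GFS2_SB_OFFSET = 128 * 512       # superblock sits 64 KiB into the filesystem
# Subset of GFS2_METATYPE_* values; anything else is reported by number.
TYPE_NAMES = {1: "SB", 2: "RG", 3: "RB", 4: "DI", 5: "IN"}


def scan_for_gfs2_headers(buf: bytes, step: int = 512):
    """Yield (offset, mh_type) for each step-aligned GFS2 metadata header.

    A 512-byte step covers any partition that starts on a sector boundary,
    whatever the filesystem block size.
    """
    for offset in range(0, len(buf) - 7, step):
        magic, mtype = struct.unpack_from(">II", buf, offset)
        if magic == GFS2_MAGIC:
            yield offset, mtype


def candidate_fs_starts(buf: bytes) -> list:
    """Filesystem start offsets implied by any superblock headers found."""
    return [off - GFS2_SB_OFFSET
            for off, mtype in scan_for_gfs2_headers(buf)
            if mtype == 1 and off >= GFS2_SB_OFFSET]


def summarize(buf: bytes) -> dict:
    """Count headers by type and list candidate filesystem start offsets."""
    counts = Counter(TYPE_NAMES.get(t, str(t)) for _, t in scan_for_gfs2_headers(buf))
    return {"counts": dict(counts), "fs_starts": candidate_fs_starts(buf)}
```

To use it, pass in the decompressed image, for example `summarize(bz2.open("sde.block.bz2", "rb").read())`. The image is of the whole disk (`/dev/sde`), while the filesystem is on `/dev/sde1`. Any superblock header should therefore sit 64 KiB past the partition's starting offset. If the partition begins less than 64 KiB before the end of the image, or later, its superblock will not be in the image at all.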

