
[Linux-cluster] GFS1 filesystem consistency error



Hi all,

While I'm waiting for a gfs_fsck to complete, I thought I'd send this in the list's direction and ask if anyone else had any thoughts about it.

Aug  6 04:51:13 s12n01 kernel: 493 [RAIDarray.mpp]FastT1:1:0:3 Cmnd failed-retry the same path. vcmnd SN 18724261484 pdev H3:C0:T1:L3 0x00/0x00/0x00 0x00020000 mpp_status:2
Aug  6 04:51:13 s12n01 kernel: 493 [RAIDarray.mpp]FastT1:1:0:3 Cmnd failed-retry the same path. vcmnd SN 18724261488 pdev H3:C0:T1:L3 0x00/0x00/0x00 0x00020000 mpp_status:2
Aug  6 04:51:13 s12n01 kernel: 493 [RAIDarray.mpp]FastT1:1:0:3 Cmnd failed-retry the same path. vcmnd SN 18724261490 pdev H3:C0:T1:L3 0x00/0x00/0x00 0x00020000 mpp_status:2
[...]
Aug  6 05:01:14 s12n01 kernel: GFS: fsid=s12:scratch13.2: fatal: filesystem consistency error
Aug  6 05:01:14 s12n01 kernel: GFS: fsid=s12:scratch13.2: inode = 4918461516/4918461516
Aug  6 05:01:14 s12n01 kernel: GFS: fsid=s12:scratch13.2: function = dinode_dealloc
Aug  6 05:01:14 s12n01 kernel: GFS: fsid=s12:scratch13.2: file = /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/gfs/inode.c, line = 529
Aug  6 05:01:14 s12n01 kernel: GFS: fsid=s12:scratch13.2: time = 1249549274
Aug  6 05:01:14 s12n01 kernel: GFS: fsid=s12:scratch13.2: about to withdraw from the cluster
Aug  6 05:01:14 s12n01 kernel: GFS: fsid=s12:scratch13.2: waiting for outstanding I/O
Aug  6 05:01:14 s12n01 kernel: GFS: fsid=s12:scratch13.2: telling LM to withdraw
Aug  6 05:01:15 s12n03 kernel: GFS: fsid=s12:scratch13.0: jid=2: Trying to acquire journal lock...
Aug  6 05:01:15 s12n03 kernel: GFS: fsid=s12:scratch13.0: jid=2: Looking at journal...
Aug  6 05:01:15 s12n02 kernel: GFS: fsid=s12:scratch13.1: jid=2: Trying to acquire journal lock...
Aug  6 05:01:15 s12n02 kernel: GFS: fsid=s12:scratch13.1: jid=2: Busy
Aug  6 05:01:15 s12n03 kernel: GFS: fsid=s12:scratch13.0: jid=2: Acquiring the transaction lock...
Aug  6 05:01:15 s12n03 kernel: GFS: fsid=s12:scratch13.0: jid=2: Replaying journal...
Aug  6 05:01:21 s12n03 kernel: GFS: fsid=s12:scratch13.0: jid=2: Replayed 10050 of 11671 blocks
Aug  6 05:01:21 s12n03 kernel: GFS: fsid=s12:scratch13.0: jid=2: replays = 10050, skips = 472, sames = 1149
Aug  6 05:01:21 s12n03 kernel: GFS: fsid=s12:scratch13.0: jid=2: Journal replayed in 7s
Aug  6 05:01:21 s12n03 kernel: GFS: fsid=s12:scratch13.0: jid=2: Done
Aug  6 05:01:21 s12n01 kernel: lock_dlm: withdraw abandoned memory
Aug  6 05:01:21 s12n01 kernel: GFS: fsid=s12:scratch13.2: withdrawn
Aug  6 05:01:21 s12n01 kernel:   mh_magic = 0x01161970
Aug  6 05:01:22 s12n01 kernel:   mh_type = 4
Aug  6 05:01:22 s12n01 kernel:   mh_generation = 133
Aug  6 05:01:22 s12n01 kernel:   mh_format = 400
Aug  6 05:01:22 s12n01 kernel:   mh_incarn = 0
Aug  6 05:01:22 s12n01 kernel:   no_formal_ino = 4918461516
Aug  6 05:01:22 s12n01 kernel:   no_addr = 4918461516
Aug  6 05:01:22 s12n01 kernel:   di_mode = 0664
Aug  6 05:01:22 s12n01 kernel:   di_uid = 690
Aug  6 05:01:22 s12n01 kernel:   di_gid = 2017
Aug  6 05:01:22 s12n01 kernel:   di_nlink = 0
Aug  6 05:01:22 s12n01 kernel:   di_size = 0
Aug  6 05:01:22 s12n01 kernel:   di_blocks = 119
Aug  6 05:01:22 s12n01 kernel:   di_atime = 1248334920
Aug  6 05:01:22 s12n01 kernel:   di_mtime = 1249549274
Aug  6 05:01:22 s12n01 kernel:   di_ctime = 1249549274
Aug  6 05:01:22 s12n01 kernel:   di_major = 0
Aug  6 05:01:22 s12n01 kernel:   di_minor = 0
Aug  6 05:01:22 s12n01 kernel:   di_rgrp = 4918433973
Aug  6 05:01:22 s12n01 kernel:   di_goal_rgrp = 4918433973
Aug  6 05:01:22 s12n01 kernel:   di_goal_dblk = 27528
Aug  6 05:01:22 s12n01 kernel:   di_goal_mblk = 27528
Aug  6 05:01:22 s12n01 kernel:   di_flags = 0x00000000
Aug  6 05:01:22 s12n01 kernel:   di_payload_format = 0
Aug  6 05:01:22 s12n01 kernel:   di_type = 1
Aug  6 05:01:22 s12n01 kernel:   di_height = 0
Aug  6 05:01:22 s12n01 kernel:   di_incarn = 0
Aug  6 05:01:22 s12n01 kernel:   di_pad = 0
Aug  6 05:01:22 s12n01 kernel:   di_depth = 0
Aug  6 05:01:22 s12n01 kernel:   di_entries = 0
Aug  6 05:01:22 s12n01 kernel:   no_formal_ino = 0
Aug  6 05:01:22 s12n01 kernel:   no_addr = 0
Aug  6 05:01:22 s12n01 kernel:   di_eattr = 0
Aug  6 05:01:22 s12n01 kernel:   di_reserved =
Aug 6 05:01:22 s12n01 kernel: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Aug  6 05:01:22 s12n01 last message repeated 2 times
Aug  6 05:01:22 s12n01 kernel: 00 00 00 00 00 00 00 00
Aug  6 05:01:35 s12n01 clurgmgrd: [6943]: <err> clusterfs:gfs-scratch13: Mount point is not accessible!
Aug  6 05:01:35 s12n01 clurgmgrd[6943]: <notice> status on clusterfs:gfs-scratch13 returned 1 (generic error)
Aug  6 05:01:35 s12n01 clurgmgrd[6943]: <notice> Stopping service scratch13
Aug  6 05:01:35 s12n01 clurgmgrd: [6943]: <info> Removing IPv4 address 10.14.12.5 from bond0
Aug  6 05:01:45 s12n01 clurgmgrd: [6943]: <err> /scratch13 is not a directory
Aug  6 05:01:45 s12n01 clurgmgrd[6943]: <notice> stop on nfsclient:nfs-scratch13 returned 2 (invalid argument(s))
Aug  6 05:01:45 s12n01 clurgmgrd[6943]: <crit> #12: RG scratch13 failed to stop; intervention required
Aug  6 05:01:45 s12n01 clurgmgrd[6943]: <notice> Service scratch13 is failed

I don't think the FastT messages I've included caused the problem, since I've seen them at other times without the file system crashing. It's not great to be getting those sorts of messages, but the file system didn't crash until about 10 minutes after them. I can't rule them out, though.
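To put a number on that gap, here is a minimal Python sketch that compares the two timestamps quoted above and decodes the "time =" value from the GFS error. The syslog lines carry no year, so 2009 is inferred from that epoch value and should be treated as an assumption. Also note that the "[...]" in the excerpt means later mpp messages may exist, so the real gap could be shorter.

```python
from datetime import datetime, timezone

# Assumption: syslog omits the year, so it is taken from the epoch value
# in the GFS "time =" field.
ASSUMED_YEAR = 2009

def parse_syslog(ts: str) -> datetime:
    """Parse a syslog-style timestamp such as 'Aug  6 04:51:13'.

    Assumes an English/C locale for the month abbreviation.
    """
    return datetime.strptime(f"{ASSUMED_YEAR} {ts}", "%Y %b %d %H:%M:%S")

mpp_error = parse_syslog("Aug  6 04:51:13")  # RAIDarray.mpp retry messages
gfs_fatal = parse_syslog("Aug  6 05:01:14")  # GFS consistency error
gap = gfs_fatal - mpp_error

# "time = 1249549274" from the GFS error, decoded as UTC.
gfs_time_utc = datetime.fromtimestamp(1249549274, tz=timezone.utc)

print("gap between quoted mpp errors and GFS error:", gap)
print("GFS error time (UTC):", gfs_time_utc.isoformat())
```

The decoded epoch value lines up with the syslog timestamp of the consistency error if the node's clock runs four hours behind UTC, which suggests the "time =" field records when the error fired rather than some earlier event.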

A bit later, I tried to disable the scratch13 service so I could work on its associated file system. The "clusvcadm -d" command reported a failure, as shown below, but the service still ended up marked as disabled. Any thoughts?

Aug  6 05:50:10 s12n01 clurgmgrd[6943]: <notice> Stopping service scratch13
Aug  6 05:50:10 s12n01 clurgmgrd: [6943]: <err> /scratch13 is not a directory
Aug  6 05:50:10 s12n01 clurgmgrd[6943]: <notice> stop on nfsclient:nfs-scratch13 returned 2 (invalid argument(s))
Aug  6 05:50:10 s12n01 clurgmgrd[6943]: <alert> Marking scratch13 as 'disabled', but some resources may still be allocated!
Aug  6 05:50:10 s12n01 clurgmgrd[6943]: <notice> Service scratch13 is disabled

I've made it through pass1, 1b, 1c, and 2 in gfs_fsck, and I believe I'm in pass3 right now. For the last few hours I've been getting pages and pages of the following messages... but then, I always have, whenever I've needed to run gfs_fsck. I'm not too worried since I've seen this before, but I'd still like to understand it better. Could someone enlighten me as to what they mean, and whether I should be more concerned?

Converting 366 unused metadata blocks to free data blocks...
Converting 192 unused metadata blocks to free data blocks...
Converting 88 unused metadata blocks to free data blocks...
Converting 87 unused metadata blocks to free data blocks...
Converting 681 unused metadata blocks to free data blocks...
Converting 339 unused metadata blocks to free data blocks...
Converting 256 unused metadata blocks to free data blocks...
Converting 441 unused metadata blocks to free data blocks...
Converting 375 unused metadata blocks to free data blocks...
Converting 315 unused metadata blocks to free data blocks...
Converting 173 unused metadata blocks to free data blocks...
Converting 118 unused metadata blocks to free data blocks...
Converting 69 unused metadata blocks to free data blocks...
Converting 396 unused metadata blocks to free data blocks...
Converting 331 unused metadata blocks to free data blocks...
Converting 397 unused metadata blocks to free data blocks...
Converting 275 unused metadata blocks to free data blocks...
Converting 439 unused metadata blocks to free data blocks...
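For anyone who wants to quantify this output rather than scroll through it, here is a minimal Python sketch that counts the messages and totals the block counts. It runs on the excerpt above; feeding it a saved copy of the full gfs_fsck output is left as an assumption about how the output was captured.

```python
import re

# Sample copied from the gfs_fsck excerpt above. For a full run, read the
# text from wherever the output was saved (hypothetical, not shown here).
SAMPLE = """\
Converting 366 unused metadata blocks to free data blocks...
Converting 192 unused metadata blocks to free data blocks...
Converting 88 unused metadata blocks to free data blocks...
Converting 87 unused metadata blocks to free data blocks...
Converting 681 unused metadata blocks to free data blocks...
Converting 339 unused metadata blocks to free data blocks...
Converting 256 unused metadata blocks to free data blocks...
Converting 441 unused metadata blocks to free data blocks...
Converting 375 unused metadata blocks to free data blocks...
Converting 315 unused metadata blocks to free data blocks...
Converting 173 unused metadata blocks to free data blocks...
Converting 118 unused metadata blocks to free data blocks...
Converting 69 unused metadata blocks to free data blocks...
Converting 396 unused metadata blocks to free data blocks...
Converting 331 unused metadata blocks to free data blocks...
Converting 397 unused metadata blocks to free data blocks...
Converting 275 unused metadata blocks to free data blocks...
Converting 439 unused metadata blocks to free data blocks...
"""

PATTERN = re.compile(r"^Converting (\d+) unused metadata blocks to free data blocks")

def summarize(text: str) -> tuple[int, int]:
    """Return (number of matching messages, total blocks converted)."""
    counts = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            counts.append(int(match.group(1)))
    return len(counts), sum(counts)

messages, blocks = summarize(SAMPLE)
print(f"{messages} messages, {blocks} metadata blocks converted")
```

On the excerpt alone this reports 18 messages covering 5338 blocks; the totals for a full run would of course be much larger.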

Does anyone have any insight they could share with me? Before this week, I hadn't had major problems with a GFS file system since December, but it has now happened three times this week. This storage cluster is running CentOS 4.6 and GFS1.

Thanks,

James

