
Re: ext3 filesystem corruption - more info




Since it seemed to mount okay only 3 minutes earlier,
can we assume that it was initially uncorrupted?
Or is that not a valid assumption?

Is there anything that we can check or test?
Any advice or action at this point is better than waiting for the next filesystem disaster to occur.

Thanks
-Sev

Andreas Dilger wrote:
On Apr 12, 2006  19:28 -0400, Sev Binello wrote:
[HTML-only email] - it would be preferred if you used plain text, or at
least multipart/mixed for your email to this list...

  
// as soon as NFS clients start, we get a TON of errors like this
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1
Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks: bit already cleared for block 49125
    

  
//interspersed with some of these
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
    

These indicate that the kernel ext3 code detected serious corruption of the
metadata on the filesystem.  In cases like this, if the filesystem doesn't
remount readonly (i.e. mounted with "-o errors=remount-ro") then it just
makes the corruption progressively worse.
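
As a rough illustration of the point above, the following sketch lists ext3 entries in fstab-style text whose options field does not contain errors=remount-ro. The function name, the sample devices, and the mount points are hypothetical and not taken from this thread. It only inspects the options field, so a default error behavior stored in the superblock would not show up here.

```python
def ext3_mounts_without_remount_ro(fstab_text):
    """Return mount points of ext3 entries lacking errors=remount-ro.

    Expects fstab-style lines: device, mount point, type, options, ...
    """
    missing = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type != "ext3":
            continue
        if "errors=remount-ro" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical sample entries, for illustration only.
SAMPLE = """\
# device     mount point  type  options               dump pass
/dev/sdd1    /export/a    ext3  defaults              1 2
/dev/sde1    /export/b    ext3  rw,errors=remount-ro  1 2
/dev/sda1    /boot        ext2  defaults              1 2
"""

for mount_point in ext3_mounts_without_remount_ro(SAMPLE):
    print(f"{mount_point}: ext3 without errors=remount-ro")
```

On a real host the same function could be fed the contents of /etc/fstab, assuming the entries follow the usual field order.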

It doesn't point to a root cause, however.
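
To get a feel for how far out of range the logged requests are, a small sketch like the one below can pull the want/limit pairs out of the "attempt to access beyond end of device" messages quoted above and report the overshoot. The function name is hypothetical, the two sample lines are copied from the log excerpt, and the excess is reported in whatever units the kernel used for want and limit; no unit is assumed here.

```python
import re

# Matches the detail line that follows "attempt to access beyond end of device".
PATTERN = re.compile(r"rw=(\d+), want=(\d+), limit=(\d+)")


def out_of_range_requests(log_text):
    """Return (want, limit, excess) for every request past the device limit."""
    results = []
    for match in PATTERN.finditer(log_text):
        _rw, want, limit = (int(group) for group in match.groups())
        if want > limit:
            results.append((want, limit, want - limit))
    return results


# Two lines copied from the log excerpt in this thread.
LOG = """\
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358
"""

for want, limit, excess in out_of_range_requests(LOG):
    percent = 100.0 * excess / limit
    print(f"want={want} limit={limit} excess={excess} ({percent:.1f}% past the end)")
```

This only quantifies the symptom; it says nothing about what corrupted the metadata in the first place.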

  
Would it be a problem if the two 1.8TB systems appeared on one host?
    

No, some of our customers have hundreds of systems with two ext3 filesystems
of about this size, running on 2.4.21-RHEL3 kernels.  The LUNs exported from
the RAID storage are all under 2TB.  They have never reported similar problems
over several years of usage.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

  


-- 

Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev bnl gov
