
Re: Policy after fsck fixes errors



On Wed, 19 Jan 2005 08:56:53 -0500
"Theodore Ts'o" <tytso mit edu> wrote:

> On Wed, Jan 19, 2005 at 09:16:37AM +0000, Geoff wrote:
> > 
> > Yesterday evening this box crashed and, for once, ext3
> > was not able to recover automatically.  There was an
> > "unexpected inconsistency inode xxxxx has imagic flag
> > set" error and the system would not boot.  I fscked from
> > my rescue disk and I guess 20 or 30 errors were fixed by
> > me just selecting "y" at the prompt.
> 
> Did you save all of the other fsck messages?  Were there
> any messages about directory entries to deleted files
> being deleted, or files ending up in lost+found?

Thanks for responding Ted.

Why do problems like this always occur late at night?
I use the box for my business and went into idiot-mode just
to try and get it working again before bed and today's work.
I did not, therefore, do any logging or even pay proper
attention to fsck's output.  I do remember that fsck took at
least two passes and ISTR that duplicates were found and
stuff was deleted.

After I had posted to the list this morning I got my
thinking apparatus into better order and remembered to check
lost+found.  There are two directories there, both of which
appear to be empty.  I noted the username which had written
them and did a system-wide find for files created by that
user - there was just one other file (cert8.db), in the copy
of the broken firefox profile directory I saved before
creating the new one - so I guess that at some point fsck
must have asked me if I wanted it to do that.   This
encourages me to believe that the situation may not be as
bad as I feared.  I have decided to wait and see at least
until the weekend when I shall have a little more time.

<snip> 

> ... there would be no further filesystem corruption
> (unless whatever caused garbage to appear in the first
> place --- bad controller, bad IDE/SCSI cable, failing hard
> drive, etc. --- is still causing additional damage to the
> filesystem).

This old smp box is generally good for my purposes, but it
has long had a tendency to the occasional unexplained
self-reset.  It was one of these that sent the system down
last evening.  As I said in my original post, ext3 has
always coped seamlessly with these before, but maybe last
night the system was in a particularly vulnerable state.
On the other hand maybe it is a new hdd problem. 
Someone has suggested by private email that I should run the
diagnostics provided by smartmontools.  I will do that
at the weekend when I have a little more time.

> 
> However, files might be missing.  
> 

> It might be worthwhile to compare the files on your backup
> hdd with your current filesystem, to see if you have any
> missing files.

Yes, I will do that.
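
A minimal sketch of that comparison, assuming the backup drive
and the current filesystem are both mounted and mirror the same
directory layout.  The function names and the two root paths are
hypothetical; nothing here comes from fsck or e2fsprogs.

```python
import os
import sys


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_from_current(backup_root, current_root):
    """List files present in the backup but absent from the current tree."""
    backup = relative_files(backup_root)
    current = relative_files(current_root)
    return sorted(backup - current)


# Usage (hypothetical mount points): script.py /mnt/backup/home /home
if __name__ == "__main__" and len(sys.argv) == 3:
    for path in missing_from_current(sys.argv[1], sys.argv[2]):
        print(path)
```

With an out-of-date backup the result needs reading with care:
files deliberately deleted since the backup will show up as
missing, and files created since the backup cannot be checked
this way at all.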

> It's also possible I suppose that data
> blocks were actually corrupted, but that really depends on
> how the inode table blocks got corrupted in the first
> place.
> 

This is what really troubles me.  If the problem is in a
library or some seldom-used but essential utility then I am
going to be in trouble when I am not expecting it.

I really should bring my backup up-to-date in any case. 
If smartmontools don't reveal any hardware problems then the
best strategy must be to substitute the backup for my
current drive, do the update, then copy back to the
current drive.

> > (b) Are there any other diagnostics I could run to
> > confirm the integrity of my data?
> 
> Aside from comparisons to backups, if you had been using
> programs like cfv to calculate checksums of all of your
> files, you could use them to check the data integrity of
> your data files.  
> 
> On rpm-based systems, you can also do things like rpm -Va
> to check the checksums of the installed files.  This won't
> check your data files, and will raise false positives
> caused by config files that you have since modified, but
> it might give you a hint that there is greater damage, or
> put your mind at ease that things are more likely to be
> OK.
> 
I am running slack 9.1 with a vanilla 2.10 kernel and
compile just about all of my applications locally.  I guess
I am pretty much on my own therefore.
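
Without rpm, a home-made checksum manifest is one way to get a
similar check for locally compiled software.  The sketch below is
not cfv and does not use its file formats; it assumes SHA-256 via
Python's hashlib, and the function names are hypothetical.  It
only helps if the manifest is built while the files are known to
be good, for example from a trusted backup.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file under root (relative path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only hash regular files.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify(root, manifest):
    """Return (missing, changed) lists of relative paths."""
    missing, changed = [], []
    for rel, digest in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            missing.append(rel)
        elif file_digest(full) != digest:
            changed.append(rel)
    return missing, changed
```

As with rpm -Va, a "changed" entry does not distinguish
corruption from a legitimate edit, so config files modified since
the manifest was built will show up as false positives.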

> Good luck,

Thanks again,

Geoff

