File System Errors

Benjamin Franz snowhare at nihongo.org
Mon Jan 9 15:32:48 UTC 2006


On Mon, 9 Jan 2006, Timothy A. Holmes wrote:
[...]

> permission, and change them, and received notification that my
> filesystem was read only.  This is an FC3 box running an ext3 file
> system.  I thought this to be extremely suspicious, so I went to the
> console to investigate.  When I got there, the screen was full of
> scrolling notices that there had been a journal exception.
>
>
>
> I rebooted the system, and when it came back telling me that the file
> system had been uncleanly unmounted, I forced the file system integrity
> check.  It promptly kicked me out to a shell, and told me to run the
> fsck command, which I did.  It's now running the command.
>
>
>
> My question has four parts:
>
>
>
> 1.	What happened?

As I am *right now* handling a similar problem, I'm in a good position to 
answer: your disk probably has a hardware problem such as a bad sector 
(in my case it is a bad sector), and the kernel remounted your partition 
read-only to keep things from getting worse.
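If you want to spot a read-only remount without waiting for a failed write, something along these lines might help. It is only a rough sketch: the function name is made up for illustration, and it assumes the standard /proc/mounts layout (device, mount point, type, options, ...).

```python
def find_readonly_mounts(mounts_text, fs_types=("ext2", "ext3", "ext4")):
    """Return (device, mount_point) pairs that are mounted read-only.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields (device, mount point, type, options, ...).
    """
    readonly = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # "ro" appears as its own entry in the comma-separated option list
        if fs_type in fs_types and "ro" in options.split(","):
            readonly.append((device, mount_point))
    return readonly
```

On a live system you would pass it open("/proc/mounts").read(). A partition that is listed as "rw" in /etc/fstab but shows up here has most likely been remounted by the kernel after an error.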

> 2.	What can I do to prevent it from happening again (this is the
> second time in 6 months that this has happened on this box)?

Run e2fsck on the disk with the '-c' option (a read-only surface test) to 
find any bad sectors. Secondly, replace the drive. What that entails depends 
largely on how your system is configured (number of drives, using/not 
using RAID, partitioning, etc). Personally, I would copy all my data 
somewhere else, strip the machine down to bare metal, rebuild it, and then 
copy my data back. Whether that is an option for you depends on how critical 
downtime is for you.
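As a rough sketch of the e2fsck step, a small wrapper could look like the following. The helper names and the device path in the usage note are hypothetical, and the read-write check is just a conservative guard of my own, since running e2fsck on a filesystem that is mounted read-write is unsafe.

```python
import subprocess


def is_mounted_read_write(device, mounts_text):
    """True if device is listed in /proc/mounts content with the rw option."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            if "rw" in fields[3].split(","):
                return True
    return False


def bad_block_check_command(device):
    # -f forces a check even if the filesystem is marked clean;
    # -c makes e2fsck run badblocks(8) in read-only mode and add any
    # bad blocks it finds to the filesystem's bad block list.
    return ["e2fsck", "-f", "-c", device]


def run_bad_block_check(device, mounts_text):
    if is_mounted_read_write(device, mounts_text):
        raise RuntimeError(f"{device} is mounted read-write; unmount it first")
    # e2fsck's exit status is a bit mask (for example, 1 means errors were
    # corrected), so return it to the caller instead of treating nonzero
    # as a failure.
    return subprocess.run(bad_block_check_command(device)).returncode
```

For example, run_bad_block_check("/dev/hdb1", open("/proc/mounts").read()) as root, where /dev/hdb1 stands in for your own partition. Note that the root filesystem may be listed under a different name in /proc/mounts, so do not rely on the guard alone.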

> 3.	what else do I need to do to remedy it

See 2. Also, when you rebuild, you should think about using RAID1 to make 
the system more resistant to drive failures. Consider the cost of the time 
and downtime involved in a system recovery and you will probably conclude 
that a second hard disk is cheap.

> 4.	is there any way that I could have known about this earlier, I
> interacted with the box last on Saturday afternoon, and it appeared
> happy then.

Run the smartd monitoring daemon to get advance notification of failing 
drives. (I actually knew I had a developing bad drive - I just have been 
too lazy to replace it on my own system before today. Later today I am 
going to strip the machine down and re-install Linux on RAID1.)
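
smartd does the continuous watching for you. For a quick one-off check, the same smartmontools package ships smartctl, and a sketch like this could interpret its health report. The function names are made up for illustration, and the two status lines it looks for are the ones smartctl normally prints for ATA and SCSI drives respectively.

```python
import subprocess


def smart_health_ok(smartctl_output):
    """Interpret `smartctl -H` output.

    Returns True or False for a recognised health line, or None if the
    output contains no health line at all.
    """
    for line in smartctl_output.splitlines():
        # ATA drives report an overall self-assessment result
        if "overall-health self-assessment test result" in line:
            return line.split(":", 1)[1].strip() == "PASSED"
        # SCSI drives report a health status line instead
        if line.startswith("SMART Health Status"):
            return line.split(":", 1)[1].strip() == "OK"
    return None


def check_drive(device):
    # -H asks smartctl for the drive's overall health assessment
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return smart_health_ok(result.stdout)
```

Something like check_drive("/dev/hda") run as root would be the usage, with the device name being a placeholder. A passing result is not a guarantee: a drive can report healthy while already growing bad sectors, which is why the daemon's ongoing attribute monitoring is the better early warning.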

-- 
Benjamin Franz

The designer of a new kind of system must participate fully in the implementation.

                                                          - Donald E. Knuth



