
Re: ext3 filesystem is not recognized

Given what you've described, the only drive that it would make sense to pull out is the one that was dropped and then re-inserted.

On Jan 4, 2008 10:20 PM, Dennison Williams < evoltech 2inches com> wrote:
> Did you try to re-insert the kicked-out drive as if it was clean, or did
> you try to re-sync it to the existing filesystem?  If the former, then
> that's a HUGE mistake because the data on the drive is no longer in sync
> with what is on the other drives. (unless the entire filesystem was made
> read-only when (or before) the drive was dropped out.)

I re-inserted it with:
mdadm /dev/md0 --add /dev/sde
At which point it seemed to resync with the RAID device (i.e., the output
of /proc/mdstat showed that it was incrementally syncing).
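
For anyone following along, here is a minimal sketch of how the resync progress mentioned above could be pulled out of /proc/mdstat. The function name md_sync_status and its optional second argument are hypothetical, added only for illustration; the sketch assumes the usual mdstat layout, where each array stanza starts with an unindented "mdN :" line.

```shell
#!/bin/sh
# md_sync_status ARRAY [FILE]
# Print the resync/recovery progress line(s) for one md array.
# ARRAY: name as shown in /proc/mdstat, e.g. md0
# FILE:  file to read (hypothetical override; defaults to /proc/mdstat)
# Exit status is 0 if a resync or recovery line was found, 1 otherwise.
md_sync_status() {
    array=$1
    mdstat=${2:-/proc/mdstat}
    awk -v name="$array" '
        # An unindented "mdN :" line starts the stanza we care about.
        $1 == name && $2 == ":" { in_array = 1; next }
        # Any other unindented line ends the stanza.
        /^[^ ]/ { in_array = 0 }
        # Progress lines look like "recovery = ..." or "resync = ...".
        in_array && /(resync|recovery) *=/ { print; found = 1 }
        END { exit found ? 0 : 1 }
    ' "$mdstat"
}
```

Calling `md_sync_status md0` while the array is rebuilding should print the progress line; a non-zero exit status only means that no resync or recovery is currently reported for that array.
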

> Check the SMART logs for each of the drives to see if they've had any
> problems.

There are messages like this:
/dev/sdc, failed to read SMART Attribute Data
...but this wasn't one of the disks that was removed from the RAID device.

If there are complaints about sdc, then I'd be inclined to run a long SMART self-test on it. It's possible that the real problem started here.

A badblocks read test (or just a dd if=/dev/sdc of=/dev/null) would also test the I/O path between the drive and the CPU. If there are complaints about that drive, then at this point you should consider it suspect.

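
The two checks suggested above could be wrapped up roughly as follows. This is only a sketch: the helper names run and check_drive and the DRY_RUN switch are hypothetical, added so the commands can be previewed before touching a disk. The smartctl and dd invocations themselves are the standard ones, and running them for real needs root.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN=1.
# (DRY_RUN is a hypothetical switch for previewing the commands.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_drive DEVICE: start a long SMART self-test, show the self-test
# log, then read the whole device once.
check_drive() {
    dev=$1
    # Start the extended self-test; it runs on the drive in the background.
    run smartctl -t long "$dev"
    # Show the self-test log; re-run this after the test has finished.
    run smartctl -l selftest "$dev"
    # Read every sector and discard the data; read errors here point at
    # the drive or at the I/O path between the drive and the CPU.
    # Alternative: badblocks -sv "$dev" (read-only by default).
    run dd if="$dev" of=/dev/null bs=1M
}
```

`DRY_RUN=1 check_drive /dev/sdc` only prints the three commands; without DRY_RUN they are executed in order, and the dd pass can take hours on a large disk.
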
> Try pulling the (candidate) compromised drive out of the array and see if
> the (degraded) filesystem works OK and has good data.  If it does, then I'd
> guess that the pulled drive had bad data written to it somehow --- re-add it
> (as if it was hot-swapped in), and hope it doesn't happen again.
> Try that with each of the drives in turn, until you find the badly written
> drive.  If one of the drives has badly written data, the system really can't
> tell, for sure, which one is wrong.

I want to make sure I understand you here.  Say my RAID device is
comprised of four devices, /dev/md0 = /dev/sd[abcd]; are you suggesting
that for each drive I do something like this:

mdadm /dev/md0 --fail /dev/sda --remove /dev/sda

Don't bother. If the drive got resynced, then pulling it won't do any good unless software RAID gets silently confused by random data on one plex.

then try to mount up the FS as usual to see if it is there?  Wouldn't
this point be moot if the device already re-assembled itself?

Yes, it would be moot.

> [[ unless the array was read-only when the drive was dropped, then you will
> only have any hope of good data with the dropped drive pulled ]]

It wasn't read-only, but nothing was writing to it.

Thanks for your time and prompt response.
Dennison Williams

Unless noatime was set, the drive was being written to (if only with atime updates). If all that got scrambled was atime data, you should still have been able to mount the drive.
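
Whether noatime was in effect can be read from the mount options. A rough sketch follows; the function name has_noatime, its optional second argument, and the mount point /mnt/raid in the usage note are hypothetical, and the sketch assumes the usual /proc/mounts layout of device, mount point, filesystem type, and comma-separated options.

```shell
#!/bin/sh
# has_noatime MOUNTPOINT [FILE]
# Exit 0 if MOUNTPOINT is mounted with the noatime option, 1 otherwise.
# FILE is a hypothetical override for testing; defaults to /proc/mounts.
has_noatime() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        # Field 2 is the mount point, field 4 the option list.
        $2 == mp {
            found = 0            # the last entry for a mount point wins
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noatime") found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$mounts_file"
}
```

For example, `has_noatime /mnt/raid && echo "no atime writes"` prints the message only if the filesystem mounted at /mnt/raid carries the noatime option.
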

Stephen Samuel http://www.bcgreen.com