Advice for dealing with bad sectors on /
Steve Listopad
listopad at yahoo.com
Sun Jan 2 04:16:39 UTC 2005
All,
Comments in-line.
--- Larry McVoy <lm at bitmover.com> wrote:
> The one thing I'd add to Joseph's good advice is that when I see stuff like
> this (which I do, I manage a lot of Linux boxes) I tend to start swapping
> things. Put the drive in a known good system with a known good cable on
> the cable by itself and then see if you get errors. If you don't get
> errors in that situation it is likely your drive is fine and you have
> some bad hardware elsewhere.
>
> Hardware debugging is basically swapping parts until you find the guilty
> party.
Thanks to both you and Joseph for making me think about things that I simply
wouldn't have considered (or at least not before first fixing something that
wasn't broken). I would have immediately suspected the hard drive, not cables
or other hardware. But I guess that comes with experience, so thanks for
sharing. The first thing I'll do is make sure that the cables are secure;
swapping cables as a quick test is easy enough to do. The controller is
integrated into the MB, so swapping that would be more problematic :)
> On Sat, Jan 01, 2005 at 01:28:39PM -0600, Joseph D. Wagner wrote:
> > > Getting errors similar to:
> > >
> > > Dec 31 20:44:30 mybox kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > > Dec 31 20:44:30 mybox kernel: hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=163423, high=0, low=163423, sector=163360
> > > Dec 31 20:44:30 mybox kernel: end_request: I/O error, dev 03:41 (hdb), sector 163360
> >
> > This may not be the disk; it could also be the controller. I've seen it go
> > both ways. Any problems on hda?
No problems on hda. But, if it's the controller, that's built into the MB, so
that wouldn't be good.
I didn't get only DMA-type errors; there are others, like the one below. I
can't say that this is a complete list, though:
Dec 29 16:40:40 mybox kernel: hdb: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Dec 29 16:40:40 mybox kernel: hdb: read_intr: error=0x40 { UncorrectableError }, LBAsect=163423, high=0, low=163423, sector=163360
Dec 29 16:40:40 mybox kernel: end_request: I/O error, dev 03:41 (hdb), sector 163360
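For anyone trying to read these messages, here is a rough Python sketch that
decodes the numbers in them. It is only my reading of the log, not an
authoritative tool: the bit names are the labels the kernel printed next to
the status byte, and the helper names (decode_status, decode_dev) and the
"64 minors per IDE drive" rule used to pick out the partition number are
assumptions made for illustration.

```python
# Sketch: decode the values from the kernel messages quoted in this thread.
# Bit names follow the labels the kernel printed for the status byte.

STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}


def decode_status(value):
    """Return the names of the bits set in an IDE status byte."""
    return [name for bit, name in STATUS_BITS.items() if value & bit]


def decode_dev(dev):
    """Split a 'major:minor' hex pair such as '03:41' into integers."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    # Assumption: classic IDE numbering with 64 minors per drive, so the
    # remainder is the partition number (0 would mean the whole disk).
    return major, minor, minor % 64


print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode_status(0x59))  # ['DriveReady', 'SeekComplete', 'DataRequest', 'Error']
print(decode_dev("03:41"))  # (3, 65, 1)
print(163423 - 163360)      # 63
```

If that reading is right, dev 03:41 is partition 1 on hdb, and the gap of 63
between LBAsect and sector is consistent with a partition that starts at
sector 63 of the disk. Both the dma_intr and read_intr messages name the same
sector, so they may well be one bad spot reported through two code paths
rather than two separate failures.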
> > Try adding ide=nodma to the kernel parameters. If the problem goes away,
> > the problem is in the kernel driver for the controller or motherboard
> > chipset.
Excellent suggestion. Will give that a try.
> > > When I rebooted, the system threw me into a shell, to get me to "fix"
> > > things. So, I did an e2fsck -c -v /dev/hdb1 to attempt to fix things.
> > > The badblocks checking took 20 hours (it's a 200GB disk). Then I went
> > > through the question/answer session, hoping to get through the
> > > problems...
> >
> > A better way to go about this is booting off the rescue CD and doing the
> > e2fsck scan there. Otherwise, there could be leftover problems from running
> > the scan off of the partition you are scanning.
Ah, now that advice is something to put in my back pocket to remember. I never
gave that a thought, since it "booted enough" to get me to a shell prompt to
run e2fsck. I guess I wasn't forced to think "Rescue CD".
> > > Some questions:
> >
> > Best advice to all 3 questions: get some sort of disk imaging software.
> >
> > The disk imaging software may copy the bad sectors (i.e. sectors marked bad
> > now may also be marked bad on the new drive), but you can force e2fsck to
> > rescan bad sectors.
> >
[SNIP]
Thanks a million for all of the advice. I really appreciate it! About to go
try these suggestions.
Steve
More information about the Ext3-users
mailing list