RAID drive failed, but SMART shows no errors?

Sam Varshavchik mrsam at courier-mta.com
Sat Mar 17 18:17:23 UTC 2007


Mogens Kjaer writes:

> 2. Remove the drive from the kernel:
> 
> echo "scsi remove-single-device 0 0 0 0" >/proc/scsi/scsi

Thanks for that tip.  I note that the proc man page does not document this 
command; only add-single-device is documented.  I had been wondering how 
the kernel needs to be notified about a removed device.
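
For anyone else looking for the four numbers these commands take: they are 
host, channel, id and lun, and /proc/scsi/scsi lists each attached device 
with a "Host: scsiN Channel: NN Id: NN Lun: NN" line.  A rough sketch of 
pulling them out follows; hctl_list is just a name I made up for the helper, 
not an existing tool.

```shell
#!/bin/sh
# Convert the "Host: scsiH Channel: CC Id: II Lun: LL" lines found in
# /proc/scsi/scsi into the "H C I L" form used by remove-single-device
# and add-single-device.  hctl_list is a hypothetical helper name.
hctl_list() {
    awk '/^Host: scsi/ {
        sub(/^scsi/, "", $2)            # "scsi0" -> "0"
        print $2+0, $4+0, $6+0, $8+0    # +0 drops the zero padding
    }'
}

# Print one "H C I L" line per attached device, if the file is readable.
if [ -r /proc/scsi/scsi ]; then
    hctl_list </proc/scsi/scsi
fi
```

Check which line belongs to the failing drive (the Vendor/Model lines that 
follow each Host line help) before feeding the numbers to remove-single-device.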

Now that I know the required voodoo on the Linux side of things, the actual 
procedure turned out to be surprisingly painless.  Fail, then remove, all of 
the drive's partitions from the RAID volumes, remove the device from the 
kernel, cut the power to the hot-swap bay, pull the case out, unscrew it, 
remove the drive, insert the new drive, cover it back up, put the case back 
into the hot-swap bay, turn the power on, wait for the disk to spin up, go 
back to the console, add the device, set up a new partition table, and add 
all partitions back to the RAID volumes.  It took me less than 10 minutes.  
End result: replaced a failing drive, no loss of uptime.  I still need to 
verify that "/sbin/grub-install /dev/sda" makes the new disk bootable.
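
For the archives, here is the sequence above as a rough dry-run sketch.  It 
only prints commands and executes nothing.  Several things in it are 
assumptions rather than a record of what I typed: that the RAID volumes are 
Linux software RAID managed with mdadm, that the partition table is copied 
from a surviving mirror disk with sfdisk, and the md0:1 / md1:2 pairings and 
the sdb mirror, which are made-up examples.  plan_swap is a hypothetical 
helper name.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for a hot swap; it executes nothing.
# Assumptions: mdadm-managed software RAID, a healthy mirror disk to copy
# the partition table from, and illustrative md-device:partition pairs.

plan_swap() {
    disk=$1      # disk being replaced, e.g. sda
    mirror=$2    # healthy disk whose partition table is copied, e.g. sdb
    hctl=$3      # "host channel id lun" for the bay, e.g. "0 0 0 0"
    shift 3      # remaining arguments: md:partition pairs, e.g. md0:1

    # 1. Fail, then remove, each partition from its RAID volume.
    for pair in "$@"; do
        md=${pair%%:*}
        part=${pair##*:}
        echo "mdadm /dev/$md --fail /dev/$disk$part"
        echo "mdadm /dev/$md --remove /dev/$disk$part"
    done

    # 2. Tell the kernel the device is gone, then do the hardware swap.
    echo "echo \"scsi remove-single-device $hctl\" >/proc/scsi/scsi"
    echo "# power down the bay, swap the drive, power up, wait for spin-up"

    # 3. Re-add the device and give it a partition table.
    echo "echo \"scsi add-single-device $hctl\" >/proc/scsi/scsi"
    echo "sfdisk -d /dev/$mirror | sfdisk /dev/$disk"

    # 4. Add every partition back so the volumes resync.
    for pair in "$@"; do
        md=${pair%%:*}
        part=${pair##*:}
        echo "mdadm /dev/$md --add /dev/$disk$part"
    done

    # 5. Boot loader step; not yet verified to make the disk bootable.
    echo "/sbin/grub-install /dev/$disk"
}

# Example: print the plan for sda in bay "0 0 0 0", mirrored by sdb.
plan_swap sda sdb "0 0 0 0" md0:1 md1:2
```

Review the printed plan against the actual layout (for instance with 
"cat /proc/mdstat") before running any of it by hand.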

> echo "scsi add-single-device 0 0 0 0" >/proc/scsi/scsi
> 
> This can take awhile, the drive has to spin up.

Actually, at least my new Seagate Cheetah spun up as soon as it received 
power from the hot-swap bay.  It was ready in about 7 seconds, and 
add-single-device then completed with no further delay.
