Replacing failed raid (boot) disk

Mark msalists at gmx.net
Thu Jan 19 00:16:06 UTC 2006


Actually, I just thought of something:
Would it be easier to copy the boot partition from the mirror server onto the unused partition of the good drive before replacing
the bad drive?

Here is how the drives are partitioned right now (sda is the bad drive that needs to be replaced):
sda1 -> /boot
sda2 -> raid
sda3 -> raid1 (md0)

sdb1 -> swap
sdb2 -> raid1 (md0)
sdb3 -> unused (the counterpart of sda1)

BTW, these are SATA drives, in case it matters...

The good drive and the bad drive have identical partitions, but in a different order. I did not do this intentionally; I tried
to keep the order the same, but DiskDruid kept switching around the partitions of sdb on me.

Could I use sdb as sda, or would this not work, since /boot would then be on sda3, rather than sda1?

If I could switch them around, I could skip the rescue disk and do something like this:

1. Copy the content of the second server's /boot partition to sdb3
2. Change /etc/fstab so that /boot is on sda3 rather than sda1
3. ?? Where do I define which partitions make up md0?
4. Install the boot loader onto the good disk
5. Shut down, move the good sdb drive into the sda slot in place of the bad drive, and plug the new replacement in where sdb used to be.
6. Boot (from sda, previously sdb), partition sdb, and get mdadm to resync md0 onto the new drive.

This way I would have less downtime, since I do not need to run in rescue mode.

I would still have the same problems in step 4 that I had with the first version, of course.
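
To make step 6 a bit more concrete, here is a rough, untested sketch of the commands I think it would involve, written as a small Python dry run that only builds and prints the command strings and executes nothing. The function name resync_plan and its parameters are made up for illustration. It assumes that after the swap in step 5 the surviving disk shows up as /dev/sda and the blank replacement as /dev/sdb, and that the new disk can simply take a copy of the surviving disk's partition table, so the RAID member ends up as partition 2 on both.

```python
def resync_plan(good_disk="/dev/sda", new_disk="/dev/sdb",
                array="/dev/md0", member_no=2):
    """Return the shell commands for step 6 as strings; nothing is executed.

    Device names and the member partition number are assumptions based on
    the layout described above, not values read from the system.
    """
    return [
        # Dump the surviving disk's partition table and write it to the new disk.
        f"sfdisk -d {good_disk} | sfdisk {new_disk}",
        # Add the matching partition of the new disk to the degraded mirror.
        f"mdadm {array} --add {new_disk}{member_no}",
        # Check the rebuild progress.
        "cat /proc/mdstat",
    ]


if __name__ == "__main__":
    for cmd in resync_plan():
        print(cmd)
```

Printing the commands instead of running them means the plan can be checked against the real device names (for example with fdisk -l) before anything touches a disk.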

Thanks,

MARK


> -----Original Message-----
> From: fedora-list-bounces at redhat.com 
> [mailto:fedora-list-bounces at redhat.com] On Behalf Of Mark
> Sent: Wednesday, January 18, 2006 3:54 PM
> To: fedora-list at redhat.com
> Subject: Replacing failed raid (boot) disk
> 
> 
> Hi everybody,
> 
> I just got this log output a few days ago:
> Jan 11 15:34:24 webserv1 kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
> Jan 11 15:34:24 webserv1 kernel: ata1: error=0x10 { SectorIdNotFound }
> Jan 11 15:34:29 webserv1 kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
> Jan 11 15:34:29 webserv1 kernel: ata1: error=0x10 { SectorIdNotFound }
> Jan 11 15:34:59 webserv1 kernel: ata1: command 0xc8 timeout, stat 0x51 host_stat 0x61
> Jan 11 15:34:59 webserv1 kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
> Jan 11 15:34:59 webserv1 kernel: ata1: error=0x10 { SectorIdNotFound }
> Jan 11 15:34:59 webserv1 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
> Jan 11 15:34:59 webserv1 kernel: sda: Current: sense key: Aborted Command
> Jan 11 15:34:59 webserv1 kernel:     Additional sense: Recorded entity not found
> Jan 11 15:34:59 webserv1 kernel: end_request: I/O error, dev sda, sector 11217554
> Jan 11 15:34:59 webserv1 kernel: raid1: Disk failure on sda3, disabling device.
> Jan 11 15:34:59 webserv1 kernel:        Operation continuing on 1 devices
> Jan 11 15:34:59 webserv1 kernel: raid1: sda3: rescheduling sector 6815744
> Jan 11 15:34:59 webserv1 kernel: raid1: sdb2: redirecting sector 6815744 to another mirror
> Jan 11 15:34:59 webserv1 kernel: RAID1 conf printout:
> Jan 11 15:34:59 webserv1 kernel:  --- wd:1 rd:2
> Jan 11 15:34:59 webserv1 kernel:  disk 0, wo:1, o:0, dev:sda3
> Jan 11 15:34:59 webserv1 kernel:  disk 1, wo:0, o:1, dev:sdb2
> Jan 11 15:34:59 webserv1 kernel: RAID1 conf printout:
> Jan 11 15:34:59 webserv1 kernel:  --- wd:1 rd:2
> Jan 11 15:34:59 webserv1 kernel:  disk 1, wo:0, o:1, dev:sdb2
> 
> 
> This is on a server with a non-RAID /boot on sda1 and a 
> software-RAID1 / partition.
> 
> Dell says the HD needs to be replaced, so now I have the 
> replacement hard disk. The problem is: the failed disk is the 
> one I boot from and the boot partition is not mirrored. So I 
> cannot copy the content of the boot partition, nor get the 
> fdisk information to partition the new disk the same way as 
> the old one. What is the best and easiest way to get the new 
> system up and running as painlessly as possible?
> 
> I have a second machine with an identical setup, so I guess I 
> could get the info from that box.
> 
> I am thinking I need to:
> 1. Plug the new disk in and boot from the rescue CD
> 2. Look up the partition info on the mirror box and partition 
>    the new disk accordingly.
> 3. Copy the content of the boot partition over from the 
>    mirrored box
> 4. Install grub on sda (how!?!?!?)
> 5. Hopefully boot the machine with the replaced HD and hope 
>    that mdadm will automatically start synching the raid from 
>    the good raid disk (sdb)
> 
> The problem is mainly step 4: I am not sure what I had picked 
> as boot loader location from the "Advanced Boot Loader 
> Configuration" screen ("MBR" vs. "first sector of boot 
> partition"). So I need to figure out
>  a) what the location was, and
>  b) how to get the boot loader installed there manually (I've 
> always just used the automated install for the boot loader).
> 
> 
> Is my assumption about steps 1-5 correct?
> Does anybody have any hints regarding how to do step 4?
> 
> And then for the future: how can I be better prepared for 
> this next time? Is there a way to capture the partition and 
> boot loader information (at a point before the disk actually 
> goes bad) and then restore it to an identical drive in a more 
> automated fashion?
> 
> Thanks,
> 
> MARK 
> 
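
On the last question in my original message (capturing the partition and boot loader information before a disk goes bad), this is the kind of thing I have in mind. It is only a sketch: a Python dry run that builds the command strings without running them. The function name backup_commands, the output directory /root/disk-info and the file names are made up for illustration, and the resulting files would have to be copied off the machine to be of any use after a disk failure.

```python
import os


def backup_commands(disk="/dev/sda", out_dir="/root/disk-info"):
    """Return shell commands that record a disk's layout; nothing is executed.

    The output directory and file names are hypothetical placeholders.
    """
    name = os.path.basename(disk)  # e.g. "sda"
    return [
        f"mkdir -p {out_dir}",
        # Partition table in a form that sfdisk can read back in later.
        f"sfdisk -d {disk} > {out_dir}/{name}.sfdisk",
        # First sector of the disk (boot code plus partition table).
        f"dd if={disk} of={out_dir}/{name}.mbr bs=512 count=1",
        # One ARRAY line per md device, listing how each array is defined.
        f"mdadm --detail --scan > {out_dir}/mdadm.scan",
    ]


if __name__ == "__main__":
    for cmd in backup_commands():
        print(cmd)
```

The saved sfdisk dump could later be fed back with "sfdisk /dev/sda < sda.sfdisk" on an identical replacement drive, which would cover the partitioning part of the restore.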



