The everyday things from Thalamus' life.

Thalamus' Blog

19 February, 2009

Raid1 – revisited

Filed under: ComputerStuff_en — Thalamus @ 10:07

I feel I have to come back and make a note to myself about the procedure for replacing a failed disk in a software RAID. Why a revisit? Mostly for my own sake, so that I remember what I did and why.

I must stress that I take no responsibility for any problems that might result from following my “guide”. I've posted it here mostly for myself, but feel free to use it and learn from it. ALWAYS have a complete backup of your data available before attempting any of this.

Normally, when you set up a machine with Linux software RAID, the boot block is installed only on the first hard drive. My problem was that this very drive was out of sync and needed to be replaced. It contained outdated data, and it was impossible to simply “fail” it and “re-add” it in order to resync it.

In this example, the faulty disk is hda and the up-to-date, healthy disk is hdc. The machine still booted fine, since there was nothing wrong with the boot block on hda itself, but the goal was to replace that drive.

I booted the machine from the first CentOS CD and typed “linux rescue” at the boot prompt. When the rescue environment offered to search for an existing Linux installation and mount it under /mnt/sysimage, I chose “Skip” and went straight to the bash prompt.

First, I copied the boot block and the partition table from hda to hdc. Together they make up the first 512-byte sector of the disk.

# dd if=/dev/hda of=/dev/hdc bs=512 count=1

I could do this without any problem since the two drives were identical and already had the same partition tables. Then it was time to shut down and replace the faulty disk (hda). I also moved the drive from the hdc position to the hda position. Why? GRUB was set up to boot from the primary disk on the first controller (hda), not from the primary disk on the second controller (hdc).
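Before pulling the faulty disk, it may be worth confirming that the first sector really was copied. Below is a minimal sketch, assuming a POSIX shell with dd, cmp and mktemp available; the function name “same_first_sector” is my own invention, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) if the first 512-byte sector
# of two disks or image files is identical. Reads only, never writes
# to the devices. Returns 1 on a mismatch, 2 if either read fails.
same_first_sector() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }

    if dd if="$1" of="$tmp_a" bs=512 count=1 2>/dev/null &&
       dd if="$2" of="$tmp_b" bs=512 count=1 2>/dev/null; then
        cmp -s "$tmp_a" "$tmp_b"   # byte-for-byte comparison
        rc=$?
    else
        rc=2                       # could not read one of the inputs
    fi

    rm -f "$tmp_a" "$tmp_b"
    return "$rc"
}
```

Something like “same_first_sector /dev/hda /dev/hdc && echo copied” would then report whether the two boot sectors match.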

A short note about the MBR and the partition table:

# dd if=/dev/sdX of=/tmp/sda-mbr.bin bs=512 count=1

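makes a copy of your MBR boot code AND partition table in the file “/tmp/sda-mbr.bin”. In that 512-byte sector, the first 446 bytes hold the boot code, the next 64 bytes the partition table, and the last 2 bytes the boot signature.
If you want to restore, for example, only the partition table, use this command: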

# dd if=/tmp/sda-mbr.bin of=/dev/sdX bs=1 count=64 skip=446 seek=446

Restoring only the boot code (the first 446 bytes) would be:

# dd if=/tmp/sda-mbr.bin of=/dev/sdX bs=446 count=1

And, of course, restoring both the boot code and the partition table:

# dd if=/tmp/sda-mbr.bin of=/dev/sdX bs=512 count=1
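These offsets are easy to get wrong, so it can pay to try them on a scratch image file before touching a real disk. This is a sketch, not a recovery tool: the temporary files stand in for /dev/sdX and /tmp/sda-mbr.bin, and “conv=notrunc” is added only because a regular file, unlike a block device, would otherwise be cut off right after the bytes dd writes.

```shell
#!/bin/sh
# Sketch: practise the MBR backup/restore commands on a scratch image
# instead of a real disk. conv=notrunc stops dd from truncating the
# regular file; it is not needed when writing to /dev/sdX.
img=$(mktemp)      # stands in for /dev/sdX
backup=$(mktemp)   # stands in for /tmp/sda-mbr.bin

# A fake 2048-byte "disk" filled with random bytes.
dd if=/dev/urandom of="$img" bs=512 count=4 2>/dev/null

# Back up sector 0: boot code (446) + partition table (64) + signature (2).
dd if="$img" of="$backup" bs=512 count=1 2>/dev/null

# Simulate damage: zero out the 64-byte partition table.
dd if=/dev/zero of="$img" bs=1 count=64 seek=446 conv=notrunc 2>/dev/null

# Restore only the partition table from the backup.
dd if="$backup" of="$img" bs=1 count=64 skip=446 seek=446 conv=notrunc 2>/dev/null

# Sector 0 of the image should now match the backup again.
if dd if="$img" bs=512 count=1 2>/dev/null | cmp -s - "$backup"; then
    echo "partition table restored"
else
    echo "mismatch" >&2
fi
```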

After replacing and swapping the disks, I once again booted from the first CentOS CD, typed “linux rescue”, and again went straight to the bash prompt.

Now hda (formerly hdc) was the boot disk, and hdc was a new disk containing no data at all. The first step was to give the new disk the same partition table as the old one.

# sfdisk -d /dev/hda | sfdisk /dev/hdc

“sfdisk -d” dumps the partition table in a format that sfdisk itself can read back and write to another drive. That is all this pipeline does: it copies the partition table of hda to hdc.
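A slightly more cautious variant saves the dump to a file first, so it can be read (and kept) before anything is written to the new disk. A minimal sketch; “copy_partition_table” and the default dump path /tmp/partitions.sfdisk are hypothetical names picked for illustration.

```shell
#!/bin/sh
# Hypothetical helper: save SRC's partition table to a dump file first,
# then write that same table to DST.
copy_partition_table() {
    src=$1
    dst=$2
    dump=${3:-/tmp/partitions.sfdisk}   # assumed default location

    # Refuse obviously wrong calls before touching any disk.
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: copy_partition_table SRC DST [DUMPFILE] (SRC != DST)" >&2
        return 1
    fi

    sfdisk -d "$src" > "$dump" || return 1   # text dump of SRC's table
    sfdisk "$dst" < "$dump"                  # apply the same table to DST
}
```

Called as “copy_partition_table /dev/hda /dev/hdc”, it refuses to run when an argument is missing or source and target are the same. It cannot protect against swapping the two arguments, so double-check which disk is which.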

Just to be very safe, I chose to format the new disk and copy all the data from hda to hdc. It's an extra step, I admit, but better safe than sorry. It would be really annoying if the empty disk were copied over the working one once I started the mirror.

To do this, I issued these commands:

# mkfs.ext3 /dev/hdc1
# mkswap /dev/hdc2
# mkfs.ext3 /dev/hdc3
# mkdir /tmp/boot_a /tmp/root_a /tmp/boot_c /tmp/root_c
# mount /dev/hda1 /tmp/boot_a
# mount /dev/hda3 /tmp/root_a
# mount /dev/hdc1 /tmp/boot_c
# mount /dev/hdc3 /tmp/root_c
# cp -a /tmp/boot_a/. /tmp/boot_c/
# cp -a /tmp/root_a/. /tmp/root_c/
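One detail about copying with “cp -a dir/*”: the shell's “*” glob does not match names beginning with a dot, so hidden files and directories at the top level are silently skipped, while copying “dir/.” includes them. A small sketch with throwaway directories shows the difference, assuming GNU cp and a shell without bash's dotglob option enabled.

```shell
#!/bin/sh
# Sketch: "src/*" skips top-level hidden entries, while "src/." copies
# the whole directory content, hidden files included.
src=$(mktemp -d)
dst_glob=$(mktemp -d)
dst_dot=$(mktemp -d)

touch "$src/visible" "$src/.hidden"

cp -a "$src"/* "$dst_glob"/   # glob expands to .../visible only
cp -a "$src"/. "$dst_dot"/    # copies everything inside $src

ls -A "$dst_glob"   # lists: visible
ls -A "$dst_dot"    # lists: .hidden and visible
```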

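Next, I needed to set up the RAID again with the new disk. I told mdadm to create each array with only the new drive at first, using the keyword “missing” as a placeholder for the second member, and then added the partitions from hda. Because the hdc partitions are the only active members when hda is added, their contents are what get mirrored onto hda, so the copy above is what keeps the data in place. Better safe than sorry.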

# mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/hdc1
# mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/hdc3

# mdadm --add /dev/md0 /dev/hda1
# mdadm --add /dev/md1 /dev/hda3
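While the new members resync, progress is shown in /proc/mdstat as a “recovery = …%” or “resync = …%” line. Rather than checking by hand, a small loop can wait for those lines to disappear. A sketch only; the function names and the 30-second interval are my own choices.

```shell
#!/bin/sh
# Hypothetical helpers: poll an mdstat-style file until no resync or
# recovery is reported. Reads /proc/mdstat unless another file is given.
md_busy() {
    # "recovery = 12.6% ..." or "resync=DELAYED" means work is pending.
    grep -Eq '(resync|recovery)[[:space:]]*=' "${1:-/proc/mdstat}"
}

wait_for_md_sync() {
    mdstat=${1:-/proc/mdstat}
    interval=${2:-30}                 # seconds between checks (arbitrary)
    [ -r "$mdstat" ] || return 1      # no status file, nothing to watch
    while md_busy "$mdstat"; do
        sleep "$interval"
    done
}
```

“wait_for_md_sync && cat /proc/mdstat” would then block until both mirrors show [UU] and print the final state.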

Now it is time to lean back and wait for the sync to complete. After that, run a consistency check on the new /dev/mdX devices and resize the file systems, because the RAID superblock makes each md device slightly smaller than the partition the file system was originally created on. Failing to do so might send you into an endless loop of complaints about corrupted superblocks or bad partition tables.

# e2fsck -f /dev/md0
# e2fsck -f /dev/md1
# resize2fs /dev/md0
# resize2fs /dev/md1


“e2fsck -f” might ask you a question to which you will be tempted to answer “no”. Don’t!
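The check-then-resize step can be wrapped so that resize2fs only runs when e2fsck did not leave errors behind. e2fsck's exit status is a bit mask in which 4 and above mean uncorrected errors or an operational failure. This is a sketch; “check_and_grow” is a hypothetical name, and e2fsck will still ask its questions interactively.

```shell
#!/bin/sh
# Hypothetical helper: force-check a file system, then resize it only
# if e2fsck finished cleanly. e2fsck exit status (bit mask):
#   0 = no errors, 1 and/or 2 = errors corrected,
#   4 or higher = errors left uncorrected or e2fsck itself failed.
check_and_grow() {
    dev=$1
    e2fsck -f "$dev"
    rc=$?
    if [ "$rc" -ge 4 ]; then
        echo "e2fsck on $dev returned $rc; not resizing" >&2
        return "$rc"
    fi
    resize2fs "$dev"   # no size given: fit the file system to the device
}
```

Once the sync has finished, run it as “check_and_grow /dev/md0” and then “check_and_grow /dev/md1”.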

Now reboot, and hopefully everything is up and running without a hitch.

• • •