Recover Ubuntu box from hack, part 8: Reconfigure RAID 5 array with mdadm

This time I’m going to retell my experience of the supposedly simple task of reconfiguring the RAID 5 array. I had already done this once from the live CD, when I was checking that all of my data was still there, so I expected it to be fairly straightforward: install mdadm and set up the configuration file. So, the first step:

sudo aptitude install mdadm

Excellent. Now I find that the configuration already exists in /etc/mdadm/mdadm.conf, as mdadm has already found the RAID partitions. However, the RAID wouldn’t start:

sudo mdadm --assemble /dev/md0
mdadm: /dev/sdc1 has no superblock - assembly aborted

“Hmm”, I thought. “Maybe I’m doing something wrong. OK – I’ll just get the old /etc/mdadm/mdadm.conf file from the old installation. Where did I put that backup?” *sigh* The backup was a disk image stored on the RAID array. Cool. So no choice but to work out how to use mdadm properly.

Then I tried

sudo mdadm --assemble /dev/md0 --verbose /dev/sda2 /dev/sdb2 /dev/sdc1

But mdadm reported that it had started the array with only 2 of the 3 devices. Why the hell is it doing that, I wondered.
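A quick way to confirm a "2 of 3" start is the [expected/active] counter that /proc/mdstat prints for each running array. The following is a minimal sketch, not part of the original troubleshooting; the function name degraded_arrays is made up for illustration, and it assumes the usual mdstat layout of a header line followed by a status line.

```shell
#!/bin/sh
# List md arrays running with fewer active members than expected.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
# degraded_arrays is a hypothetical helper name, not an mdadm command.
degraded_arrays() {
    awk '
        # An array header looks like: "md0 : active raid5 sda2[0] sdb2[1]"
        $2 == ":" && $1 ~ /^md/ { name = $1; next }
        # The status line carries "[expected/active]", e.g. "[3/2]"
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            counts = substr($0, RSTART + 1, RLENGTH - 2)
            split(counts, n, "/")
            if (n[2] + 0 < n[1] + 0)
                print name ": " n[2] " of " n[1] " devices active"
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Called with no argument, degraded_arrays reads the live /proc/mdstat and prints nothing when every array has all of its members.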

OK. So I did some research along with a mate from work. It turns out there’s a reported bug in Linux where the kernel does something stupid that prevents mdadm from assembling RAID arrays. I found the workaround on that page, which is to attach the offending partition to a loop device and then use that in the array instead:

sudo losetup /dev/loop0 /dev/sdc1
sudo mdadm -A /dev/md0 /dev/sda2 /dev/sdb2 /dev/loop0

This worked fine. But this was hardly a good solution. More research to be done.

Eventually I found that /dev/sdc1 – which was one of the RAID partitions – was being claimed by an array called /dev/md_d127, which prevented the partition from being assembled into any other array. I confirmed this by checking /proc/mdstat. Simply running

sudo mdadm -S /dev/md_d127

to stop the unknown array allowed me to then run the correct RAID assemble command. So now I had a working RAID again. Rebooted and checked – sure enough, the unknown array was running again, and mine wouldn’t assemble. So I stopped the unknown one again and assembled mine. I created a directory to use as a mount point and then mounted the RAID on it:

sudo mkdir /media/share
sudo mount /dev/md0 /media/share
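Because the stray array came back after every reboot, the "which array has grabbed this partition?" check against /proc/mdstat is worth scripting. Here is a rough sketch under the assumption that member devices appear on the array's header line as name[index], as they do in mdstat output; array_holding is a hypothetical helper name.

```shell
#!/bin/sh
# Print the md array (if any) that lists the given partition as a member.
# $1: partition name without /dev/, e.g. sdc1
# $2: mdstat-formatted file (default: /proc/mdstat)
# array_holding is a made-up helper name for illustration.
array_holding() {
    awk -v dev="$1" '
        $2 == ":" && $1 ~ /^md/ {
            # Members start at field 4 or 5 and look like "sdc1[2]" or "sdc1[2](S)";
            # non-member words such as "raid5" simply never match dev.
            for (i = 4; i <= NF; i++) {
                member = $i
                sub(/\[.*/, "", member)
                if (member == dev) print "/dev/" $1
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

Running array_holding sdc1 before assembling would show whether something like /dev/md_d127 needs stopping with mdadm -S first.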

Then I go and mount the image from the share to another new directory:

sudo mkdir /media/oldmachine
sudo mount -o loop /media/share/public/image /media/oldmachine

I get the mdadm.conf file from that and overwrite my own. Restart the machine and nothing at all happens differently. *le sigh*.

After another couple of hours of playing around, another friend comes online. I ask him if he’s had experience with mdadm before. He has, and he has his own RAID array going currently. Excellent. I explain the situation and he asks me what type the RAID partitions are:

sudo fdisk -l /dev/sda

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000dda4

 Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        6079    48827392   83  Linux
/dev/sda2            6080      243201  1904682465   fd  Linux RAID autodetect

He goes on to tell me that partitions of type ‘fd’ will be automatically, and often incorrectly, assembled by the kernel at boot – I need to change the type to ‘83’, which is for Linux file systems like ext2, ext3, ext4 etc. That way the kernel won’t auto-assemble the partition, which gives mdadm a chance to.
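To see at a glance which partitions still carry the ‘fd’ type, the fdisk listing can be filtered. This is only a sketch: it assumes the older column layout shown above (Device, Boot, Start, End, Blocks, Id, System), newer fdisk versions print different columns, and raid_autodetect_partitions is a made-up name.

```shell
#!/bin/sh
# Read "fdisk -l" output on stdin and print partitions whose Id is fd.
# Assumes the legacy column layout: Device Boot Start End Blocks Id System.
# raid_autodetect_partitions is a hypothetical helper name.
raid_autodetect_partitions() {
    awk '
        $1 ~ /^\/dev\// {
            # A "*" in the Boot column shifts every later field right by one
            id = ($2 == "*") ? $6 : $5
            if (id == "fd") print $1
        }
    '
}
```

It would be used as: sudo fdisk -l /dev/sda | raid_autodetect_partitions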

So I use fdisk to change the partition types. For each disk I do:

sudo fdisk /dev/sda
... 
Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 83
Command (m for help): w

and so on with the other disks (using partition 1 on /dev/sdc, since the RAID partition there is /dev/sdc1). ‘t’ says I want to change a partition’s type, I choose partition 2 as it’s /dev/sda2 I need to change, 83 is the type for ‘Linux’, and ‘w’ writes the changes. fdisk then warns that the partition table can’t be re-read because the disk is in use, so the change only takes effect after a reboot. So once I’ve done that to all the drives I reboot.
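The same type change can probably be done without the interactive prompts on newer systems, where sfdisk from util-linux has a --part-type option; that is an assumption about the installed version, not something I ran at the time. The sketch below only prints the commands for review rather than running them, and print_retype_commands is a hypothetical name. It assumes sdX-style device names where the partition number is the trailing digits.

```shell
#!/bin/sh
# Print (do not run) one sfdisk command per RAID member partition.
# Assumes sdX-style names such as /dev/sda2; names like nvme0n1p2 are not handled.
# print_retype_commands is a made-up helper name.
print_retype_commands() {
    for part in "$@"; do
        # Split e.g. /dev/sda2 into disk /dev/sda and partition number 2
        num=$(printf '%s\n' "$part" | sed 's/.*[^0-9]\([0-9][0-9]*\)$/\1/')
        disk=${part%"$num"}
        printf 'sfdisk --part-type %s %s 83\n' "$disk" "$num"
    done
}
```

For this array that would be print_retype_commands /dev/sda2 /dev/sdb2 /dev/sdc1, with each printed line then run under sudo once it looks right.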

I check /proc/mdstat to see if it’s solved the problem. Nope. After 15 minutes of wondering what to do now, I reread the chat with my friend: after changing the partition types I need to reconfigure mdadm:

sudo dpkg-reconfigure mdadm
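Before rebooting, it doesn't hurt to check that the configuration file really does define the array. A small sketch, assuming the ARRAY lines in mdadm.conf have the device name as their second field; both function names are made up for illustration.

```shell
#!/bin/sh
# Print the device name from every ARRAY line in an mdadm configuration file.
# $1: path to the config (default: /etc/mdadm/mdadm.conf)
# configured_arrays and array_is_configured are hypothetical helper names.
configured_arrays() {
    awk 'toupper($1) == "ARRAY" { print $2 }' "${1:-/etc/mdadm/mdadm.conf}"
}

# Succeed if the given device (e.g. /dev/md0) has an ARRAY line.
# $1: device name, $2: optional config path
array_is_configured() {
    configured_arrays "$2" | grep -qxF -- "$1"
}
```

For example: array_is_configured /dev/md0 || echo "md0 is missing from mdadm.conf"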

Then I reboot and it’s running correctly. Then I just edit my fstab file to include the RAID:

echo '/dev/md0    /media/share    ext4    defaults    0    0' | sudo tee -a /etc/fstab
sudo mount -a
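Appending to fstab by hand is easy to do twice by accident, so here is a hedged sketch of an append that skips the write when the mount point is already listed. The name add_fstab_entry is hypothetical, the "defaults 0 0" fields in the usage line are my assumption, and it needs to run as root when pointed at the real /etc/fstab.

```shell
#!/bin/sh
# Append an fstab line unless its mount point (field 2) is already listed.
# $1: the full fstab line, $2: target file (default: /etc/fstab)
# add_fstab_entry is a made-up helper name for illustration.
add_fstab_entry() {
    entry=$1
    fstab=${2:-/etc/fstab}
    mount_point=$(printf '%s\n' "$entry" | awk '{ print $2 }')
    # Look for a non-comment line whose second field is the same mount point
    if awk -v mp="$mount_point" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        echo "already present: $mount_point" >&2
        return 0
    fi
    printf '%s\n' "$entry" >> "$fstab"
}
```

Usage would look like: add_fstab_entry '/dev/md0 /media/share ext4 defaults 0 0'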

Check the mount and it’s working. Rebooted and checked again – still fine. Awesome. A whole day on this stupid mess, which took me all of 30 minutes from a live CD, simply because it just worked that time.

 

This post just goes to show the trade-off between solving the problem yourself and asking someone who knows: sure, I learned a fair bit in the several hours I spent researching the issue, but the most relevant stuff I learned came in the 15 minutes it took to solve it once I was talking to someone who knows.

 
