Posts Tagged ‘ RAID ’

Recover ubuntu box from hack, part 11: Ensuring sane behaviour when a drive dies

So I would prefer, since it is capable of it, that mdadm automatically email me when my RAID looks like it might fail. I found the command to test if it's working:

mdadm --monitor -1 -m lynden /dev/md0 -t

The --monitor flag puts it into monitor mode, -1 tells it to run just once, -m specifies the email address (here a user on the local system), then comes the md array to check, and -t tells it to send a TestEvent. A few minutes after I did that, my CLI told me I had mail. I checked it with:

cat /var/mail/lynden

This gave me raw email format. I decided I needed to find a CLI based email client. Did some searching and found one that sounded good:

sudo aptitude install mutt

Works excellently. Then I set the MAILADDR address in /etc/mdadm/mdadm.conf to my username at my server. Then I ran the test without specifying a recipient:

mdadm --monitor -1 /dev/md0 -t
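For reference, the address that mdadm mails is read from the MAILADDR line of mdadm.conf. Here is a minimal sketch of setting it from the shell, assuming GNU sed. The name `set_mailaddr` is made up for illustration, the address is a placeholder, and the example works on a scratch copy rather than the live file:

```shell
# Hypothetical helper (not part of mdadm): set or add the MAILADDR line
# in an mdadm.conf-style file.
set_mailaddr() {
    conf="$1"
    addr="$2"
    if grep -q '^MAILADDR' "$conf"; then
        # Replace the existing address in place (GNU sed -i assumed).
        sed -i "s|^MAILADDR.*|MAILADDR $addr|" "$conf"
    else
        # No MAILADDR line yet, so append one.
        printf 'MAILADDR %s\n' "$addr" >> "$conf"
    fi
}

# Example on a scratch copy; the address is a placeholder.
# cp /etc/mdadm/mdadm.conf /tmp/mdadm.conf
# set_mailaddr /tmp/mdadm.conf lynden@localhost
```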

After playing around for a couple of minutes, bash told me I had mail in /var/mail/lynden. Opened mutt and sure enough it's there. Everything should be swell now. Now to turn off my computer, pull out a drive, and see how it reacts.

Damn. It didn’t boot. It sat at the boot screen saying the md array could not be started: keep waiting, S to skip, or M for manual recovery. OK, back to do some more searching. I skipped both the mdadm failure and the resultant mount failure, then tried to start the array from the CLI. It came back saying it had assembled the array with 1 drive. After some investigation, it seems the drive I removed was the middle one of the array, whereas I thought it was the 3rd disk. The partitions used on the drives aren’t uniform (sda2, sdb2 and sdc1), so pulling out /dev/sdb resulted in sdc being renamed to /dev/sdb. Since the mdadm.conf file specified exactly which partitions to use, mdadm was looking for sdb2 and sdc1, neither of which existed at that point. So I removed that line and uncommented the original line “DEVICE partitions” to let mdadm examine all partitions itself. Then I tried to assemble the array again, and it assembled correctly with 2 of 3 devices. And hey, mail in my inbox! Indeed, it was mdadm reporting an actual degraded array.
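The mdadm.conf change can be sketched as a small script. It assumes the file holds an explicit DEVICE line listing partitions plus a commented-out “DEVICE partitions” line, as described here. `use_device_partitions` is a hypothetical name, and GNU sed is assumed:

```shell
# Hypothetical helper: stop naming partitions explicitly and let mdadm
# scan every partition instead.
use_device_partitions() {
    conf="$1"
    # Comment out any DEVICE line other than the generic one.
    sed -i '/^DEVICE partitions$/!s|^DEVICE |#DEVICE |' "$conf"
    # Re-enable the generic line if it was commented out.
    sed -i 's|^#DEVICE partitions$|DEVICE partitions|' "$conf"
    # Add it if the file never had one.
    grep -q '^DEVICE partitions$' "$conf" || echo 'DEVICE partitions' >> "$conf"
}

# Example on a scratch copy:
# cp /etc/mdadm/mdadm.conf /tmp/mdadm.conf
# use_device_partitions /tmp/mdadm.conf
```

The explicit line is commented out rather than deleted so it can be restored later.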

So I rebooted again, but it still failed. It seems mdadm just won’t automatically boot with a degraded array. After further research, this turns out to be intended behaviour, but it can be changed: there is a line in the file /etc/initramfs-tools/conf.d/mdadm which says “BOOT_DEGRADED=false”, which I just needed to change to “true”. Did this, rebooted, and everything worked perfectly fine. Now it was time to try again with all 3 drives plugged back in. Once again it didn’t go as expected: it wouldn’t boot because an error occurred mounting the share (mount, not mdadm). I skipped the mount and manually mounted it via the CLI, which worked without error. Tried a reboot again to see, and it worked fine. Interesting. Wonder why that is? Must be something about auto mount remembering the exact configuration of the drive it mounted last time, so mounting it manually (still only from fstab though) allowed it to auto mount again from then on.
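The BOOT_DEGRADED switch is a one-line edit. A minimal sketch, assuming GNU sed and a file that already contains a BOOT_DEGRADED= line; `enable_boot_degraded` is a hypothetical helper name:

```shell
# Hypothetical helper: flip BOOT_DEGRADED to true in an
# initramfs-tools style config file.
enable_boot_degraded() {
    conf="$1"
    sed -i 's/^BOOT_DEGRADED=.*/BOOT_DEGRADED=true/' "$conf"
}

# Example on a scratch copy:
# cp /etc/initramfs-tools/conf.d/mdadm /tmp/mdadm-initramfs
# enable_boot_degraded /tmp/mdadm-initramfs
```

On the real file the edit needs root. The steps above don't include it, but it may also be necessary to run `sudo update-initramfs -u` afterwards so the new value ends up inside the initramfs.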


Recover ubuntu box from hack, part 9: Reconfigure samba file sharing

Before the hack I had set up a shared folder which is the main reason for having the RAID set up. I want to be able to share any media throughout the house as well as provide a place for people to store any data they might want to back up. To this end I had samba set up. So again I set it up:

sudo aptitude install samba

Now I need to adjust the options:

sudo vim /etc/samba/smb.conf

Uncommented the ;  security = user line. Set up a couple of shares. Reloaded the configuration:

sudo reload smbd

Tested it. Works. One of my shares forces guest only, and one of them requires a login. It took a very long time to work out how to log in. It kept saying “… Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. …”. After doing a bit of research I found I could kill all current connections with a command in my Windows command prompt:

net use * /delete

This kills all current network share connections, including any network drives. This was not a good solution, and a bit more searching told me that it’s just how Windows works with network shares. An MS KB article suggested a workaround, which I will use: log into \\server-name for one remote user account and \\server-IP for the other. This seemed to stop the error message from showing again, so long as I remember which user is for which address. The workaround is good enough, since it pretty much should only ever be me that needs it, and only when I’m testing things or logging into my own private share from another user’s computer while they are logged into their private share. However, I was still not able to come up with a working login. A bit more research and stuffing around, and I found out that the line of the configuration file about syncing with Unix passwords doesn’t actually mean Samba checks the user-name and password against the server’s user-names and passwords. So I had to create a samba user-name and password:

sudo smbpasswd -a <username>

Once that was done I had to work out what domain it wanted. Seems it will only allow me to log on with workgroup\<username>. The workgroup used is the one specified in the smb.conf file. I was sure I didn’t need to specify the workgroup for my last set up, but I concede that I may not have set it up as properly last time.

Here is a dump of the useful parts of the smb.conf file, as given by the command ‘sudo testparm /etc/samba/smb.conf’:

[global]
        server string = %h server (Samba, Ubuntu)
        map to guest = Bad User
        obey pam restrictions = Yes
        pam password change = Yes
        passwd program = /usr/bin/passwd %u
        passwd chat = *Enter\snew\s*\spassword:* %n\n *Retype\snew\s*\spassword:* %n\n *password\supdated\ssuccessfully* .
        unix password sync = Yes
        syslog = 0
        log file = /var/log/samba/log.%m
        max log size = 1000
        dns proxy = No
        usershare allow guests = Yes
        panic action = /usr/share/samba/panic-action %d
[printers]
        comment = All Printers
        path = /var/spool/samba
        create mask = 0700
        printable = Yes
        browseable = No
        browsable = No
[print$]
        comment = Printer Drivers
        path = /var/lib/samba/printers
[public]
        comment = RAID public area
        path = /media/share/public
        guest only = Yes
        guest ok = Yes
[public - rw]
        comment = writable version of public share
        path = /media/share/public
        read only = No
        create mask = 0755
        directory mask = 0777
[guest area]
        comment = writeable area for guests
        path = /media/share/public/guest area
        read only = No
        create mask = 0755
        directory mask = 0777
        guest only = Yes
        guest ok = Yes
[lynden - private]
        comment = Lynden's personal stuff
        path = /media/share/private/lynden
        valid users = lynden
        read only = No
        create mask = 0700
        directory mask = 0700
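
A quick way to see which sections a file like this defines is to pull out the bracketed headers. A small sketch; `list_sections` is a hypothetical helper, not a Samba tool, and it only looks for lines that consist of a `[name]` header:

```shell
# Hypothetical helper: print the section names ([global], shares, ...)
# found in an smb.conf-style file, one per line.
list_sections() {
    sed -n 's/^\[\(.*\)\]$/\1/p' "$1"
}

# Example:
# list_sections /etc/samba/smb.conf
```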

The reason I have the main share set up with a read-only main area, a rw version, and a guest rw area is so that people who are visiting, or who get onto our network without permission, can see what I have but cannot modify anything. House mates who want to use the RAID for storage can put things into the guest area, or if they want I’ll set up a user-name for them. The rw version of the main area is so that users I know and trust can modify the share if they want. However, this security comes at the cost of the simplicity and ease of use of a straight public read/write network share, which my girlfriend would prefer.

I think it works a little better this time, and I definitely have a better understanding too.

Recover ubuntu box from hack, part 8: Reconfigure RAID5 array with mdadm

So this time I’m going to retell my experience of the simple task of reconfiguring the RAID 5 array. This I already did from the live CD, when I was making sure I had all of my things still, so this was going to be fairly straightforward. All I had to do was install mdadm and set up the configuration file. So the first step:

sudo aptitude install mdadm

Excellent. Now I find the configuration already exists in the /etc/mdadm/mdadm.conf file, as mdadm found the RAID partitions already. However the RAID wouldn’t start:

sudo mdadm --assemble /dev/md0
mdadm: /dev/sdc1 has no superblock - assembly aborted

“Hmm”, I thought. “Maybe I’m doing something wrong. OK – I’ll just get the old /etc/mdadm/mdadm.conf file from the old installation. Where did I put that backup?” *sigh* The backup was a disk image stored on the RAID array. Cool. So no choice but to work out how to use mdadm properly.

Then I tried

sudo mdadm --assemble /dev/md0 --verbose /dev/sda2 /dev/sdb2 /dev/sdc1

But mdadm reported that it had started the array with only 2 of the 3 devices. Why the hell is it doing that? I wonder.

OK. So I did some research along with a mate from work. Turns out there’s actually a bug in Linux that causes the kernel to do something stupid, preventing mdadm from assembling RAIDs. I found the workaround on that page, which is to attach the offending partition to a loop device and then use that in the array instead:

sudo losetup /dev/loop0 /dev/sdc1
sudo mdadm -A /dev/md0 /dev/sda2 /dev/sdb2 /dev/loop0

This worked fine. But this was hardly a good solution. More research to be done.

Eventually I found that /dev/sdc1, which was one of the RAID partitions, was being claimed by an array called /dev/md_d127, thus preventing it from being assembled into another array. This I confirmed by checking /proc/mdstat. Simply running

sudo mdadm -S /dev/md_d127

to stop the unknown array allowed me to then run the correct RAID assemble command. So now I had a working RAID again. Rebooted and checked: sure enough, the unknown array was running again, and mine wouldn’t assemble. So I stopped the unknown one again and assembled mine. I created a directory to mount it on and then mounted the RAID to it:

sudo mkdir /media/share
sudo mount /dev/md0 /media/share
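
The /proc/mdstat check used to spot the rogue array can be scripted. This is a rough sketch that assumes the usual mdstat layout, where each array line looks like `md0 : active raid5 sda2[0] sdb2[1]`; `array_claiming` is a hypothetical helper name:

```shell
# Hypothetical helper: print the md array(s) whose /proc/mdstat line
# lists the given partition (e.g. sdc1).
array_claiming() {
    part="$1"
    mdstat="${2:-/proc/mdstat}"
    awk -v p="$part" '
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*$/, "", name)   # strip "[2](S)"-style suffixes
                if (name == p) print $1
            }
        }' "$mdstat"
}

# Example:
# array_claiming sdc1
```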

Then I go and mount the image from the share to another new directory:

sudo mkdir /media/oldmachine
sudo mount -o loop /media/share/public/image /media/oldmachine

I get the mdadm.conf file from that and overwrite my own. Restart the machine and nothing at all happens differently. *le sigh*.

After another couple of hours of playing around, another friend comes online. I ask him if he’s had experience with mdadm before. He has, and he has his own RAID array going currently. Excellent. I explain the situation and he asks me what type the RAID partitions are:

sudo fdisk -l /dev/sda

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000dda4

 Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        6079    48827392   83  Linux
/dev/sda2            6080      243201  1904682465   fd  Linux RAID autodetect

He goes on to tell me that partitions of type ‘fd’ will be automatically, and often incorrectly, assembled by the kernel at boot. I need to change the type to ‘83’, which is for Linux file systems like ext2, ext3, ext4 etc. That way the kernel won’t grab the partitions itself, and mdadm gets a chance to assemble them.

So I use fdisk to change the partition types, so with each disk I do:

sudo fdisk /dev/sda
Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 83
Command (m for help): w

and so on with the other disks. ‘t’ specifies I need to change the type, I choose partition 2 as it’s /dev/sda2 I need to change, then 83 is the type for ‘Linux’, then w writes the changes. Then it says something about not being able to write the changes as the disk is in use at the moment. So once I’ve done that to all the drives I reboot.
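
To check which partitions are still marked for kernel autodetection, the `fdisk -l` listing can be filtered. A small sketch that assumes MBR-style output like the listing above, where the type column reads "Linux RAID autodetect"; `raid_autodetect_partitions` is a hypothetical name:

```shell
# Hypothetical helper: read `fdisk -l` output on stdin and print the
# device names of partitions whose type is "Linux RAID autodetect" (fd).
raid_autodetect_partitions() {
    awk '$1 ~ /^\/dev\// && tolower($0) ~ /linux raid autodetect/ { print $1 }'
}

# Example:
# sudo fdisk -l /dev/sda | raid_autodetect_partitions
```

If the type change took effect, the filter should print nothing for that disk.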

I check /proc/mdstat to check if it’s solved the problem. Nope. After 15 minutes of wondering what to do now, I reread the chat with my friend – After changing the partitions I need to reconfigure mdadm:

sudo dpkg-reconfigure mdadm

Then I reboot and it’s running correctly. Then I just edit my fstab file to include the RAID:

echo '/dev/md0    /media/share    ext4    defaults    0    0' | sudo tee -a /etc/fstab
sudo mount -a

Check the mount and it’s working. Rebooted and checked again, and it’s still fine. Awesome. A whole day on this stupid mess, which took me all of 30 minutes from a live CD, simply because it just worked that time.
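
Appending to fstab blindly adds a duplicate line every time the command is re-run. Here is a hedged sketch of an append that first checks whether the device already has an entry. `add_fstab_entry` is a hypothetical helper, and the `defaults 0 0` fields in the example are an assumption, not necessarily what was in the original file:

```shell
# Hypothetical helper: append an fstab line only if no existing line
# already starts with the same device field.
add_fstab_entry() {
    fstab="$1"
    entry="$2"
    # The first whitespace-separated field of the entry is the device.
    dev=$(printf '%s\n' "$entry" | awk '{ print $1 }')
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $dev already present" >&2
    else
        printf '%s\n' "$entry" >> "$fstab"
    fi
}

# Example on a scratch copy (options and dump/pass fields are assumed):
# cp /etc/fstab /tmp/fstab
# add_fstab_entry /tmp/fstab '/dev/md0    /media/share    ext4    defaults    0    0'
```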


This post just goes to show the trade off of solving the problem yourself versus asking someone who knows: Sure I learned a fair bit in the several hours I took researching the issue, but the most relevant stuff I learned was during the 15 minutes it took to solve it when talking to someone who knows.

