Remove `mdadm` RAID1 while keeping data

August 1, 2021 raid linux

Guide on removing a Linux mdadm RAID1 array while preserving existing partition data, avoiding the need to reinstall or copy files around.

If you ever end up with a similar need, here is how I did it. It goes without saying that you should back up data before doing anything, otherwise you accept the risk of losing it all. Hic sunt dracones 🐉

Context

A dedicated SoYouStart server with two disks, for hosting virtual machines via VMware ESXi. Unfortunately VMware ESXi does not support software RAID. As a workaround, it is possible to attach two VMDKs (virtual disks) to each VM, with each VMDK stored on a separate datastore (hardware disk), and set up software RAID1 directly from the VM via mdadm.

This annoying limitation actually was one of the main motivators for migrating from VMware ESXi to Proxmox VE, which supports software RAID1 over ZFS. With RAID handled at the host level, virtual machines do not need to manage software RAID themselves anymore, and only one disk needs to be kept for each VM.

Lazy and easy

We are going to remove devices from the array, then trick mdadm into being cool with a RAID1 array consisting of only one device.

First, inspect partitions and disks to identify what is where and what needs to be done:

Which partitions on which disks make up the mdadm array.
Decide which partition is going to stay.
If there are other partitions on the same disks AND we want to ditch the disk, decide if they can be deleted or should be moved.

# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda       8:0    0  100G  0 disk
├─sda1    8:1    0  953M  0 part  [SWAP]
└─sda2    8:2    0 99.1G  0 part
  └─md0   9:0    0 99.1G  0 raid1 /
sdb       8:16   0  100G  0 disk
├─sdb1    8:17   0  953M  0 part  /var/tmp
└─sdb2    8:18   0 99.1G  0 part
  └─md0   9:0    0 99.1G  0 raid1 /
# cat /proc/mdstat
md0 : active raid1 sda2[0] sdb2[1]
      103872512 blocks super 1.2 [2/2] [UU]
# mdadm --detail /dev/md0
State : active

For example, here:

One disk (sda) holds a swap partition, the other (sdb) a /var/tmp partition.
The RAID1 array md0 is mirrored across sda2 and sdb2.
We want to ditch the whole sda disk and keep sdb2.
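To avoid misreading which partitions belong to the array, the member list can be pulled out of /proc/mdstat mechanically. The sketch below is only an illustration: `md_members` is a helper name made up for this article, and it assumes the usual `name[slot]` notation seen in the output above.

```shell
# Print the member partitions of one md array, reading /proc/mdstat-style
# text on stdin. On a real system: md_members md0 < /proc/mdstat
md_members() {
    awk -v array="$1" '
        $1 == array && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                # Members look like "sda2[0]"; other words (state, level) do not.
                if ($i ~ /\[[0-9]+\]/) {
                    dev = $i
                    sub(/\[.*$/, "", dev)   # strip the "[slot]" suffix
                    print "/dev/" dev
                }
            }
        }'
}

# Sample line taken from the output above.
printf '%s\n' 'md0 : active raid1 sda2[0] sdb2[1]' | md_members md0
```

With the sample line this lists /dev/sda2 and /dev/sdb2, one per line.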

Once everything is settled:

Do whatever needs to be done with non-array partitions.
Manually mark the partitions to be removed as failed, then remove them from the array and wipe their RAID superblock:

# mdadm /dev/md0 --fail /dev/sda2
mdadm: set /dev/sda2 faulty in /dev/md0
# mdadm /dev/md0 --remove /dev/sda2
mdadm: hot removed /dev/sda2 from /dev/md0
# mdadm --zero-superblock /dev/sda2

In our case, we first dropped the sda1 swap partition (not shown above: disabled with swapoff and an /etc/fstab update), and then removed sda2 from the array.
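Since these three commands are destructive, it can help to print them for review before running anything. This is a minimal dry-run sketch; `plan_member_removal` is a hypothetical helper, and it only echoes the same mdadm invocations shown above.

```shell
# Print (do not run) the commands that drop one member from an array.
# Review the output first; nothing here touches the disks.
plan_member_removal() {
    array=$1
    member=$2
    printf 'mdadm %s --fail %s\n' "$array" "$member"
    printf 'mdadm %s --remove %s\n' "$array" "$member"
    printf 'mdadm --zero-superblock %s\n' "$member"
}

plan_member_removal /dev/md0 /dev/sda2
```

The order matters: a member has to be marked failed before it can be removed, and its superblock should only be wiped once it is out of the array.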

If we stopped at this stage, mdadm would complain that the array is degraded because it is missing a device:

# cat /proc/mdstat
md0 : active raid1 sdb2[1]
      103872512 blocks super 1.2 [2/1] [_U]
# mdadm --detail /dev/md0
State : clean, degraded

Fortunately, we can trick the array into being cool with one device so that it does not complain it is degraded:

# mdadm --grow /dev/md0 --force --raid-devices=1
raid_disks for /dev/md0 set to 1
# cat /proc/mdstat
md0 : active raid1 sdb2[1]
      103872512 blocks super 1.2 [1/1] [U]
# mdadm --detail /dev/md0
State : clean
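The `[2/1]` and `[1/1]` counters in /proc/mdstat read as expected devices / active devices, which makes the before and after states easy to check from a script. A rough sketch, where `md_is_degraded` is a name invented for this example:

```shell
# Succeed (exit 0) if a "[expected/active]" counter on stdin shows fewer
# active devices than the array expects, fail (exit 1) otherwise.
md_is_degraded() {
    awk '
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Drop the brackets, then split "expected/active".
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) degraded = 1
        }
        END { exit degraded ? 0 : 1 }'
}

# Status lines taken from the outputs above.
printf '%s\n' '      103872512 blocks super 1.2 [2/1] [_U]' | md_is_degraded && echo "degraded"
printf '%s\n' '      103872512 blocks super 1.2 [1/1] [U]' | md_is_degraded || echo "healthy"
```

On a live system the same function could be fed with `md_is_degraded < /proc/mdstat`, keeping in mind that it then reports on any array listed there, not just md0.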

If the removed partition / disk was also the boot partition / disk, make sure to update grub, initramfs and /etc/fstab as necessary:

# vi /etc/fstab
# grub-install /dev/sdb
# update-initramfs -u

In our case:

The RAID1 partition md0 is the boot partition but the bootloader was only installed on sda (check presence of GRUB with dd if=/dev/sda bs=512 count=1 2> /dev/null | strings), so we had to reinstall grub on the remaining sdb disk to ensure it can be booted from.
initramfs references the swap partition in its RESUME variable, so we had to remove it from /etc/initramfs-tools/conf.d/resume and update initramfs since the swap partition was removed.
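The GRUB check mentioned above can be wrapped into a small function so that both disks are tested the same way. This sketch replaces `strings` with `tr` and `grep` (a substitution made here, in case binutils is not installed), and `has_grub_in_mbr` is a made-up name.

```shell
# Succeed if the first 512-byte sector of a disk (or disk image) contains
# the "GRUB" marker string. Reading a real disk such as /dev/sdb needs root.
has_grub_in_mbr() {
    dd if="$1" bs=512 count=1 2>/dev/null |
        LC_ALL=C tr -c '[:print:]' '\n' |   # turn non-printable bytes into line breaks
        grep -q GRUB
}

# Usage, with the device names of this article:
#   has_grub_in_mbr /dev/sda && echo "GRUB found on sda"
#   has_grub_in_mbr /dev/sdb || echo "no GRUB on sdb, run grub-install /dev/sdb"
```

This only tells us that something GRUB-like sits in the first sector, not that the disk will actually boot, so reinstalling GRUB on the remaining disk stays the safe option.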

The machine can now be shut down, and the unused disk detached from the hardware. Update the VM disk boot order if necessary and boot: voilà!

# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda       8:0    0  100G  0 disk
├─sda1    8:1    0  953M  0 part  /var/tmp
└─sda2    8:2    0 99.1G  0 part
  └─md0   9:0    0 99.1G  0 raid1 /

Harder but cleaner

The solution above works completely fine. However, if we’d like to go the extra mile and remove mdadm altogether, then we also need to fiddle with the remaining device: the rough idea is to rewrite the mdadm partition to strip out the RAID superblock while keeping the rest intact.

This sort of funky partition business is typically done from rescue mode, but in this article we’re going to do it directly from the live system for maximum thrill.

Inspect the current partitions’ UUIDs:

# blkid
/dev/sda2: UUID="<sda2_uuid>" TYPE="linux_raid_member"
/dev/md0: UUID="<md0_uuid>" TYPE="ext4"

Take note of both UUIDs: the one of the linux_raid_member RAID partition (here sda2) and the one of the underlying partition itself (here md0).

Inspect the remaining linux_raid_member partition and take note of the version and data offset:

# mdadm --examine /dev/sda2
    Version : 1.2
Data Offset : 16384 sectors

From the man page:

The different sub-versions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2). “1” is equivalent to “1.2” (the commonly preferred 1.x format). “default” is equivalent to “1.2”.

Since we have version 1.2, everything in the partition before the data offset is RAID metadata (the superblock sits 4K from the start), and everything from the data offset onward is the underlying partition.
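To get a feel for how much space the RAID metadata takes, the data offset can be converted to bytes. The sketch below assumes mdadm's sectors are 512 bytes each, and `md_data_offset` is a helper name made up for this example.

```shell
# Extract the "Data Offset" value (in sectors) from `mdadm --examine`
# output on stdin.
md_data_offset() {
    sed -n 's/^ *Data Offset : *\([0-9][0-9]*\) sectors.*/\1/p'
}

# Sample taken from the output above; on a real system:
#   mdadm --examine /dev/sda2 | md_data_offset
offset=$(printf '%s\n' '    Version : 1.2' 'Data Offset : 16384 sectors' | md_data_offset)

# Assumption: 512-byte sectors.
echo "data offset: $offset sectors = $((offset * 512)) bytes = $((offset * 512 / 1024 / 1024)) MiB"
```

With the example value, 16384 sectors come out at 8388608 bytes, that is 8 MiB reserved at the start of the partition.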

Open fdisk on the disk:

# fdisk -u /dev/sda

From the fdisk prompt, print the partitions using p:

Command (m for help): p
Device     Boot   Start       End   Sectors   Size Id Type
/dev/sda1  *       2048   1953791   1951744   953M 83 Linux
/dev/sda2       1953792 209715199 207761408  99.1G fd Linux raid autodetect

Identify the RAID partition and take note of its start and end offsets.
Delete the partition using d and its partition number.
Create a new partition using n:
- Choose p for primary and reuse the same partition number.
- When prompted for the first sector, enter the sum of the start offset and data offset (16384 + 1953792 = 1970176 using the example above).
- For the last sector, enter the end offset as-is (here 209715199).
- Do not remove the filesystem signature.
Write the changes using w. This should position the new sda2 partition right in place of the underlying partition (formerly md0).
Reboot.
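Before writing the new partition table with w, the sector arithmetic is worth double-checking, since a wrong first sector makes the filesystem unreadable. A small sanity check using the example values: it assumes fdisk and mdadm both count in 512-byte sectors (which may not hold on disks with 4K logical sectors), and the variable names are made up for this sketch.

```shell
old_start=1953792     # Start of the RAID partition, from fdisk
end=209715199         # End of the RAID partition, from fdisk
data_offset=16384     # Data Offset, from mdadm --examine
md_blocks=103872512   # Array size in 1 KiB blocks, from /proc/mdstat

new_start=$((old_start + data_offset))
new_sectors=$((end - new_start + 1))
new_kib=$((new_sectors / 2))   # two 512-byte sectors per KiB

echo "first sector: $new_start"
echo "last sector:  $end"
echo "new size:     $new_sectors sectors ($new_kib KiB)"

# The new partition must be at least as large as the array it replaces.
if [ "$new_kib" -ge "$md_blocks" ]; then
    echo "OK: the new partition covers the whole array"
else
    echo "MISMATCH: do not write the partition table"
fi
```

With the example values this gives a first sector of 1970176 and 207745024 sectors, that is 103872512 KiB, exactly the block count /proc/mdstat reported for md0.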

Spoiler: if the destroyed partition was the boot partition, then the jump scare moment is right now 😱

This is our case: we hit grub rescue because we destroyed the boot partition. To recover from that, we are going to manually point grub rescue to our brand new partition.

From the grub rescue prompt, display existing boot values with set:

grub rescue> set
prefix=(mduuid/<sda2_uuid>)/boot/grub
root=mduuid/<sda2_uuid>

As we can see, the issue is that grub points to the partition we just destroyed (<sda2_uuid>). Take note of the prefix filepath (here /boot/grub).

Run ls to display the existing partitions, named (hdX,msdosX) (MBR) or (hdX,gptX) (GPT):

grub rescue> ls
(hd0) (hd0,msdos2) (hd0,msdos1)

Run ls on each partition until we find the one containing the prefix filepath:

grub rescue> ls (hd0,msdos1)/boot/grub
error: file '/boot/grub' not found
grub rescue> ls (hd0,msdos2)/boot/grub
./ ../ unicode.pf2 i386-pc/ locale/ fonts/ grubenv grub.cfg

Once found, manually set boot values:

grub rescue> set prefix=(hd0,msdos2)/boot/grub
grub rescue> set root=(hd0,msdos2)

Then load the normal module and start it:

grub rescue> insmod normal
grub rescue> normal

This should boot the system as usual.

At this point, we can inspect the partitions again, and see that sda2 has properly replaced the former md0 partition:

# blkid
/dev/sda2: UUID="<md0_uuid>" TYPE="ext4"

Check that mdadm was properly removed:

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  100G  0 disk
├─sda1   8:1    0  953M  0 part /var/tmp
└─sda2   8:2    0 99.1G  0 part /
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
# mdadm --detail /dev/md0
mdadm: cannot open /dev/md0: No such file or directory
# mdadm --examine /dev/sda2
mdadm: No md superblock detected on /dev/sda2.
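The same verification can be done on blkid output, by looking for partitions still tagged as RAID members. A minimal sketch, with `leftover_raid_members` being a name invented here:

```shell
# Print the devices still tagged linux_raid_member in blkid-style text on
# stdin. Exit 0 if none are left, 1 otherwise.
leftover_raid_members() {
    awk -F: '
        /TYPE="linux_raid_member"/ { print $1; found = 1 }
        END { exit found ? 1 : 0 }'
}

# Sample taken from the blkid output after the reboot; on a real system:
#   blkid | leftover_raid_members
printf '%s\n' '/dev/sda2: UUID="<md0_uuid>" TYPE="ext4"' | leftover_raid_members &&
    echo "no RAID members left"
```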

Before rebooting, make sure to reinstall grub so as to point it to the right UUID and avoid hitting grub rescue again:

# grub-install /dev/sda

Reboot again. This time, it should boot without jump scare.
Take a shower to recover from our emotions 😌
