// lab · storage

LVM + RAID1 mirroring lab · hot failover

End-to-end lab on Ubuntu Server 24.04 in VMware: GPT + PV/VG/LV prep on two disks, a linear LV converted to raid1 with lvconvert, hot-remove of the primary disk straight from the hypervisor, and a SHA-256 integrity check from the surviving leg. No sysfs fakery: the disk actually disappears.

Status: completed
Year: 2025
Role: lab · hands-on
Environment: VMware Workstation · Ubuntu 24.04
Ubuntu 24.04 · LVM · RAID1 · VMware · ext4

context

I wanted to understand LVM RAID1 at the command level, not the slide level. What really happens when you convert a linear LV into a mirror? What do you read in lvs during a resync? How does the kernel behave when a disk in the mirror is physically removed — not offlined through sysfs, actually detached from the VM?

The goal was to cover the full life-cycle of an LVM-managed raid1 volume: creation → conversion → real failure → integrity check. Every step with actual commands, every state verified in lvs.

approach · phase 1 — /dev/sdb prep

Ubuntu Server 24.04 on VMware Workstation, with two virtual disks hot-added. The first (/dev/sdb) is prepared with a GPT label and an LVM partition:

  • wipefs -a /dev/sdb to clear leftover signatures (FS, RAID, LVM).
  • parted -s /dev/sdb mklabel gpt, then mkpart primary 1MiB 100%, then set 1 lvm on.
  • partprobe /dev/sdb && udevadm settle to force the kernel to pick up /dev/sdb1.
  • pvcreate /dev/sdb1 → vgcreate vg_mirror /dev/sdb1 → lvcreate -L 9G -n lv_mirror vg_mirror.
  • Verify with lvs -a -o lv_name,vg_name,lv_attr,lv_size,segtype,devices vg_mirror: segtype=linear on /dev/sdb1.
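The phase 1 steps can be collected into one script. This is a minimal sketch, not the exact script used in the lab: the run wrapper, the DRY_RUN switch and the function names are hypothetical helpers added here so that nothing destructive happens unless DRY_RUN=0 is set explicitly. It also assumes sdX-style partition naming (/dev/sdb1).

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Wipe a whole disk, create one GPT partition flagged for LVM, make it a PV.
prep_disk() {
    disk="$1"                                   # e.g. /dev/sdb
    run wipefs -a "$disk"                       # clear old FS/RAID/LVM signatures
    run parted -s "$disk" mklabel gpt
    run parted -s "$disk" mkpart primary 1MiB 100%
    run parted -s "$disk" set 1 lvm on
    run partprobe "$disk"                       # make the kernel re-read the table
    run udevadm settle                          # wait for the partition node
    run pvcreate "${disk}1"                     # assumes sdX naming (sdb -> sdb1)
}

# Create the VG and the linear LV on the given PV, then show the layout.
create_linear_lv() {
    run vgcreate vg_mirror "$1"
    run lvcreate -L 9G -n lv_mirror vg_mirror
    run lvs -a -o lv_name,vg_name,lv_attr,lv_size,segtype,devices vg_mirror
}
```

Calling prep_disk /dev/sdb followed by create_linear_lv /dev/sdb1 prints the command sequence for review; the same calls with DRY_RUN=0 as root would execute it.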

approach · phase 2 — filesystem and test data

  • mkfs.ext4 -L MIRROR_LV /dev/vg_mirror/lv_mirror, mount at /mnt/mirror.
  • Persistent mount via UUID in /etc/fstab (not via /dev/..., for robustness across device renaming): blkid -s UUID -o value /dev/vg_mirror/lv_mirror → fstab line with options defaults,noatime,x-systemd.device-timeout=5s 0 2.
  • Real 6 GiB file generated: dd if=/dev/urandom of=/mnt/mirror/bigfile.bin bs=1M count=6144 status=progress. urandom forces actual writes (no sparse files).
  • Reference hash saved on /root (on the OS disk, not on the LV): sha256sum /mnt/mirror/bigfile.bin | tee /root/bigfile.sha256.
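As an illustration of the fstab step, the entry can be assembled from the UUID that blkid reports. The fstab_line helper is a hypothetical name introduced for this sketch; the mount options are the ones listed above, and the filesystem type is ext4 because of the mkfs.ext4 step.

```shell
# Build an fstab entry for an ext4 filesystem identified by UUID.
# $1 = filesystem UUID, $2 = mount point
fstab_line() {
    printf 'UUID=%s %s ext4 defaults,noatime,x-systemd.device-timeout=5s 0 2\n' "$1" "$2"
}

# On the lab VM (as root) this would be used roughly like this:
#   uuid="$(blkid -s UUID -o value /dev/vg_mirror/lv_mirror)"
#   fstab_line "$uuid" /mnt/mirror >> /etc/fstab
#   umount /mnt/mirror && mount -a    # re-mount through fstab to validate the entry
```

The device timeout keeps boot from stalling for the default 90 seconds if the volume is missing.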

approach · phase 3 — second disk and VG extension

  • Hot-add of /dev/sdc from the hypervisor. If the kernel doesn't pick it up, SCSI rescan: for h in /sys/class/scsi_host/host*; do echo "- - -" | tee "$h/scan"; done.
  • Same cycle as /dev/sdb: wipefs, GPT, LVM partition, pvcreate /dev/sdc1.
  • vgextend vg_mirror /dev/sdc1 → vgs confirms pv_count=2.
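The SCSI rescan from the first bullet, written as a small function. The three dashes are wildcards for channel, target and LUN, so every host adapter is asked to scan everything. The optional directory argument is a hypothetical addition that lets the function be tried against a scratch directory instead of the real sysfs tree.

```shell
# Ask every SCSI host adapter to rescan its bus (run as root on the guest).
# $1 = sysfs directory holding the hostN entries (defaults to the real one)
rescan_scsi_hosts() {
    sysfs_root="${1:-/sys/class/scsi_host}"
    for h in "$sysfs_root"/host*; do
        [ -e "$h/scan" ] || continue        # skip if the glob matched nothing
        echo "- - -" > "$h/scan"            # wildcard channel, target, LUN
        echo "rescanned $h"
    done
}

# After the rescan, lsblk should list the new disk; once it is in the VG:
#   vgs -o vg_name,pv_count vg_mirror
```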

approach · phase 4 — conversion to RAID1

  • Command: lvconvert -y --type raid1 -m1 vg_mirror/lv_mirror /dev/sdb1 /dev/sdc1. Internally LVM creates hidden sub-LVs for each copy: rimage (data) and rmeta (metadata), visible with lvs -a.
  • Resync monitoring: lvs -a -o lv_name,segtype,devices,copy_percent,raid_sync_action vg_mirror. Wait until copy_percent=100.00 and raid_sync_action=idle.
  • Integrity check before the failure: sha256sum -c /root/bigfile.sha256 → OK.
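Instead of re-running lvs by hand, the resync wait can be scripted. A minimal sketch, assuming lvs prints copy_percent with two decimals (100.00) as observed in this lab; sync_complete and wait_for_sync are hypothetical helper names.

```shell
# True only when the mirror reports a finished, idle sync.
# $1 = copy_percent, $2 = raid_sync_action
sync_complete() {
    [ "$1" = "100.00" ] && [ "$2" = "idle" ]
}

# Poll lvs until the LV is fully synced.
# $1 = VG/LV (default vg_mirror/lv_mirror), $2 = poll interval in seconds
wait_for_sync() {
    lv="${1:-vg_mirror/lv_mirror}"
    interval="${2:-5}"
    while :; do
        status="$(lvs --noheadings -o copy_percent,raid_sync_action "$lv")" || return 1
        # Unquoted on purpose: word splitting strips the padding lvs adds.
        set -- $status
        echo "copy_percent=$1 raid_sync_action=$2"
        if sync_complete "$1" "$2"; then
            return 0
        fi
        sleep "$interval"
    done
}
```

Running wait_for_sync as root right after lvconvert blocks until the mirror is safe to test; the hash check can then follow in the same script.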

approach · phase 5 — real failure + verification

  • Real hot-remove of /dev/sdb from the VM via VMware. No echo offline > /sys/block/sdb/device/state: the disk actually disappears, and the kernel gets the detach event.
  • Guest-side: dmesg -T | tail -n 120 shows expected I/O errors; lvs flags the LV as degraded but still active, serving data from /dev/sdc1 alone.
  • /mnt/mirror stays mounted and navigable. sha256sum -c /root/bigfile.sha256 reads the whole file from the surviving leg and confirms bigfile.bin: OK.
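The degraded state can also be read programmatically from the lv_health_status report field. A sketch under the assumption that this field is empty for a healthy LV and reads partial when a PV is missing, which is how lvs documents it; lv_state is a hypothetical helper name.

```shell
# Map the lv_health_status value to a short verdict.
lv_state() {
    case "$1" in
        "")      echo "healthy" ;;
        partial) echo "degraded" ;;          # at least one PV is missing
        *)       echo "attention: $1" ;;     # e.g. "refresh needed"
    esac
}

# On the guest, after the hot-remove (as root):
#   health="$(lvs --noheadings -o lv_health_status vg_mirror/lv_mirror | xargs)"
#   lv_state "$health"
#   dmesg -T | tail -n 120
#   sha256sum -c /root/bigfile.sha256
```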

outcome

  • Full cycle verified: create → mirror → resync → hot-remove → degraded read → integrity check.
  • Zero SHA-256 discrepancies between pre-failure and post-failure: the mirror does its job.
  • copy_percent observably moved 0 → 100, raid_sync_action from resync to idle.
  • UUID-based persistent mount validated with umount /mnt/mirror && mount -a; no boot-time errors under device-name reshuffles.

constraints

VMware Workstation handles hot-remove of virtual disks cleanly: the disk actually disappears and the kernel receives the event. The timing of the guest-side I/O errors, however, is less realistic than in a physical failure.

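If lvconvert --type raid1 fails with messages about space for RAID metadata, the fastest fix is to slightly shrink the LV (9G → 8G) to leave free extents for the mirror structure. With ext4 already on the LV, the filesystem has to be shrunk along with it (lvreduce -r does both), otherwise the data is at risk. LVM doesn't warn you in advance.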
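The space requirement can be checked before converting. The arithmetic below assumes that each mirror leg needs one extra extent for its rmeta sub-LV, which matches my reading of lvmraid(7) but was not measured in this lab; can_mirror is a hypothetical helper. With the default 4 MiB extent size, a 9G LV is 2304 extents.

```shell
# Can a linear LV be converted to a two-leg raid1?
# $1 = LV size in extents
# $2 = free extents on the PV that already holds the LV
# $3 = free extents on the PV that will hold the new leg
can_mirror() {
    lv_extents="$1"
    free_src="$2"
    free_dst="$3"
    # Existing leg: 1 extent for rmeta. New leg: full image + 1 for rmeta.
    [ "$free_src" -ge 1 ] && [ "$free_dst" -ge $((lv_extents + 1)) ]
}

# Inputs can be read on the guest (as root):
#   pvs -o pv_name,pv_pe_count,pv_pe_alloc_count   # free = count - allocated
#   lvdisplay vg_mirror/lv_mirror                  # "Current LE" = LV extents
```

If the source PV has zero free extents because the LV fills it, the check fails even when the second disk is large, which is the situation the shrink workaround addresses.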

Do not reboot the VM during the initial resync: the mirror is in a transient state, and a reboot interrupts the sync and may force LVM to redo work it had already completed. Wait for idle before any maintenance operation.

lessons

  • -m1 in lvconvert means one additional copy (2 total), not one copy total. A classic early-testing mistake.
  • LVM RAID1 does not look like md-RAID from userspace: the interface is entirely in lvs (raid_sync_action, copy_percent, raid_mismatch_count), not in /proc/mdstat. Internally it uses the dm-raid target, which reuses the kernel's MD RAID code, but no md device is exposed.
  • UUID-based mounts in fstab are mandatory when working with mirrors / failover: /dev/sdX names change depending on boot-time detection order.
  • A SHA-256 check before and after the event is the simplest way to make the test unarguable. If it reads OK from a single leg, the mirror worked — no assumptions needed.
  • VMware hot-added disks aren't always picked up immediately by the kernel: the SCSI rescan is the command to have in muscle memory.
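The before/after hash check from the lessons fits in two small functions. A minimal sketch with hypothetical helper names; it relies only on sha256sum and its -c mode, and the reference file should live off the volume under test, as /root/bigfile.sha256 does in this lab.

```shell
# Store the reference hash of a file.
# $1 = file to hash, $2 = reference file (keep it off the tested volume)
record_hash() {
    sha256sum "$1" > "$2"
}

# Exit status 0 only if every file listed in the reference still matches.
# $1 = reference file written by record_hash
verify_hash() {
    sha256sum -c "$1" > /dev/null 2>&1
}

# In the lab:
#   record_hash /mnt/mirror/bigfile.bin /root/bigfile.sha256   # before the failure
#   verify_hash /root/bigfile.sha256 && echo "mirror OK"       # after the hot-remove
```

Because sha256sum stores the path as given, the file must be reachable at the same path when verifying.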