TSF – Comprehensive IT Solutions for SMBs | HCM


🚀 Proxmox VE P18 – Create RAID with mdadm and Replace a Failed Disk (Full Step-by-Step Guide)

Proxmox VE provides powerful tools for building reliable and scalable storage for virtual machines. While ZFS is a popular option, traditional Linux Software RAID using mdadm remains a flexible and lightweight alternative for many environments.

In this step-by-step tutorial, you will learn how to:

  • 🧱 Create a RAID array using mdadm on Proxmox

  • 💾 Configure RAID 1 for disk redundancy

  • 🔍 Monitor RAID health and status

  • ❌ Detect and remove a failed disk

  • 🔄 Replace the disk without affecting virtual workloads

  • ✅ Verify rebuild and confirm RAID stability

Whether you are building a homelab or managing a production server, understanding mdadm RAID is essential for maintaining system stability and protecting critical data.

By the end of this guide, your Proxmox storage will be more resilient and disaster-ready.


🧪 1.1 Lab Environment

The Proxmox host server has three or more disks.

  • Disk 1: Proxmox OS

  • Disk 2, Disk 3, …: Storage for VM disks, Backup files, ISO files

This setup separates the system disk from the RAID storage, which gives you more flexibility.


⚙ 1.2 Preparation

Install Proxmox on a single disk first.

After the OS installation, add two more disks to build RAID 1.
(You can add more disks for RAID 5 or RAID 10.)

Edit VM configuration:

 
nano /etc/pve/qemu-server/101.conf

Append a serial to the matching disk line, for example:

 
serial=VM105DISK01

🧩 1.3 Set Serial for DISK

Real hard drives have unique serial numbers. To simulate accurately in a VM lab, assign different serials.

Edit configuration:

 
nano /etc/pve/qemu-server/101.conf

Append a serial to each new disk's line:

 
serial=VM105DISK02
serial=VM105DISK03

This ensures proper disk identification during RAID management.
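
For reference, the serial attaches to the disk entry itself. A hypothetical excerpt from /etc/pve/qemu-server/101.conf might look like this (the storage name, disk names, and sizes are placeholders for this lab):

```
scsi1: local-lvm:vm-101-disk-1,serial=VM105DISK02,size=20G
scsi2: local-lvm:vm-101-disk-2,serial=VM105DISK03,size=20G
```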


🚀 1.4 Start RAID Setup

Install mdadm

 
apt update
apt install mdadm -y

Format Drives (If Needed – CAUTION)

Check disks:

 
lsblk

Wipe filesystem signatures:

 
wipefs -a /dev/sdb
wipefs -a /dev/sdc

⚠ Make sure these disks do not contain important data.


Create RAID 1 Array (md0)

 
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

Verify RAID Status

 
cat /proc/mdstat

You should see the array initializing or syncing.
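
In the status field, one letter per member is shown: [UU] means both disks are up, while an underscore marks a missing one. A minimal sketch of checking that pattern, run here against a sample /proc/mdstat excerpt (device names and block counts are illustrative, not from a real host):

```shell
# Sample /proc/mdstat excerpt for a healthy two-disk RAID 1
# (illustrative values, not taken from a real host):
mdstat='md0 : active raid1 sdc[1] sdb[0]
      10476544 blocks super 1.2 [2/2] [UU]'

# [UU] = both members up; [U_] would mean one member is missing.
if echo "$mdstat" | grep -q '\[UU\]'; then
  status="healthy"
else
  status="degraded"
fi
echo "md0: $status"   # prints "md0: healthy"
```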


Create File System (ext4)

 
mkfs.ext4 /dev/md0

Create Mount Directory

 
mkdir /mnt/raid_data

Mount RAID

 
mount /dev/md0 /mnt/raid_data

Enable Auto-Mount After Reboot

 
echo '/dev/md0 /mnt/raid_data ext4 defaults,nofail 0 2' >> /etc/fstab
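
One optional hardening step: md device names can change across reboots (an array may come back as /dev/md127), so mounting by filesystem UUID is more robust than /dev/md0. On a real host you would query the UUID with blkid; the value below is a placeholder used only to show the resulting fstab entry:

```shell
# On the Proxmox host you would run:  blkid -s UUID -o value /dev/md0
uuid="2f7a9c1e-0000-0000-0000-000000000000"   # hypothetical UUID

# Build the UUID-based fstab line (same mount options as above):
entry="UUID=$uuid /mnt/raid_data ext4 defaults,nofail 0 2"
echo "$entry"
```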

Save RAID Configuration

 
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

Add Storage RAID in Proxmox

After mounting:

  • Add storage directory in Proxmox

  • Move VM disks to RAID storage

  • Start VM to confirm everything works

Your RAID 1 storage is now active.
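
The creation steps above can be sketched as one script. This is a dry run: the run helper only prints each command, so nothing is executed; remove the echo (and run as root, only on disks holding no important data) to apply it for real. The disk names are assumptions for this lab:

```shell
# Dry-run wrapper: prints each command instead of executing it.
run() { echo "+ $*"; }

DISK1=/dev/sdb   # adjust to your environment
DISK2=/dev/sdc

run wipefs -a "$DISK1" "$DISK2"
run mdadm --create /dev/md0 --level=1 --raid-devices=2 "$DISK1" "$DISK2"
run mkfs.ext4 /dev/md0
run mkdir -p /mnt/raid_data
run mount /dev/md0 /mnt/raid_data
run "echo '/dev/md0 /mnt/raid_data ext4 defaults,nofail 0 2' >> /etc/fstab"
run "mdadm --detail --scan >> /etc/mdadm/mdadm.conf"
```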


❌ 1.5 Simulate a Disk Failure

To understand RAID recovery, we simulate disk failure on /dev/sdb.


Step 1 – Identify the Faulty Hard Drive

Check RAID details:

 
mdadm --detail /dev/md0
cat /proc/mdstat

Check disk ID and serial:

 
ls -l /dev/disk/by-id/

Step 2 – Remove Faulty Drive from RAID

 
mdadm --manage /dev/md0 --fail /dev/sdb
mdadm --manage /dev/md0 --remove /dev/sdb

mdadm will refuse to remove a member that is still marked active, so fail the disk first, then remove it.

The array will now run in degraded mode.
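
In degraded mode the status field drops to [U_]. A small sketch of detecting that state, again run against an illustrative /proc/mdstat excerpt rather than a live array:

```shell
# Sample excerpt for a degraded RAID 1: one of two members left ([2/1] [U_]).
mdstat='md0 : active raid1 sdc[1]
      10476544 blocks super 1.2 [2/1] [U_]'

# Any underscore inside the [..] status field means a missing member.
if echo "$mdstat" | grep -q '\[[U_]*_[U_]*\]'; then
  state="degraded"
else
  state="clean"
fi
echo "md0 is $state"   # prints "md0 is degraded"
```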


Step 3 – Attach a New Drive

Edit VM configuration:

 
nano /etc/pve/qemu-server/101.conf

Add:

 
serial=VM105DISK05

Assume the new disk is recognized as:

 
/dev/sdb

Verify:

 
lsblk

Step 4 – Add New Drive to RAID

 
mdadm --add /dev/md0 /dev/sdb

Check rebuild progress:

 
watch cat /proc/mdstat

You will see RAID rebuilding automatically.
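
While syncing, /proc/mdstat includes a progress line. A quick sketch of extracting the completion percentage from such a line (the numbers are made up for illustration):

```shell
# Sample rebuild-progress line as it appears in /proc/mdstat (illustrative):
line='[==>..................]  recovery = 12.6% (1320704/10476544) finish=0.7min speed=188672K/sec'

# Extract the percentage token:
pct=$(echo "$line" | grep -o '[0-9.]*%')
echo "rebuild at $pct"   # prints "rebuild at 12.6%"
```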


Step 5 – Verify RAID After Rebuild

After rebuild completes:

 
mdadm --detail /dev/md0

Expected result:

  • 2 active drives

  • Status shows [UU]

  • RAID state returns to “clean”

This confirms redundancy has been restored.


🔄 Update RAID Configuration

Save updated configuration:

 
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u

Because an ARRAY entry was already appended earlier, review /etc/mdadm/mdadm.conf and delete any duplicate lines.

This ensures RAID mounts correctly after reboot.


📊 Monitoring RAID Health

Regularly check:

 
cat /proc/mdstat
mdadm --detail /dev/md0

Monitor:

  • Rebuild progress

  • Disk status

  • Degraded mode alerts

Early detection prevents data loss.
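
These checks are easy to automate. Below is a minimal cron-style health check, sketched against a sample file so it runs anywhere; on a real host you would point MDSTAT at /proc/mdstat (mdadm also ships its own watcher, mdadm --monitor):

```shell
# Write a sample status file; on a real host, set MDSTAT=/proc/mdstat instead.
cat > /tmp/mdstat.sample <<'EOF'
md0 : active raid1 sdc[1] sdb[0]
      10476544 blocks super 1.2 [2/2] [UU]
EOF
MDSTAT="${MDSTAT:-/tmp/mdstat.sample}"

# An underscore in the [..] status field marks a degraded array.
if grep -q '\[[U_]*_[U_]*\]' "$MDSTAT"; then
  echo "WARNING: degraded RAID array in $MDSTAT"
else
  echo "all arrays healthy in $MDSTAT"
fi
```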


🔐 Production Best Practices

✔ Always separate OS disk from RAID storage
✔ Use identical disks for RAID arrays
✔ Keep a spare disk ready
✔ Monitor SMART health regularly
✔ Test rebuild process in lab
✔ Backup critical data even with RAID

Remember: RAID is not a backup — it protects against hardware failure, not accidental deletion.


🎯 Conclusion

In this Proxmox VE P18 guide, you have successfully:

  • Installed and configured mdadm

  • Created RAID 1 array

  • Mounted and configured auto-mount

  • Simulated disk failure

  • Removed faulty disk

  • Added replacement disk

  • Rebuilt and verified RAID integrity

Your Proxmox storage is now redundant, resilient, and production-ready using Linux Software RAID.

Mastering both ZFS RAID and mdadm RAID gives you flexibility in designing Proxmox storage architectures for different scenarios.

Storage reliability is the backbone of virtualization.
Proper RAID configuration ensures uptime and protects your virtual workloads.

See also related articles

P15 – Backup and Restore VM in Proxmox VE


P14 – How to Remove Cluster Group Safely on Proxmox
