P17 - How to Set Up ZFS RAID on Proxmox and Replace a Failed Disk
Reliable storage is the foundation of every virtualization platform.
In this tutorial, you will learn how to set up ZFS RAID on Proxmox VE 9 and safely replace a failed disk without losing data. This guide covers both the initial RAID configuration and the complete disk replacement procedure, including resilvering and bootloader recovery.
By following this tutorial, IT administrators and homelab engineers can build a resilient Proxmox storage infrastructure that protects virtual machines and containers from disk failure.
In this guide, you will learn how to:
🧱 Configure ZFS RAID during Proxmox installation
💾 Monitor ZFS pool health using zpool status and zfs list
❌ Identify and offline a failed disk
🔄 Replace the failed disk safely
⚙ Rebuild and restore bootloader using Proxmox Boot Tool
By the end, you will confidently manage ZFS RAID and protect your virtual workloads.
🧪 LAB Environment
The Proxmox host server has 2 or more disks.
Disks 1 and 2:
Run the RAID mirror containing the Proxmox OS
Store VMs, backup files, and ISO images
Demo partition type: GPT
🧱 Step 1 – Set Serial Numbers for the VM Disks
To simulate RAID accurately, we assign different serial numbers to each virtual disk.
Edit VM config:
nano /etc/pve/qemu-server/105.conf
Add a serial to each disk entry (serial= is an option appended to the existing scsiN:/virtioN: disk line, not a line of its own):
serial=AbcxyzDSK001
serial=AbcxyzDSK002
This ensures each disk has a unique identifier, similar to real physical drives.
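Appending the option can be scripted. A minimal sketch, assuming a hypothetical disk line (the scsi1 entry and the local-zfs storage name are illustrative, not taken from the demo config):

```shell
# Hypothetical disk line as it would appear in /etc/pve/qemu-server/105.conf
line='scsi1: local-zfs:vm-105-disk-1,size=32G'

# Append a serial= option to the disk definition. In Proxmox VM configs the
# serial is a parameter of the scsiN entry, not a standalone line.
with_serial=$(printf '%s' "$line" | sed 's/$/,serial=AbcxyzDSK001/')
echo "$with_serial"
# → scsi1: local-zfs:vm-105-disk-1,size=32G,serial=AbcxyzDSK001
```

After editing the config, detach and reattach the disk (or stop/start the VM) so the new serial is visible to the guest.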
💿 Step 2 – Install OS with ZFS RAID 1
Important:
GPT partition scheme
2 Disks
ZFS RAID 1
During Proxmox installation, choose:
ZFS (RAID1)
This creates:
BIOS Boot partition
EFI partition
ZFS root partition
🖥 Step 3 – Setup VM on Host and Run
Deploy and test VMs to confirm the ZFS pool is operating normally.
❌ II – Simulate Disk Failure
Step 1 – Identify the Faulty Hard Drive
Check pool status:
zpool status
Example faulty disk info:
ID: 14912614961185646598
Name: scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK001-part3
Take the failed disk offline:
zpool offline rpool scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK001-part3
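Picking the faulty device out of the status output can be automated. A sketch against an abridged, illustrative zpool status sample (device names follow this demo's serials):

```shell
# Abridged, illustrative sample of the device table from `zpool status`.
status='  NAME                                              STATE
  rpool                                             DEGRADED
    mirror-0                                        DEGRADED
      scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK001-part3   FAULTED
      scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK002-part3   ONLINE'

# Print the device name for any vdev whose state indicates failure.
failed=$(printf '%s\n' "$status" \
  | awk '$2=="FAULTED" || $2=="OFFLINE" || $2=="UNAVAIL" {print $1}')
echo "$failed"
# → scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK001-part3
```

On a live host you would pipe the real output instead: `zpool status | awk '...'`.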
🔧 Step 2 – Shutdown Host and Replace Faulty Disk
If the server does NOT support hot-swap:
Shut down Proxmox
Remove the faulty disk
Insert the new disk in the same slot
If the server supports hot-swap:
Replace the disk without shutting down
In this demo (VM simulation), add new disk and assign serial:
nano /etc/pve/qemu-server/105.conf
Add:
serial=AbcxyzDSK003
⚠ Important Note:
If Proxmox was installed using MBR (SeaBIOS):
The system may fail to boot
Boot from the remaining disk
Back up all VMs/data to external storage (NFS/SMB/OneDrive/physical disk)
Shut down the host
Attach the new disk
Reinstall Proxmox with 2 disks
Mount the storage and restore the VMs
🔍 Step 3 – Identify the New Drive
After attaching the new disk, list the device IDs:
ls -l /dev/disk/by-id/
Example:
New name:
scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK003
New drive: sda
Current drive: sdb
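Finding the replacement among the by-id names can be scripted. A sketch against a simulated listing (the names follow this demo's serials):

```shell
# Simulated contents of /dev/disk/by-id/ after attaching the new disk.
ids='scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK002
scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK002-part1
scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK003'

# Filter for the serial we assigned to the replacement disk; the whole-disk
# entry has no -partN suffix, so we anchor the match at end of line.
new_disk=$(printf '%s\n' "$ids" | grep 'DSK003$')
echo "$new_disk"
# → scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK003
```

On the host itself the equivalent would be `ls /dev/disk/by-id/ | grep 'DSK003$'`.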
📋 Step 4 – Copy Partition from Old Disk to New Disk
On GPT:
Partition 1 → BIOS Boot (1 MiB)
Partition 2 → EFI (FAT32, 512 MiB)
Partition 3 → ZFS root
Check partition layout:
lsblk /dev/sdb
lsblk /dev/sda
Copy the partition table (the disk named in --replicate= is the target, i.e. the new disk; the positional argument is the source), then randomize the new disk's GUIDs so they do not clash with the source:
sgdisk --replicate=/dev/sda /dev/sdb
sgdisk --randomize-guids /dev/sda
This creates:
sda1
sda2
sda3
⚠ Do NOT copy ZFS data from sdb3 to sda3.
ZFS will rebuild automatically.
Copy small partitions if needed:
dd if=/dev/sdb1 of=/dev/sda1 bs=1M status=progress
dd if=/dev/sdb2 of=/dev/sda2 bs=1M status=progress
Do NOT copy sdb3.
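The cloning steps above can be collected into a dry-run helper that only prints the commands it would run, which is safer to review before touching real disks. This is a sketch: /dev/sda and /dev/sdb follow this demo, and the sgdisk --randomize-guids step is included so the cloned table gets unique GUIDs:

```shell
# Dry-run sketch of the partition-clone procedure. Each command is echoed
# instead of executed; OLD/NEW are this demo's device names and MUST be
# adjusted (and double-checked) on a real host.
OLD=/dev/sdb   # healthy source disk
NEW=/dev/sda   # replacement disk

run() { echo "would run: $*"; }   # swap the body for: "$@"  to execute

run sgdisk --replicate="$NEW" "$OLD"      # copy GPT layout source -> target
run sgdisk --randomize-guids "$NEW"       # unique GUIDs for the clone
run dd if="${OLD}1" of="${NEW}1" bs=1M    # BIOS boot partition
run dd if="${OLD}2" of="${NEW}2" bs=1M    # EFI partition
# Partition 3 is deliberately NOT copied: ZFS resilvers it itself.
```

Reviewing the echoed commands before flipping `run` to execute them is a cheap guard against swapping source and target.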
🔄 Step 5 – Replace Faulty Disk in Mirror
Assume faulty disk:
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK001-part3
Replace disk:
zpool replace rpool <old-disk> <new-disk>
Example:
zpool replace rpool scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK001-part3 scsi-0QEMU_QEMU_HARDDISK_AbcxyzDSK003-part3
Check resilver progress:
zpool status -v
Wait until:
ONLINE status
scan resilvered 100%
rpool mirror healthy
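Resilver progress can also be extracted programmatically from the "scan:" section. A sketch against an illustrative sample of zpool status -v output (the figures are made up for the example):

```shell
# Illustrative sample of the scan section during a resilver.
scan='  scan: resilver in progress since Tue Jan 14 10:02:11 2025
        1.21G scanned at 124M/s, 812M issued at 83.1M/s, 7.30G total
        812M resilvered, 10.86% done, 0 days 00:01:20 to go'

# Pull out the completion percentage from the "NN.NN% done" token.
pct=$(printf '%s\n' "$scan" | grep -o '[0-9.]*% done' | cut -d% -f1)
echo "$pct"
# → 10.86
```

In a monitoring loop you would feed it live output, e.g. `zpool status -v | grep -o '[0-9.]*% done'`.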
🗂 Step 6 – Mount Root Dataset for Chroot
(These chroot steps apply when booted from a rescue/live environment; on a normally booted host, proxmox-boot-tool can be run directly.)
Mount datasets:
zfs mount -a
Bind root:
mkdir -p /mnt
mount --bind / /mnt
🧩 Step 7 – Mount EFI and Bind Filesystems
Mount EFI:
mkdir -p /mnt/boot/efi
mount /dev/sda2 /mnt/boot/efi
Bind system directories:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
mount --bind /run /mnt/run
⚙ Step 8 – Chroot and Install Bootloader
Enter chroot:
chroot /mnt /bin/bash
Format the new EFI partition. proxmox-boot-tool expects the ESP partition (here /dev/sda2), not the whole disk, and the partition must be unmounted first:
proxmox-boot-tool format /dev/sda2
Initialize it so kernels are synced to it on future updates:
proxmox-boot-tool init /dev/sda2
Refresh the bootloader:
proxmox-boot-tool refresh
Verify EFI files:
ls /boot/efi/EFI/proxmox
If .efi files are present → bootloader is ready.
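That presence check can be turned into a small helper for a post-maintenance sanity script. A minimal sketch (the function name efi_ready is hypothetical):

```shell
# Return 0 when the given directory contains at least one .efi file,
# non-zero otherwise. Intended target: /boot/efi/EFI/proxmox.
efi_ready() {
  dir="$1"
  ls "$dir"/*.efi >/dev/null 2>&1
}
```

Usage on the host would be along the lines of `efi_ready /boot/efi/EFI/proxmox && echo "bootloader ready"`.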
Reboot:
reboot
📊 Monitoring ZFS Health
Regularly check:
zpool status
zfs list
Monitor:
Disk state
Resilver progress
Pool health
Healthy ZFS pool should always display:
ONLINE
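That rule, any vdev state other than ONLINE means trouble, makes a minimal cron-able health check. This is a sketch: feed it the output of zpool status and alert on a non-zero exit:

```shell
# Return 0 only when the supplied status text contains no failed or
# degraded vdev states. A sketch of a cron health check, not a full monitor.
pool_healthy() {
  ! printf '%s\n' "$1" | grep -Eq 'DEGRADED|FAULTED|OFFLINE|UNAVAIL'
}
```

A cron entry could then run something like `pool_healthy "$(zpool status)" || logger "ZFS pool degraded"` (the logger action is just one option; mail or a monitoring agent works equally well).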
🔐 Production Best Practices
✔ Use enterprise-grade disks
✔ Monitor SMART regularly
✔ Keep spare disk available
✔ Test disk replacement procedure
✔ Use RAID 1 or RAID 10 for critical workloads
✔ Monitor ZFS resilver events
ZFS provides self-healing and data integrity verification — but proactive monitoring is still essential.
🎯 Conclusion
In this Proxmox VE P17 guide, you have successfully:
Installed Proxmox with ZFS RAID 1
Simulated a disk failure
Identified and offlined the faulty disk
Replaced the disk safely
Rebuilt the ZFS mirror
Restored the bootloader using Proxmox Boot Tool
Your Proxmox storage is now resilient, redundant, and production-ready.
ZFS RAID + Proper Disk Replacement Procedure = Enterprise-Level Data Protection.