Proxmox P23 – High Availability Test with ZFS Replication
Full HA + ZFS Replication Demo on Proxmox VE 9 (Step-by-Step Guide)
High Availability (HA) is a critical requirement in any production virtualization environment. In this tutorial, we perform a full High Availability test on Proxmox VE 9 using ZFS replication, demonstrating how to keep virtual machines online even when a node fails.
This guide walks you through the complete HA + Replication configuration process, including cluster setup, storage preparation, replication scheduling, HA resource configuration, and real-world failover testing.
If you are running business-critical workloads, home lab clusters, or enterprise Proxmox environments, mastering HA with ZFS replication is essential to avoid downtime and ensure data consistency.
1️⃣ Overview
🚀 What Is Replication in Proxmox?
Replication in Proxmox means sending incremental copies of a VM's disks from one node to another on a schedule (every 5 minutes, 15 minutes, 1 hour, etc.).
Replication is supported only for:
• VMs and containers whose disks are on local ZFS storage (it uses ZFS send/receive under the hood)
It does NOT support:
• Ceph RBD (Ceph is shared storage, so HA works without replication jobs)
• LVM-thin
• ext4
• directory storage
📦 How Replication Works
Example scenario:
VM 100 is running on node pve01.
You create replication to pve02.
Mechanism:
• First run → Full copy of VM data to node pve02
• Next runs → Only changed blocks are transferred (incremental replication)
• On the destination node, the VM does not run; only the replicated ZFS datasets and their snapshots are stored
• During failover → the VM configuration is moved to the destination node and the VM starts from the most recent replicated snapshot (changes made since the last sync are lost)
This keeps the two nodes closely synchronized while minimizing bandwidth usage; your maximum data loss (RPO) equals the replication interval.
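Under the hood this is plain ZFS send/receive. A minimal manual sketch of the same mechanism (the pool and dataset names here are illustrative, not taken from this setup):

```shell
# Assumed layout: VM 100's disk lives on a local pool rpool/data on pve01,
# replicated to the same dataset path on pve02. Names are hypothetical.

# First run: full send of an initial snapshot
zfs snapshot rpool/data/vm-100-disk-0@rep_1
zfs send rpool/data/vm-100-disk-0@rep_1 | \
  ssh pve02 zfs receive rpool/data/vm-100-disk-0

# Later runs: incremental send transfers only the blocks
# changed between the two snapshots
zfs snapshot rpool/data/vm-100-disk-0@rep_2
zfs send -i @rep_1 rpool/data/vm-100-disk-0@rep_2 | \
  ssh pve02 zfs receive rpool/data/vm-100-disk-0
```

Proxmox's replication jobs automate exactly this snapshot/send/receive cycle, including snapshot rotation on both sides.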
2️⃣ STANDARD PROCEDURE HA + REPLICATION PROXMOX
🔵 STEP 1 — Create Proxmox Cluster (Required)
Infrastructure:
pve01zfs: 192.168.11.200 (main)
pve02zfs: 192.168.11.201 (backup)
On master node (pve01zfs):
pvecm create tsf
Add the IP address and hostname of pve01 to /etc/hosts on pve02:
192.168.11.200 pve01zfs.tsf.id.vn pve01zfs
Join cluster from pve02:
pvecm add pve01zfs.tsf.id.vn
When prompted, enter the root password of pve01.
For full cluster setup details, see video:
Setup Cluster Group on Proxmox Version 9
https://youtu.be/wUqA8xeLcjc
Important Notes:
• Two nodes should have a separate corosync link or stable LAN
• Latency < 2ms
• Same system time
• Same Proxmox version
Cluster stability is mandatory before configuring HA.
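The checklist above can be verified from the shell on each node with standard Proxmox and Debian tools:

```shell
# Check cluster membership and quorum
pvecm status
pvecm nodes

# Check latency to the other node (should be well under 2 ms on a LAN)
ping -c 5 pve02zfs

# Check that system time is synchronized
timedatectl status

# Check that both nodes run the same Proxmox version
pveversion
```

Run these on both nodes before configuring replication or HA; a cluster that is not quorate and time-synchronized will cause hard-to-debug failover problems later.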
🔵 STEP 2 — Prepare Replication Storage
The VM's disks must be stored on ZFS to support replication jobs.
If the VM is currently on local-lvm, move its disk to ZFS:
qm disk move 101 scsi0 zfs-storage
(In older releases the same command was spelled qm move_disk.) Only ZFS supports native storage replication in Proxmox VE 9; Ceph-backed VMs sit on shared storage and do not need replication jobs for HA.
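After the move, it is worth confirming the disk really lives on ZFS (the storage name zfs-storage is the one from the example above):

```shell
# List configured storages and their types (look for type "zfspool")
pvesm status

# Confirm the VM's disk now references the ZFS storage
qm config 101 | grep scsi0

# Inspect the underlying ZFS pool
zpool list
```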
🔵 STEP 3 — Create Replication Task for VM
Using GUI:
• Select VM → Replication → Add
• Target: pve02
• Schedule: */30 (every 30 minutes — Proxmox replication schedules use a systemd-calendar-style format, not cron syntax)
• Rate limit: Unlimited (or 100 MB/s)
Replication automatically creates snapshots on the target node.
First run:
Triggered immediately or at the first scheduled run → a full copy of the VM's datasets is sent (can take a long time, depending on disk size)
Second run onward:
Incremental snapshot replication
This ensures consistent ZFS-based synchronization.
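The same job can be created and monitored from the CLI with pvesr; the job ID 100-0 below follows the <vmid>-<number> naming convention:

```shell
# Create a replication job for VM 100 to pve02zfs,
# every 30 minutes, limited to 100 MB/s
pvesr create-local-job 100-0 pve02zfs --schedule "*/30" --rate 100

# Run it immediately instead of waiting for the schedule
pvesr schedule-now 100-0

# Check replication state: last sync time, duration, failures
pvesr status
```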
🔵 STEP 4 — Set Cluster Votes (Optional)
Note: Required only when cluster has fewer than 3 nodes.
Create new configuration file:
cd /etc/pve
cp corosync.conf corosync.new.conf
Edit:
nano corosync.new.conf
Modify the vote count in the nodelist section, and increment config_version in the totem section (corosync ignores the new file if config_version is unchanged):
pve02zfs → quorum_votes: 2 (backup)
Backup old file and rename:
mv corosync.conf corosync.bak.conf
mv corosync.new.conf corosync.conf
This keeps the cluster quorate when the single-vote node fails. Note the limitation: if the two-vote node fails instead, the survivor loses quorum; for 2-node clusters a QDevice (corosync-qdevice) is the more robust solution.
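For reference, the relevant parts of corosync.conf after the edit might look like this (a sketch based on the IPs above; remember that corosync only applies the file when config_version has been incremented):

```
totem {
  cluster_name: tsf
  config_version: 3
  ...
}

nodelist {
  node {
    name: pve01zfs
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.11.200
  }
  node {
    name: pve02zfs
    nodeid: 2
    quorum_votes: 2
    ring0_addr: 192.168.11.201
  }
}
```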
🔵 STEP 5 — Add VM to HA Manager
Add VM 100 as HA resource:
ha-manager add vm:100
Configure Node Affinity Rules:
Datacenter → HA → Affinity Rules → HA Node Affinity Rules → Add
Select HA Resource: VM 100
Set priority:
• pve01 = 2 (higher priority, main node)
• pve02 = 1
This ensures VM prefers running on the primary node.
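The HA resource can also be added and inspected from the CLI (the affinity rules themselves are configured in the GUI as described above):

```shell
# Add the VM as an HA resource and request that it be running
ha-manager add vm:100 --state started

# Show current HA state: which node owns the resource, CRM/LRM status
ha-manager status

# Show the configured HA resources
ha-manager config
```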
🔵 STEP 6 — Test HA Failover (Critical Test)
How to test:
Completely STOP node pve01
Result:
→ VM 100 automatically starts on pve02
→ Recovery takes a few minutes, because the HA stack must first fence the failed node before restarting the VM on pve02
When pve01 is repaired and restarted:
→ VM automatically migrates back to pve01
This confirms HA + replication is functioning properly.
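While the test runs, you can watch the failover happen from the surviving node:

```shell
# On pve02: watch the HA manager take over the resource
watch -n 2 ha-manager status

# Follow the HA cluster and local resource manager logs
journalctl -fu pve-ha-crm
journalctl -fu pve-ha-lrm
```

Once fencing completes, VM 100 should appear on pve02 in state "started".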
🔵 STEP 7 — Restart Cluster Services
After changing the corosync configuration or HA settings, restart the cluster services if required and verify that the cluster is quorate and stable again.
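A minimal sketch of the restart-and-verify sequence (run on the node where you changed the configuration):

```shell
# Restart corosync and the Proxmox cluster filesystem
systemctl restart corosync
systemctl restart pve-cluster

# Optionally restart the HA services if HA behaves oddly afterwards
systemctl restart pve-ha-lrm pve-ha-crm

# Verify the cluster is healthy and quorate again
pvecm status
```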
🔐 Why HA + ZFS Replication Matters
Implementing High Availability with ZFS replication in Proxmox provides:
• Reduced downtime
• Automatic failover
• Data consistency via ZFS snapshots
• Efficient incremental replication
• Production-ready resilience
This setup is ideal for:
Enterprise virtualization clusters
Home lab HA testing
Critical service hosting
Database or file server protection
🎯 Final Thoughts
Proxmox VE 9 combined with ZFS replication delivers a powerful, cost-effective High Availability solution. By properly configuring cluster quorum, replication tasks, and HA resource priorities, you can build a resilient virtualization infrastructure capable of surviving node failures.
Understanding the mechanics of incremental ZFS replication and HA failover gives you full control over uptime, storage efficiency, and disaster recovery readiness.
Mastering this configuration significantly elevates your Proxmox expertise and prepares you for real-world production environments.
See also related articles:
• P21 – How to Schedule Automatic Shutdown and Startup of VMs in Proxmox VE
• P15 – Backup and Restore VM in Proxmox VE
• P14 – How to Remove Cluster Group Safely on Proxmox