TSF – Comprehensive IT Solutions for SMBs | HCM

High Availability Test with ZFS Replication – Full Demo on Proxmox VE 9

In this video, we demonstrate a full High Availability test on Proxmox VE 9 using ZFS replication. Learn how to configure your Proxmox cluster to keep virtual machines online even if one node fails. This tutorial covers step-by-step setup, replication jobs, and failover scenarios. Understand how ZFS snapshots and replication keep data consistent across nodes. Perfect for IT professionals, home lab enthusiasts, and anyone running critical VMs. Avoid downtime and secure your workloads with proven HA techniques. Watch closely to see real-time failover and recovery in action. Boost your Proxmox skills and build a resilient virtualization environment today.

1. Overview

🚀 What is Replication in Proxmox?
Replication sends incremental (changed-blocks-only) copies of a VM from node A → node B on a schedule (every 5 minutes, 15 minutes, 1 hour, …).
It works only for:
• VMs (and containers) whose disks are on local ZFS storage (via ZFS send/receive)
It does not support LVM-thin, ext4, or directory storage. Ceph RBD needs no replication jobs at all, because it is already shared storage visible to every node.

📦 How does Replication work?
For example:
VM 100 is running on node pve01.
You create a replication job targeting pve02.
Mechanism (see the sketch below):
• First run → all VM disk data is copied to node pve02.
• Subsequent runs → only changed blocks are sent (incremental).
• On the destination node the VM never runs; only the replicated disk data and its replication snapshot are kept.
• On failover → the last replicated state becomes the active disk → the VM can start there.
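
Proxmox automates all of this through its replication service (pvesr), but under the hood the mechanism is plain ZFS send/receive. A minimal manual sketch of the same idea, assuming a disk dataset named rpool/data/vm-100-disk-0 (the real snapshot names Proxmox generates differ):

# initial full copy of the VM disk to the target node
zfs snapshot rpool/data/vm-100-disk-0@rep_1
zfs send rpool/data/vm-100-disk-0@rep_1 | ssh pve02zfs zfs receive rpool/data/vm-100-disk-0

# later runs send only the blocks changed since the previous snapshot
zfs snapshot rpool/data/vm-100-disk-0@rep_2
zfs send -i @rep_1 rpool/data/vm-100-disk-0@rep_2 | ssh pve02zfs zfs receive rpool/data/vm-100-disk-0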

2. Standard Procedure: HA + Replication on Proxmox

🔵 STEP 1 — Create Proxmox cluster (required)

pve01zfs: 192.168.11.200 (main)
pve02zfs: 192.168.11.201 (backup)

On the master node (pve01zfs), create the cluster:
pvecm create tsf

On pve02zfs, add pve01's IP to the hosts file (/etc/hosts):
192.168.11.200 pve01zfs.tsf.id.vn pve01zfs

Then join pve02zfs to the cluster:
pvecm add pve01zfs.tsf.id.vn
Enter the root password of pve01zfs when prompted.

For details, see the video "Setup Cluster Group on Proxmox Version 9":
https://youtu.be/wUqA8xeLcjc
Note:
• The two nodes should have a separate corosync link, or at least a stable LAN.
• Latency should stay below 2 ms.
• System time must be synchronized on both nodes (e.g., via NTP).
• Both nodes must run the same Proxmox version.
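
Once the join completes, you can verify cluster health from either node with the standard pvecm tools:

# confirm both nodes are members and the cluster is quorate
pvecm status
pvecm nodes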
________________________________________


🔵 STEP 2 — Prepare replication-capable storage

The VM's disks must live on ZFS storage if you want to use replication (Ceph needs none, since it is shared).
If the VM is currently on local-lvm → move the disk to ZFS:
qm disk move 101 scsi0 zfs-storage
(qm move_disk is the older alias of the same command.)
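
A quick sanity check after the move, using the VM ID and storage name from the example above:

# confirm the ZFS storage is active and the disk now lives there
pvesm status
qm config 101 | grep scsi0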
_______________________________________


🔵 STEP 3 — Create replication for VM

In the GUI:
• Select the VM → Replication → Add
• Target: pve02zfs
• Schedule: */30 (every 30 minutes; Proxmox replication schedules use calendar-event syntax, not cron)
• Rate limit: unlimited (or e.g. 100 MB/s)
Replication automatically creates ZFS snapshots and sends them to the target node.

First run = Schedule now (a full copy of the VM disks, so it takes time)
Subsequent runs = incremental snapshots only (see the CLI equivalent below)
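
The same job can also be created from the shell with pvesr; the job ID 100-0 means "first replication job of VM 100", and --rate is in MB/s:

# create a replication job to pve02zfs every 30 minutes, capped at 100 MB/s
pvesr create-local-job 100-0 pve02zfs --schedule "*/30" --rate 100
# list jobs and their last/next run
pvesr status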
________________________________________


🔵 STEP 4 — Set the number of cluster votes (optional)

Note: If the cluster has 3 or more nodes, this step is not required.

Create a working copy of the config file:
cd /etc/pve
cp corosync.conf corosync.new.conf

Edit the copy:
nano corosync.new.conf

Give pve02zfs (the backup node) quorum_votes: 2, and increment config_version in the totem section so corosync accepts the change.

Back up the old file, then activate the new one by renaming it:
mv corosync.conf corosync.bak.conf
mv corosync.new.conf corosync.conf
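
For reference, a sketch of the nodelist section after the edit; node IDs and addresses mirror the setup above, and only quorum_votes for pve02zfs changes:

nodelist {
  node {
    name: pve01zfs
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.11.200
  }
  node {
    name: pve02zfs
    nodeid: 2
    quorum_votes: 2
    ring0_addr: 192.168.11.201
  }
}

With 3 total votes, quorum is 2: if pve01 fails, pve02 alone keeps quorum and HA can start the VM there. The trade-off is that if pve02 fails instead, pve01 loses quorum, so this asymmetric scheme deliberately favors failover toward the backup node.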

_______________________________________


🔵 STEP 5 — Add VM Resource

Add VM 100 as an HA resource:

ha-manager add vm:100

Add a Node Affinity rule:

Datacenter → HA → Affinity Rules → HA Node Affinity Rules tab → Add
HA Resource: select the HA VM
Set Priority:
o pve01 = 2 (the main node; a higher number means higher priority)
o pve02 = 1
_________________________________________


🔵 STEP 6 — Test HA failover (important)

➤ How to test:
Power off node pve01 completely.
→ VM 100 will automatically start on pve02 (the HA stack needs a couple of minutes to declare the node dead and boot the VM). Note that any writes made after the last replication run are lost, which is why a short replication interval matters for critical VMs.

Once pve01 is repaired and comes back online, the VM automatically migrates back to pve01 thanks to its higher node-affinity priority.
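
To follow the failover in real time on the surviving node, watch the standard HA services directly:

# live view of HA resource states
watch ha-manager status
# logs of the HA cluster and local resource managers
journalctl -u pve-ha-crm -u pve-ha-lrm -f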

🔵 STEP 7 — Restart Cluster Services
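
If the cluster stack needs a clean restart after testing, the usual service names are shown below; restart corosync and pve-cluster only with console access at hand, since they briefly interrupt cluster communication:

# cluster communication and the config filesystem
systemctl restart corosync pve-cluster
# HA manager services
systemctl restart pve-ha-crm pve-ha-lrm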