Set Up an HA Two-Node Cluster on Proxmox VE - Node Failure Disaster Simulation
In this tutorial, we guide you through the complete process of setting up High Availability (HA) on a two-node Proxmox VE cluster. You’ll learn how to configure the cluster, enable HA groups, and ensure your virtual machines remain online even when one node fails. This step-by-step guide also includes a real disaster simulation to demonstrate how Proxmox automatically handles node failure and restores VM availability.
We explain every configuration detail clearly so you can build a resilient and reliable virtualization environment for your infrastructure. Whether you are an IT helpdesk technician or a system administrator, this tutorial helps you understand HA failover behavior in real-world situations. You’ll also get practical tips to optimize HA performance and prevent downtime.
By the end of this guide, you’ll have a fully functional Proxmox HA setup capable of handling node failures automatically.
Lab:
PVE01: 192.168.11.200 (main)
PVE02: 192.168.11.201 (backup)
Join cluster: TSF
NAS TSF: 192.168.11.30:5001
VM Windows10 on PVE01
I/ HA configuration for 2 nodes
Step 0: Mount NFS storage
Both nodes need access to a shared folder on the Synology NAS, or any other storage both nodes can mount together, such as SMB/OneDrive/…
Video guide to mount NFS: https://youtu.be/oXagwrTRzM8
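If you prefer the shell over the GUI, a minimal sketch of adding the NFS storage is shown below; the storage ID nfs-tsf and the export path /volume1/proxmox are assumptions, adjust them to your NAS. Storage definitions in /etc/pve/storage.cfg are cluster-wide, so adding the storage once makes it available on both nodes.
pvesm add nfs nfs-tsf --server 192.168.11.30 --export /volume1/proxmox --content images,iso
pvesm status    # the new storage should be listed as active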
Step 1: Move the VM disk to the NFS storage
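A hedged shell example of the disk move (assuming VM 100, disk scsi0, and the storage ID nfs-tsf from the sketch above; on older PVE versions the command is qm move_disk):
qm disk move 100 scsi0 nfs-tsf --delete 1    # copy the disk to NFS and remove the local copy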
Step 2: Set the vote count for PVE02
Initially, both nodes have a vote count of 1 (Cluster Information). In a two-node cluster, losing one node leaves the survivor with only 1 of 2 votes, so quorum is lost and HA cannot recover the VMs; giving PVE02 a second vote lets it keep quorum (2 of 3 votes) when PVE01 fails. With 3 or more PVE nodes there is no need to set the vote count, the system handles quorum automatically.
Open the shell on pve01 and run the following command to list the corosync files:
ls /etc/pve
Next, make a backup copy of the config file and edit the copy:
cd /etc/pve
cp corosync.conf corosync.new.conf
nano corosync.new.conf
Edit two values:
config_version: 3 (increment the existing value by 1)
quorum_votes: 2 (in the node entry for PVE02)
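For reference, the edited parts of corosync.new.conf would look roughly like this (node ID and ring address are assumptions based on the lab IPs):
totem {
  cluster_name: TSF
  config_version: 3    # incremented from the previous value
  ...
}
nodelist {
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 2    # raised from 1
    ring0_addr: 192.168.11.201
  }
  ...
}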
Save the file with Ctrl + O, Enter, then exit with Ctrl + X.
mv corosync.conf corosync.bak.conf
mv corosync.new.conf corosync.conf
Check the number of votes after setup (Cluster Information)
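You can also verify from the shell; the Votequorum section of the output should now show 3 total votes:
pvecm status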
Step 3: Create HA
Add the VM as an HA resource:
ha-manager add vm:100
Create an HA group and add both nodes, giving PVE01 the higher priority (e.g. pve01:2, pve02:1) so the VM prefers running on PVE01; a shell sketch follows.
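A minimal shell sketch, assuming the group name HA-TSF (the same can be done in the GUI under Datacenter > HA > Groups):
ha-manager groupadd HA-TSF --nodes "pve01:2,pve02:1"    # pve01 gets the higher priority
ha-manager set vm:100 --group HA-TSF    # attach the existing HA resource to the group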
Remove the VM from HA (optional):
ha-manager remove vm:100
systemctl restart pve-ha-crm
II/ Simulate a PVE01 failure disaster
Demo: STOP pve01 (to simulate a hardware failure of the main node).
Log in to pve02 (backup) to check. After about 3-5 minutes, the Windows10 VM automatically moves to node PVE02 and starts.
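To follow the failover from the pve02 shell, you can check the HA manager state; the vm:100 resource should end up as started on pve02:
ha-manager status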
Step 1: Handle Physical Server PVE01
We take server PVE01 away to maintain and repair the hardware. After the repair, reconnect and start server PVE01.
The VM will automatically migrate back once pve01 is online again. The reason is that we set the HA node priority of pve01 to 2, which is higher than the priority of pve02.
Step 2: Restart the system services on PVE01 (main); a one-line restart command follows the list.
pve-cluster
pve-ha-crm
pve-ha-lrm
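All three can be restarted with one command from the PVE01 shell:
systemctl restart pve-cluster pve-ha-crm pve-ha-lrm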
Step 3: Migrate VM Windows10 back to PVE01.
This applies when the priority is set to 1 for both nodes (no automatic failback), so the VM must be migrated manually.
You can choose a quiet time to perform the migration.
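For an HA-managed VM, the migration can be requested through the HA stack, for example (assuming VM 100):
ha-manager migrate vm:100 pve01    # request online migration back to the main node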
Step 4: Restart the system services on PVE02 (backup)
pve-cluster
pve-ha-crm
pve-ha-lrm
III/ In case PVE01 cannot be completely repaired, it must be replaced (PVE03)
Step 1: Shut down all VMs on PVE02
Step 2: Remove the old cluster group (see the sketch below)
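A minimal sketch, assuming "remove cluster group" means resetting the cluster configuration on PVE02 so a new cluster can be formed; this follows the standard Proxmox procedure for separating a node, so back up /etc/pve before running it:
systemctl stop pve-cluster corosync
pmxcfs -l                          # start the cluster filesystem in local mode
rm /etc/corosync/*
rm /etc/pve/corosync.conf
killall pmxcfs
systemctl start pve-cluster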
Step 3: Share NFS
On the NAS, add the IP of PVE03 to the NFS shared folder permissions.
Or set the IP of PVE03 to be the same as the original IP of PVE01
Step 4: On PVE03, add the NFS storage from the NAS (same as Step 0)
Step 5: Create a cluster group for PVE02 and PVE03 (see the sketch after this list)
Create the HA group again
Set the number of votes (as in Part I, Step 2)
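A hedged sketch of rebuilding the cluster, assuming PVE02 keeps the cluster name TSF and PVE03 joins over the LAN:
pvecm create TSF                   # on PVE02, if the old cluster config was removed
pvecm add 192.168.11.201           # on PVE03, join the cluster via PVE02
Then recreate the HA group and set quorum_votes exactly as in Part I.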