
Setting Up an HA Two-Node Proxmox VE Cluster - Node Failure Disaster Simulation

In this tutorial, we guide you through the complete process of setting up High Availability (HA) on a two-node Proxmox VE cluster. You’ll learn how to configure the cluster, enable HA groups, and ensure your virtual machines remain online even when one node fails. This step-by-step guide also includes a real disaster simulation to demonstrate how Proxmox automatically handles node failure and restores VM availability.
We explain every configuration detail clearly so you can build a resilient and reliable virtualization environment for your infrastructure. Whether you are an IT helpdesk technician or a system administrator, this tutorial helps you understand HA failover behavior in real-world situations. You’ll also get practical tips to optimize HA performance and prevent downtime.
By the end of this guide, you’ll have a fully functional Proxmox HA setup capable of handling node failures automatically.

Lab:

PVE01: 192.168.11.200 (main)
PVE02: 192.168.11.201 (backup)
Join cluster: TSF
NAS TSF: 192.168.11.30:5001
VM Windows10 on PVE01

I/ HA configuration for 2 nodes

Step 0: Mount NFS storage

Both nodes mount a shared folder on the Synology NAS via NFS, or any other storage that both nodes can mount at the same time, such as SMB/OneDrive/…
Video guide to mount NFS: https://youtu.be/oXagwrTRzM8
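If you prefer the shell over the GUI, the NFS storage can be added once at the Datacenter level (storage definitions are shared by all cluster nodes). A minimal sketch, assuming the storage is named nfs-tsf and the export path on the NAS is /volume1/proxmox (adjust both to your setup):

pvesm add nfs nfs-tsf --server 192.168.11.30 --export /volume1/proxmox --content images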

Step 1: Move the VM disk to the NFS storage
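You can do this from the VM's Hardware tab (Move disk) in the GUI, or roughly like this from the shell. A sketch assuming the VM disk is scsi0 and the NFS storage is named nfs-tsf as above:

qm move_disk 100 scsi0 nfs-tsf
# add --delete 1 to remove the old copy of the disk from local storage afterwards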

Step 2: Set the vote count for PVE02

Initially, both nodes have a vote count of 1 (Cluster Information). With only two nodes, losing either node drops the cluster below quorum and HA cannot act, so we give PVE02 a second vote so that it keeps quorum on its own when PVE01 fails (note that the reverse does not hold: if PVE02 fails, PVE01 alone still cannot reach quorum). With 3 or more PVE nodes there is no need to change the vote count; the defaults are enough, because a majority of nodes remains after a single failure.

Open the shell on PVE01 and run the following command to list the contents of /etc/pve, including the corosync files:

ls /etc/pve

Next, back up the file and edit the copy:
cd /etc/pve
cp corosync.conf corosync.new.conf
nano corosync.new.conf

Edit:
config_version: 3 (in the totem section; increase it by 1)
quorum_votes: 2 (in the node entry for PVE02)
Save the file with Ctrl + O, press Enter, then exit with Ctrl + X.
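After editing, the changed parts of corosync.new.conf should look roughly like this (only the relevant lines are shown; the node name, nodeid and IP are from this lab):

totem {
  cluster_name: TSF
  config_version: 3
  ...
}

nodelist {
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 2
    ring0_addr: 192.168.11.201
  }
  ...
}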

mv corosync.conf corosync.bak.conf
mv corosync.new.conf corosync.conf

Check the number of votes after setup (Cluster Information)
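The votes can also be checked from the shell on either node:

pvecm status
# total votes should now be 3, with 2 of them on pve02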

Step 3: Create HA

Add the VM as an HA resource:
ha-manager add vm:100

Add an HA group with node priorities (see the sketch below)
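From the shell, the HA group and resource can be set up roughly like this (the group name tsf-ha and the priorities are examples; a higher number means the node is preferred):

ha-manager groupadd tsf-ha --nodes "pve01:2,pve02:1"
ha-manager set vm:100 --group tsf-ha
ha-manager status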

Remove the VM from HA (optional):
ha-manager remove vm:100

systemctl restart pve-ha-crm

II/ Simulate a PVE01 failure (disaster)

Demo: stop (power off) PVE01 to simulate a hardware failure on the main node.

Log in to PVE02 (backup) to check. After about 3-5 minutes, VM Windows10 has automatically moved to node PVE02 and started.
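You can also watch the failover from the shell of PVE02, for example:

pvecm status        # the cluster should still be quorate thanks to the 2 votes on PVE02
ha-manager status   # vm:100 should be reported as started on pve02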

Step 1: Handle Physical Server PVE01


We took Server PVE01 away for hardware maintenance and repair, then powered it on again. Reconnect and start Server PVE01.
The VM will automatically migrate back once PVE01 is online. The reason is that we set the HA node priority of PVE01 to 2, which is higher than the priority of PVE02.

Step 2: Restart the cluster and HA services on PVE01 (main)

pve-cluster
pve-ha-crm
pve-ha-lrm
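They can be restarted in one command, for example:

systemctl restart pve-cluster pve-ha-crm pve-ha-lrm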

Step 3: Migrate VM Windows10 back to PVE01.

In the case where the priority is set to 1 for both nodes, the VM does not fail back automatically, so you can choose a quiet time to perform the migration yourself.
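For an HA-managed VM, the migration can also be requested from the shell, for example:

ha-manager migrate vm:100 pve01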

Step 4: Restart the cluster and HA services on PVE02 (backup)

pve-cluster
pve-ha-crm
pve-ha-lrm

III/ In case PVE01 cannot be fully repaired and must be replaced (by PVE03)

Step 1: Shut down all VMs on PVE02

Step 2: Remove the old cluster group
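Since PVE01 is gone for good, a common way to do this from the shell of PVE02 is sketched below; treat it as a guideline and adapt it to your situation:

pvecm expected 1       # let PVE02 reach quorum on its own
pvecm delnode pve01    # remove the dead node from the cluster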

Step 3: Share NFS

On the NAS, add the IP of PVE03 to the NFS shared folder permissions.
Or set the IP of PVE03 to the same IP that PVE01 originally used.

Step 4: On PVE03, add the NFS storage from the NAS (same as Step 0)

Step 5: Create a cluster group for PVE02 and PVE03 (a shell sketch follows at the end of this step)


Create HA group
Set number of votes
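A minimal sketch of the cluster part from the shell, assuming PVE03 is a fresh install (adapt to your layout; the IP is PVE02's address from this lab):

pvecm create TSF            # on PVE02, only if the old cluster was completely removed in Step 2
pvecm add 192.168.11.201    # on PVE03, join the cluster held by PVE02

The HA group and vote count are then configured again exactly as in Section I.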