TSF – Comprehensive IT Solutions for SMB Businesses | HCM

P13 – Setup HA Two-Node Cluster Proxmox VE – Node Failure Disaster Simulation

🚀 Proxmox VE P13 – How to Setup High Availability (2-Node Cluster + NAS) | Failover Test

High Availability (HA) is one of the most powerful features in Proxmox VE, allowing virtual machines to automatically restart on another node if a failure occurs.

In this tutorial, we guide you through the complete process of setting up Proxmox HA on a two-node cluster using NAS shared storage.

You will learn how to:

  • Configure HA in a 2-node Proxmox cluster

  • Use NAS (NFS) as shared storage

  • Set quorum votes correctly

  • Add VM to HA group

  • Simulate a real hardware failure

  • Test automatic failover behavior

By the end of this guide, you will have a fully functional HA environment capable of handling node failures automatically.


🧪 Lab Environment

 
PVE01: 192.168.11.200 (main)
PVE02: 192.168.11.201 (backup)
Cluster name: TSF
NAS TSF: 192.168.11.30:5001
VM: Windows10, running on PVE01

I/ HA Configuration – 2 Nodes


Step 0: Mount NFS storage

Both nodes must use shared storage.

In this lab, we use:

  • Synology NAS with a shared NFS folder

  • Storage mounted on both PVE01 and PVE02

Alternative shared storage options:

  • SMB

  • OneDrive

  • Other centralized storage

Video guide to mount NFS:
https://youtu.be/oXagwrTRzM8
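The mount can also be done from the CLI with pvesm. A minimal sketch, assuming a storage ID of nfs-nas and an export path of /volume1/proxmox on the NAS (both names are illustrative; adjust to your environment):

```shell
# Add the NFS export as shared storage; cluster-wide storage is
# visible to all nodes, which HA requires
pvesm add nfs nfs-nas \
    --server 192.168.11.30 \
    --export /volume1/proxmox \
    --content images,iso

# Verify the storage is active on this node
pvesm status
```

Run the status check on both PVE01 and PVE02 to confirm the storage is mounted everywhere.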


Step 1: Move the VM disk to NFS storage

Move the Windows10 VM disk from local storage to the mounted NFS storage.

This ensures the VM can run on either node.
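This can be done in the GUI (VM → Hardware → Hard Disk → Disk Action → Move Storage) or from the shell. A sketch assuming VM ID 100, disk scsi0, and a target storage ID of nfs-nas (the storage name is illustrative):

```shell
# Move the disk to the shared NFS storage and delete the local copy.
# On Proxmox VE releases before 7.2, the command is "qm move_disk".
qm disk move 100 scsi0 nfs-nas --delete 1

# Confirm the disk now lives on the NFS storage
qm config 100 | grep scsi0
```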


Step 2: Set Vote for PVE02

Initially, each node has one quorum vote (visible under Datacenter → Cluster Information).

In a 2-node cluster, quorum must be adjusted manually.
In clusters with 3 or more nodes, the system automatically handles votes.

Open the shell of pve01 and run:

 
ls /etc/pve

Backup and edit corosync configuration:

 
cd /etc/pve
cp corosync.conf corosync.new.conf
nano corosync.new.conf

Edit:

 
Config_version:3 quorum_votes:2 (PVE02)

Save the file:

Ctrl + O → Enter
Exit: Ctrl + X

Replace original file:

 
mv corosync.conf corosync.bak.conf
mv corosync.new.conf corosync.conf

Check vote number after setup in Cluster Information.
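The vote count can also be verified from the shell:

```shell
# Show cluster membership and quorum state;
# PVE02 should now report 2 votes
pvecm status

# Lower-level view directly from corosync
corosync-quorumtool -s
```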


Step 3: Create HA

Add VM resource:

 
ha-manager add vm:100

Add nodes to an HA group as needed (configured via the GUI under Datacenter → HA → Groups).

Remove VM from HA (optional):

 
ha-manager remove vm:100

Restart HA service:

 
systemctl restart pve-ha-crm

At this stage, HA is enabled.
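To confirm from the shell that HA is active and the VM is managed:

```shell
# Show the HA manager view: current master, LRM state per node,
# and the state of each managed resource
ha-manager status

# List the configured HA resources (vm:100 should appear)
ha-manager config
```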


II/ Simulate a PVE01 Hardware Failure (Disaster Test)

Now we simulate a real hardware failure.

Power off PVE01 to simulate a main-node hardware failure.

Log in to PVE02 (backup) and monitor the cluster.

After approximately 3–5 minutes:

  • VM Windows10 automatically moves to PVE02

  • VM automatically starts

This demonstrates Proxmox HA failover capability.
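While waiting for the failover, it helps to watch it happen from the PVE02 shell:

```shell
# Watch the HA manager fence the dead node and recover the VM
watch -n 5 ha-manager status

# Follow the HA cluster resource manager logs for recovery events
journalctl -fu pve-ha-crm
```

The delay of a few minutes is expected: the surviving node must first declare PVE01 dead (fencing) before it is allowed to start the VM.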


Step 1: Handle Physical Server PVE01

After repairing hardware, reconnect and power on Server PVE01.

If HA node priority was configured:

  • PVE01 priority = 2

  • PVE02 lower priority

VM will automatically migrate back to PVE01 once it becomes available.

This behavior ensures workload runs on preferred primary node.
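Node priorities are defined through an HA group. A sketch assuming a group named prefer-pve01 (the group name is illustrative):

```shell
# Higher priority number = preferred node; with failback enabled
# (the default), the VM returns to pve01 once it is healthy again
ha-manager groupadd prefer-pve01 --nodes "pve01:2,pve02:1"

# Attach the existing HA resource to the group
ha-manager set vm:100 --group prefer-pve01
```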


Step 2: Restart system services on PVE01 (main)

Restart services:

 
systemctl restart pve-cluster pve-ha-crm pve-ha-lrm

Ensure cluster and HA manager services are fully operational.


Step 3: Migrate VM Windows10 back to PVE01

If both nodes have equal priority (e.g. both set to 1), the VM does not fail back automatically.

In that case, perform a manual migration during low-traffic hours.

This ensures minimal service disruption.


Step 4: Restart system services on PVE02 (backup)

Restart services:

 
systemctl restart pve-cluster pve-ha-crm pve-ha-lrm

Both nodes should now be fully synchronized.


III/ If PVE01 Cannot Be Repaired – Replace with PVE03

In case the main node cannot be recovered:


Step 1: Shut down all VMs on PVE02

Ensure no running workloads before cluster modification.


Step 2: Remove cluster group

Remove existing cluster configuration.


Step 3: Update the NFS share permissions

On NAS:

  • Add IP permission for PVE03

  • Or configure PVE03 to use the same IP as original PVE01

Ensure NFS access is identical.


Step 4: On PVE03, add the NAS NFS storage

Mount NFS storage on new node.


Step 5: Create cluster group for PVE02 and PVE03

Recreate cluster.

Create HA group.

Set number of votes.

Cluster is restored with new hardware.
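The rebuild can be sketched from the shell, assuming the cluster keeps the name TSF from the lab environment:

```shell
# On PVE02: create a fresh cluster
pvecm create TSF

# On PVE03: join the new cluster using PVE02's address
pvecm add 192.168.11.201

# Verify membership, then redo the vote adjustment and HA
# configuration from Steps 2 and 3 of Section I
pvecm status
```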


🔐 Best Practices for 2-Node HA

✔ Always use shared storage
✔ Configure vote manually for 2-node cluster
✔ Set priority for primary node
✔ Test failover before production deployment
✔ Monitor HA status regularly

For production:

  • Consider adding QDevice for better quorum stability

  • Use dedicated network for corosync

  • Monitor HA logs for anomalies
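A QDevice gives a 2-node cluster an external tie-breaking vote, which is more robust than manually assigning two votes to one node. A sketch, assuming a small always-on Linux host (e.g. a Raspberry Pi or a VM outside the cluster) acts as the quorum device:

```shell
# On the external quorum host: install the qnetd daemon
apt install corosync-qnetd

# On every Proxmox node: install QDevice support
apt install corosync-qdevice

# On one Proxmox node: register the QDevice
# (<QDEVICE-IP> is the address of the external host)
pvecm qdevice setup <QDEVICE-IP>
```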


🎯 Conclusion

Setting up Proxmox HA in a 2-node cluster with NAS storage provides enterprise-level resilience even in small infrastructure environments.

In this guide, you have:

  • Configured HA

  • Adjusted quorum votes

  • Added VM resource

  • Simulated real disaster failover

  • Tested recovery process

  • Rebuilt cluster after hardware replacement

Understanding HA failover behavior is critical for any system administrator managing virtualization infrastructure.

This tutorial not only teaches configuration steps but also demonstrates real-world disaster recovery scenarios.

See also related articles:

  • P15 – Backup and Restore VM in Proxmox VE

  • P14 – How to Remove Cluster Group Safely on Proxmox