TSF – Comprehensive IT Solutions for SMB Businesses | HCM

P13 – Setup HA Two-Node Cluster Proxmox VE – Node Failure Disaster Simulation

🚀 Proxmox VE P13 – How to Setup High Availability (2-Node Cluster + NAS) | Failover Test

High Availability (HA) is one of the most powerful features in Proxmox VE, allowing virtual machines to automatically restart on another node if a failure occurs.

In this tutorial, we guide you through the complete process of setting up Proxmox HA on a two-node cluster using NAS shared storage.

You will learn how to:

  • Configure HA in a 2-node Proxmox cluster

  • Use NAS (NFS) as shared storage

  • Set quorum votes correctly

  • Add VM to HA group

  • Simulate a real hardware failure

  • Test automatic failover behavior

By the end of this guide, you will have a fully functional HA environment capable of handling node failures automatically.


🧪 Lab Environment

 
PVE01: 192.168.11.200 (main)
PVE02: 192.168.11.201 (backup)
Cluster name: TSF
NAS TSF: 192.168.11.30:5001
VM: Windows10, running on PVE01

I/ HA Configuration – 2 Nodes


Step 0: Mount NFS storage

Both nodes must use shared storage.

In this lab, we use:

  • Synology NAS with a shared NFS folder

  • Storage mounted on both PVE01 and PVE02

Alternative shared storage options:

  • SMB

  • OneDrive

  • Other centralized storage

Video guide to mount NFS:
https://youtu.be/oXagwrTRzM8
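The mount can also be done from the CLI with pvesm. A minimal sketch, assuming a storage ID of nfs-nas and an export path of /volume1/proxmox on the NAS (both names are illustrative; adjust to your environment):

```shell
# Add the NFS export as shared storage; cluster-wide storage is
# visible to all nodes, which HA requires
pvesm add nfs nfs-nas \
    --server 192.168.11.30 \
    --export /volume1/proxmox \
    --content images,iso

# Verify the storage is active on this node
pvesm status
```

Run the status check on both PVE01 and PVE02 to confirm the storage is mounted everywhere.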


Step 1: Move the VM disk to NFS storage

Move the Windows10 VM disk from local storage to the mounted NFS storage.

This ensures the VM can run on either node.
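This can be done in the GUI (VM → Hardware → Hard Disk → Disk Action → Move Storage) or from the shell. A sketch assuming VM ID 100, disk scsi0, and a target storage ID of nfs-nas (the storage name is illustrative):

```shell
# Move the disk to the shared NFS storage and delete the local copy.
# On Proxmox VE releases before 7.2, the command is "qm move_disk".
qm disk move 100 scsi0 nfs-nas --delete 1

# Confirm the disk now lives on the NFS storage
qm config 100 | grep scsi0
```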


Step 2: Set Vote for PVE02

Initially, each node has one quorum vote (visible under Datacenter → Cluster Information).

In a 2-node cluster, quorum must be adjusted manually.
In clusters with 3 or more nodes, the system automatically handles votes.

Open the shell of pve01 and run:

 
ls /etc/pve

Backup and edit corosync configuration:

 
cd /etc/pve
cp corosync.conf corosync.new.conf
nano corosync.new.conf

Edit:

 
Config_version:3 quorum_votes:2 (PVE02)

Save the file:

Ctrl + O → Enter
Exit: Ctrl + X

Replace original file:

 
mv corosync.conf corosync.bak.conf
mv corosync.new.conf corosync.conf

Check vote number after setup in Cluster Information.
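The vote count can also be verified from the shell:

```shell
# Show cluster membership and quorum state;
# PVE02 should now report 2 votes
pvecm status

# Lower-level view directly from corosync
corosync-quorumtool -s
```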


Step 3: Create HA

Add VM resource:

 
ha-manager add vm:100

Add nodes to an HA group as needed (configured via the GUI under Datacenter → HA → Groups).

Remove VM from HA (optional):

 
ha-manager remove vm:100

Restart HA service:

 
systemctl restart pve-ha-crm

At this stage, HA is enabled.
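To confirm from the shell that HA is active and the VM is managed:

```shell
# Show the HA manager view: current master, LRM state per node,
# and the state of each managed resource
ha-manager status

# List the configured HA resources (vm:100 should appear)
ha-manager config
```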


II/ Simulate a PVE01 Hardware Failure (Disaster Test)

Now we simulate a real hardware failure.

Power off PVE01 to simulate a main-node hardware failure.

Log in to PVE02 (backup) and monitor the cluster.

After approximately 3–5 minutes:

  • VM Windows10 automatically moves to PVE02

  • VM automatically starts

This demonstrates Proxmox HA failover capability.
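While waiting for the failover, it helps to watch it happen from the PVE02 shell:

```shell
# Watch the HA manager fence the dead node and recover the VM
watch -n 5 ha-manager status

# Follow the HA cluster resource manager logs for recovery events
journalctl -fu pve-ha-crm
```

The delay of a few minutes is expected: the surviving node must first declare PVE01 dead (fencing) before it is allowed to start the VM.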


Step 1: Handle Physical Server PVE01

After repairing hardware, reconnect and power on Server PVE01.

If HA node priority was configured:

  • PVE01 priority = 2

  • PVE02 lower priority

VM will automatically migrate back to PVE01 once it becomes available.

This behavior ensures workload runs on preferred primary node.
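Node priorities are defined through an HA group. A sketch assuming a group named prefer-pve01 (the group name is illustrative):

```shell
# Higher priority number = preferred node; with failback enabled
# (the default), the VM returns to pve01 once it is healthy again
ha-manager groupadd prefer-pve01 --nodes "pve01:2,pve02:1"

# Attach the existing HA resource to the group
ha-manager set vm:100 --group prefer-pve01
```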


Step 2: Restart system services on PVE01 (main)

Restart services:

 
systemctl restart pve-cluster pve-ha-crm pve-ha-lrm

Ensure cluster and HA manager services are fully operational.


Step 3: Migrate VM Windows10 back to PVE01

If both nodes have equal priority (e.g. both set to 1), the VM does not fail back automatically.

In that case, perform a manual migration during low-traffic hours.

This ensures minimal service disruption.


Step 4: Restart system services on PVE02 (backup)

Restart services:

 
systemctl restart pve-cluster pve-ha-crm pve-ha-lrm

Both nodes should now be fully synchronized.


III/ If PVE01 Cannot Be Repaired – Replace with PVE03

In case the main node cannot be recovered:


Step 1: Shut down all VMs on PVE02

Ensure no running workloads before cluster modification.


Step 2: Remove cluster group

Remove existing cluster configuration.


Step 3: Update the NFS share permissions

On NAS:

  • Add IP permission for PVE03

  • Or configure PVE03 to use the same IP as original PVE01

Ensure NFS access is identical.


Step 4: On PVE03, add the NAS NFS storage

Mount NFS storage on new node.


Step 5: Create cluster group for PVE02 and PVE03

Recreate cluster.

Create HA group.

Set number of votes.

Cluster is restored with new hardware.
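The rebuild can be sketched from the shell, assuming the cluster keeps the name TSF from the lab environment:

```shell
# On PVE02: create a fresh cluster
pvecm create TSF

# On PVE03: join the new cluster using PVE02's address
pvecm add 192.168.11.201

# Verify membership, then redo the vote adjustment and HA
# configuration from Steps 2 and 3 of Section I
pvecm status
```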


🔐 Best Practices for 2-Node HA

✔ Always use shared storage
✔ Configure vote manually for 2-node cluster
✔ Set priority for primary node
✔ Test failover before production deployment
✔ Monitor HA status regularly

For production:

  • Consider adding QDevice for better quorum stability

  • Use dedicated network for corosync

  • Monitor HA logs for anomalies
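A QDevice gives a 2-node cluster an external tie-breaking vote, which is more robust than manually assigning two votes to one node. A sketch, assuming a small always-on Linux host (e.g. a Raspberry Pi or a VM outside the cluster) acts as the quorum device:

```shell
# On the external quorum host: install the qnetd daemon
apt install corosync-qnetd

# On every Proxmox node: install QDevice support
apt install corosync-qdevice

# On one Proxmox node: register the QDevice
# (<QDEVICE-IP> is the address of the external host)
pvecm qdevice setup <QDEVICE-IP>
```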


🎯 Conclusion

Setting up Proxmox HA in a 2-node cluster with NAS storage provides enterprise-level resilience even in small infrastructure environments.

In this guide, you have:

  • Configured HA

  • Adjusted quorum votes

  • Added VM resource

  • Simulated real disaster failover

  • Tested recovery process

  • Rebuilt cluster after hardware replacement

Understanding HA failover behavior is critical for any system administrator managing virtualization infrastructure.

This tutorial not only teaches configuration steps but also demonstrates real-world disaster recovery scenarios.

See also related articles:

  • P15 – Backup and Restore VM in Proxmox VE

  • P14 – How to Remove Cluster Group Safely on Proxmox